Differentiated Routing Optimization for Multi-type Traffic Based on Hierarchical Policy Reinforcement Learning

ZHAO Zhi-Xu, LIU Kun, WANG Lu-Yao, XIA Yuan-Qing

Citation: Zhao Zhi-Xu, Liu Kun, Wang Lu-Yao, Xia Yuan-Qing. Differentiated routing optimization for multi-type traffic based on hierarchical policy reinforcement learning. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c250413

doi: 10.16383/j.aas.c250413 cstr: 32138.14.j.aas.c250413

More Information
    Author Bio:

    ZHAO Zhi-Xu  Master student at the School of Automation, Beijing Institute of Technology. His research interests include routing optimization, adversarial attacks, and machine learning. E-mail: zhixuzhaobit@163.com

    LIU Kun  Professor at the School of Automation, Beijing Institute of Technology. His research interests include theory and applications of networked control, and security of complex networked systems. Corresponding author of this paper. E-mail: kunliubit@bit.edu.cn

    WANG Lu-Yao  Master student at the School of Automation, Beijing Institute of Technology. Her research interests include adversarial attacks and defense, time series, and cyber security. E-mail: luyaowangbit@gmail.com

    XIA Yuan-Qing  Professor at the School of Automation, Beijing Institute of Technology. His research interests include cloud control, cloud data center optimization scheduling and management, intelligent transportation, model predictive control, active disturbance rejection control, robust control, control and security of complex networked systems, theory and applications of networked control, flight control, and networked cooperative control for integration of space, air and earth. E-mail: xia_yuanqing@bit.edu.cn

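  • Abstract: Routing is an important means of optimizing network resource allocation. However, traditional routing algorithms rely on static policies that optimize a single quality of service (QoS) metric, and they struggle to meet differentiated requirements as multi-type traffic grows explosively. Although deep reinforcement learning offers a new approach to routing optimization in dynamic network environments, existing methods still lack fine-grained awareness of traffic types and cannot adjust routing policies flexibly. To address the differentiated routing requirements of different traffic types, this paper designs a traffic-aware routing algorithm based on hierarchical policy reinforcement learning. First, a traffic classification module is introduced to perceive the differentiated service requirements of different traffic at a fine granularity. Second, a graph convolutional network is used to model the network topology efficiently; on this basis, a hierarchical decision network and differentiated reward functions are designed to guide the agent toward adaptive routing decisions, so that the routing policy of each traffic class is adjusted dynamically. In addition, a global attention mechanism is introduced into the actor-critic framework to strengthen the agent's ability to model spatio-temporal dependencies in the network state, and generalized advantage estimation together with the proximal policy optimization algorithm is used to improve training efficiency and stability. Finally, the effectiveness of the proposed algorithm is verified on several network topologies.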
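    The abstract names generalized advantage estimation (GAE) and proximal policy optimization (PPO) as the training tools. The sketch below shows the textbook form of both, not the authors' implementation: the function names, the default values of `gamma`, `gae_lambda` and `clip_eps`, and the toy trajectory are all assumptions made for illustration. The GAE parameter is called `gae_lambda` here to avoid confusion with the flow arrival rate $ \lambda $ used in the paper's notation.

```python
from typing import List


def gae_advantages(rewards: List[float], values: List[float],
                   gamma: float = 0.99, gae_lambda: float = 0.95) -> List[float]:
    """Textbook GAE over one trajectory.

    `values` must hold len(rewards) + 1 entries; the last entry is the
    bootstrap value of the state reached after the final reward.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the advantage of the next step.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * gae_lambda * running
        advantages[t] = running
    return advantages


def ppo_clip_term(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate term (to be maximized)."""
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical two-step trajectory, not data from the paper.
    adv = gae_advantages([1.0, 1.0], [0.0, 0.0, 0.0], gamma=0.5, gae_lambda=1.0)
    print(adv)
    print(ppo_clip_term(1.5, adv[0]))
```

    The clipping keeps a single update from moving the policy too far from the one that collected the data, which is the usual argument for PPO's training stability.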
  • Fig. 1  Multi-path routing model

    Fig. 2  Intelligent routing system framework

    Fig. 3  TR-HPRL algorithm model

    Fig. 4  1D-CNN classification model

    Fig. 5  Multi-head attention decision-making network based on traffic type

    Fig. 6  Network topology

    Fig. 7  Confusion matrix of classification results

    Fig. 8  Actor network loss convergence curve

    Fig. 9  Critic network loss convergence curve

    Fig. 10  Reward value convergence curve

    Fig. 11  Rewards at different scaling factors

    Fig. 12  Impact of classification error rate on QoS performance

    Fig. 13  Packet loss reward

    Fig. 14  Delay reward

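    Table 1  Main notations and their meanings

    Symbol            Meaning
    $ G $             Network topology
    $ {\cal{V}} $     Set of nodes
    $ {\cal{E}} $     Set of directed links
    $ N $, $ M $      Number of nodes and number of links
    $ \lambda $       Flow arrival rate
    $ \varphi $       Node service rate
    $ \alpha $        Traffic split ratio
    $ \rho $          Node utilization
    $ P $             Node packet loss rate
    $ D $             Node/path delay
    $ U_{QoS} $       QoS-aware utility function
    $ {\cal{S}} $     State space of the Markov decision process (MDP)
    $ {\cal{A}} $     Action space of the MDP
    $ {\cal{R}} $     Reward function of the MDP
    $ \gamma $        Discount factor
    $ \eta $          Traffic type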
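
    To make a few of the symbols in Table 1 concrete, the following minimal sketch splits one flow across several paths with a split-ratio vector (the role of $ \alpha $) and computes utilization as arrival rate over service rate (the standard definition of $ \rho $ in terms of $ \lambda $ and $ \varphi $). The function names and the numbers are hypothetical, and the page does not state the paper's queueing model, so nothing here should be read as the authors' formulation.

```python
from typing import List


def split_flow(arrival_rate: float, split_ratios: List[float]) -> List[float]:
    """Divide one flow's arrival rate across candidate paths.

    `split_ratios` plays the role of the traffic split ratio alpha:
    non-negative entries that sum to 1, one entry per path.
    """
    if any(a < 0 for a in split_ratios):
        raise ValueError("split ratios must be non-negative")
    if abs(sum(split_ratios) - 1.0) > 1e-9:
        raise ValueError("split ratios must sum to 1")
    return [a * arrival_rate for a in split_ratios]


def utilization(arrival_rate: float, service_rate: float) -> float:
    """Utilization rho = arrival rate / service rate (standard definition)."""
    if service_rate <= 0:
        raise ValueError("service rate must be positive")
    return arrival_rate / service_rate


if __name__ == "__main__":
    # Hypothetical numbers: a flow of 10 units/s split over three paths.
    per_path = split_flow(10.0, [0.5, 0.3, 0.2])
    print(per_path)
    # Utilization of a node with service rate 8 units/s on the first path.
    print(utilization(per_path[0], 8.0))
```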

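    Table 2  Service type classification

    Category           Typical applications                                       Main characteristics
    Delay-sensitive    VoIP, online chat, and audio streaming                     High real-time requirements
    Loss-sensitive     Video streaming, peer-to-peer transfer, and file transfer  Requires high-bandwidth transmission
    Fault-tolerant     E-mail and web browsing                                    QoS requirements are not pronounced, or fall between the two above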

    Table 3  Default values of scaling factors

    Category           $ \beta_{P, \delta_{n}} $    $ \beta_{D, \delta_{n}} $
    Delay-sensitive    0.2                          0.8
    Loss-sensitive     0.8                          0.2
    Fault-tolerant     0.5                          0.5
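
    One plausible reading of the scaling factors in Table 3 is that they weight a packet-loss reward and a delay reward per traffic class. The sketch below assumes a simple weighted sum; that form, the function name `differentiated_reward`, and the dictionary keys are assumptions for illustration, since this page does not give the paper's actual reward equation. Only the six default values come from Table 3.

```python
from typing import Dict, Tuple

# Default scaling factors from Table 3: (beta_P, beta_D) per traffic class.
DEFAULT_SCALING: Dict[str, Tuple[float, float]] = {
    "delay_sensitive": (0.2, 0.8),
    "loss_sensitive": (0.8, 0.2),
    "fault_tolerant": (0.5, 0.5),
}


def differentiated_reward(traffic_type: str, r_loss: float, r_delay: float,
                          scaling: Dict[str, Tuple[float, float]] = DEFAULT_SCALING
                          ) -> float:
    """Assumed weighted-sum reward: beta_P * r_loss + beta_D * r_delay."""
    if traffic_type not in scaling:
        raise KeyError(f"unknown traffic type: {traffic_type}")
    beta_p, beta_d = scaling[traffic_type]
    return beta_p * r_loss + beta_d * r_delay


if __name__ == "__main__":
    # Hypothetical per-step rewards; the same pair scores differently per class.
    for traffic_type in DEFAULT_SCALING:
        print(traffic_type, differentiated_reward(traffic_type, 0.9, 0.4))
```

    Under this assumed form, a delay-sensitive flow is rewarded mainly for low delay and a loss-sensitive flow mainly for low packet loss, which matches the intent of the categories in Table 2.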

    Table 4  Results of traffic classification

    Category           $ Pr $    $ Rc $
    Delay-sensitive    1.00      0.97
    Loss-sensitive     0.99      0.98
    Fault-tolerant     0.95      1.00
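
    Assuming $ Pr $ and $ Rc $ in Table 4 denote per-class precision and recall (the page does not spell the symbols out), both can be read off a confusion matrix such as the one in Fig. 7. The sketch below uses the convention that rows are true classes and columns are predicted classes; the function name and the example counts are hypothetical and are not the paper's data.

```python
from typing import List, Tuple


def precision_recall(confusion: List[List[int]]) -> List[Tuple[float, float]]:
    """Per-class (precision, recall) from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    if any(len(row) != n for row in confusion):
        raise ValueError("confusion matrix must be square")
    results = []
    for k in range(n):
        true_positive = confusion[k][k]
        predicted_as_k = sum(confusion[i][k] for i in range(n))  # column sum
        actually_k = sum(confusion[k])                           # row sum
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / actually_k if actually_k else 0.0
        results.append((precision, recall))
    return results


if __name__ == "__main__":
    # Hypothetical counts for three classes, in the order
    # delay-sensitive, loss-sensitive, fault-tolerant.
    example = [
        [97, 1, 2],
        [0, 98, 2],
        [0, 0, 100],
    ]
    for label, (pr, rc) in zip(["delay", "loss", "tolerant"],
                               precision_recall(example)):
        print(f"{label}: Pr={pr:.2f} Rc={rc:.2f}")
```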

    Table 5  Ablation results for the algorithm modules

    Algorithm model    $ r_P $                 $ r_D $
    TR-RL              0.88                    0.42
    R-HPRL             0.92                    0.45
    TR-HPRL            $ \underline{0.94} $    $ \underline{0.48} $
Metrics
  • Article views:  15
  • HTML full-text views:  22
  • Citations: 0
Publication history
  • Received:  2025-08-28
  • Accepted:  2025-12-24
  • Published online:  2026-01-20
