A Combinatory Form Learning Rate Scheduling for Deep Learning Model

HE Yu-Yao, LI Bao-Qi

Citation: HE Yu-Yao, LI Bao-Qi. A Combinatory Form Learning Rate Scheduling for Deep Learning Model. Acta Automatica Sinica, 2016, 42(6): 953-958. doi: 10.16383/j.aas.2016.c150681


doi: 10.16383/j.aas.2016.c150681
Funds: 

National Natural Science Foundation of China (61271143)

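More Information
    Author Bio:

    HE Yu-Yao  Professor at Northwestern Polytechnical University. His research interests cover intelligent control and nonlinear control theory, precision guidance and simulation, information fusion, and modern power electronics technology and power conversion theory. E-mail: heyyao@nwpu.edu.cn

    Corresponding author:

    LI Bao-Qi  Ph.D. candidate at Northwestern Polytechnical University. His research interests cover target detection, recognition and tracking, information fusion, and deep learning. Corresponding author of this paper. E-mail: bqli@mail.nwpu.edu.cn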

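  • Abstract: A well-designed learning rate strategy can significantly speed up the convergence of a deep learning model and reduce its training time. The AdaGrad and AdaDec strategies apply a single form of learning rate to all model parameters. To address this limitation, and based on the characteristics of the model parameters, this paper proposes a combined learning rate strategy, AdaMix. It gives the connection weights a learning rate that depends only on the current gradient and gives the biases a power-exponential learning rate. The deep learning model Autoencoder is used to reconstruct the MNIST image database, and the reconstruction error in the test stage during back-propagation fine-tuning serves as the evaluation metric for comparing how several learning rate strategies affect convergence. Experimental results show that AdaMix achieves a smaller reconstruction error than AdaGrad and AdaDec at a lower computational cost, and therefore converges faster.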
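The abstract names the idea (one learning rate rule for weights, another for biases) but does not give the formulas, so the following is only a minimal sketch under stated assumptions rather than the paper's AdaMix. It shows an AdaGrad-style rate for reference, a hypothetical weight rate that drops the gradient history and uses only the current gradient, and a power-law decay for the bias. All function names, the constants `k`, `r`, `c`, and the toy regression problem are assumptions introduced for illustration.

```python
import math


def adagrad_rate(eta0, grad_sq_sum, k=1.0):
    """AdaGrad-style rate: eta0 / sqrt(k + accumulated squared gradients)."""
    return eta0 / math.sqrt(k + grad_sq_sum)


def current_gradient_rate(eta0, grad, k=1.0):
    """Hypothetical weight rate that uses only the current gradient (no history)."""
    return eta0 / math.sqrt(k + grad * grad)


def power_rate(eta0, step, r=100.0, c=0.5):
    """Power-law decay (assumed form): eta0 * (1 + step / r) ** (-c)."""
    return eta0 * (1.0 + step / r) ** (-c)


def mixed_step(w, b, grad_w, grad_b, step, eta_w=0.1, eta_b=0.1):
    """One update with separate rules: gradient-dependent rate for w, decay for b."""
    w_new = w - current_gradient_rate(eta_w, grad_w) * grad_w
    b_new = b - power_rate(eta_b, step) * grad_b
    return w_new, b_new


def demo(steps=300):
    """Fit y = w * x + b on a toy data set and return the loss at every step."""
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
    ys = [1.5 * x + 0.7 for x in xs]
    n = len(xs)
    w, b, losses = 0.0, 0.0, []
    for t in range(steps):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errs) / n)  # mean squared error
        grad_w = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
        grad_b = 2.0 * sum(errs) / n
        w, b = mixed_step(w, b, grad_w, grad_b, t)
    return losses


if __name__ == "__main__":
    history = demo()
    print(history[0], history[-1])
```

Unlike `adagrad_rate`, the hypothetical weight rule needs no running sum of squared gradients, which is one plausible reading of why a rate that depends only on the current gradient would cost less to compute.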
  • Fig. 1  The training process of the Autoencoder model

    Fig. 2  The network graph of an RBM

    Fig. 3  The network graph of an artificial neuron

    Fig. 4  Comparison of the convergence performance of AdaMix and three other methods

    Fig. 5  The influence of weight and bias on the convergence of the deep learning model

    Fig. 6  The influence of different learning rates on the weights of the deep learning model

    Fig. 7  The influence of different learning rates on the biases of the deep learning model

    Fig. 8  The convergence of the deep learning model under AdaMix on data sets of different scales

Publication history
  • Received:  2015-10-20
  • Accepted:  2016-04-01
  • Published:  2016-06-20
