A Combinatory Form Learning Rate Scheduling for Deep Learning Model

HE Yu-Yao; LI Bao-Qi (corresponding author)

doi: 10.16383/j.aas.2016.c150681

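School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an 710072

Funds:

National Natural Science Foundation of China (61271143)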


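Author Bio:
HE Yu-Yao Professor at Northwestern Polytechnical University. His research interests cover intelligent control and nonlinear control theory, precision guidance and simulation, information fusion, modern power electronics technology, and power transformation theory. E-mail: heyyao@nwpu.edu.cn

Corresponding author:
LI Bao-Qi Ph.D. candidate at Northwestern Polytechnical University. His research interests cover target detection, recognition and tracking, information fusion, and deep learning. Corresponding author of this paper. E-mail: bqli@mail.nwpu.edu.cn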

Publication history
- Received: 2015-10-20
- Accepted: 2016-04-01
- Published: 2016-06-20

Abstract: A well-designed learning rate schedule can significantly improve the convergence rate of a deep learning model and reduce its training time. The AdaGrad and AdaDec learning strategies apply a single form of learning rate to all parameters of the model. To address this limitation, this paper proposes AdaMix, a combined learning strategy built around the characteristics of the model parameters: the connection weights use a learning rate that depends only on the current gradient, and the biases use a power-exponential learning rate. To compare how the strategies affect convergence, an Autoencoder deep learning model is trained to reconstruct the MNIST image database, and the test-phase reconstruction error during fine-tuning is used as the evaluation index. The experimental results show that AdaMix yields a lower reconstruction error and requires less computation than AdaGrad and AdaDec, and therefore converges faster.

Key words:
- deep learning
- learning rate
- combined learning scheduling
- image reconstruction
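
The contrast drawn in the abstract, one rate form for every parameter versus separate forms for weights and biases, can be illustrated with a small sketch. The abstract does not give the AdaMix formulas, so everything specific here is an assumption made for illustration: the function names, the constant `k`, the gradient-only form `lr0 / sqrt(k + g^2)`, and the power-type form `lr0 * (1 + step / r) ** (-q)` with its parameters `r` and `q`. Only `adagrad_rate` follows the commonly stated AdaGrad rule of dividing by the root of the accumulated squared gradients.

```python
import math


def adagrad_rate(lr0, grad_history, k=1.0):
    """AdaGrad-style rate: lr0 / sqrt(k + sum of all squared gradients so far)."""
    return lr0 / math.sqrt(k + sum(g * g for g in grad_history))


def current_gradient_rate(lr0, grad, k=1.0):
    """Hypothetical weight rate that depends only on the current gradient.

    Assumed form for illustration, not taken from the paper.
    """
    return lr0 / math.sqrt(k + grad * grad)


def power_rate(lr0, step, r=10.0, q=0.75):
    """Hypothetical power-type bias rate: lr0 * (1 + step / r) ** (-q).

    Assumed form and parameter values for illustration, not taken from the paper.
    """
    return lr0 * (1.0 + step / r) ** (-q)


def mixed_step(weight, bias, grad_w, grad_b, step, lr0=0.1):
    """One combined update: one rate form for the weight, another for the bias."""
    new_weight = weight - current_gradient_rate(lr0, grad_w) * grad_w
    new_bias = bias - power_rate(lr0, step) * grad_b
    return new_weight, new_bias
```

In this sketch the weight rate needs no running accumulator, whereas `adagrad_rate` must track every past gradient. That is one plausible reading of the abstract's claim of lower computation, though the paper's actual formulas may differ.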


Fig. 1 The training process of the Autoencoder model

Fig. 2 The network graph of an RBM

Fig. 3 The network graph of an artificial neuron

Fig. 4 Comparison of the convergence performance of AdaMix and three other methods

Fig. 5 The influence of weight and bias on the convergence of the deep learning model

Fig. 6 The influence of different learning rates on the weights of the deep learning model

Fig. 7 The influence of different learning rates on the biases of the deep learning model

Fig. 8 The convergence of the deep learning model under AdaMix on data sets of different scales

