[1] Hinton G. Where do features come from? Cognitive Science, 2014, 38(6): 1078-1101
[2] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436-444
[3] Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, Graves A, Riedmiller M, Fidjeland A K, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529-533
[4] Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks, 2015, 61: 85-117
[5] Gao Ying-Ying, Zhu Wei-Bin. Deep neural networks with visible intermediate layers. Acta Automatica Sinica, 2015, 41(9): 1627-1637 (in Chinese)
[6] Qiao Jun-Fei, Pan Guang-Yuan, Han Hong-Gui. Design and application of continuous deep belief network. Acta Automatica Sinica, 2015, 41(12): 2138-2146 (in Chinese)
[7] Yu D, Deng L. Deep learning and its applications to signal and information processing. IEEE Signal Processing Magazine, 2011, 28(1): 145-154
[8] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786): 504-507
[9] Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 2011, 12: 2121-2159
[10] Senior A, Heigold G, Ranzato M A, Yang K. An empirical study of learning rates in deep neural networks for speech recognition. In: Proceedings of the 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing. Vancouver, BC: IEEE, 2013. 6724-6728
[11] Hinton G E, Dayan P, Frey B J, Neal R M. The "wake-sleep" algorithm for unsupervised neural networks. Science, 1995, 268(5214): 1158-1161
[12] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7): 1527-1554
[13] Fischer A, Igel C. Training restricted Boltzmann machines: an introduction. Pattern Recognition, 2014, 47(1): 25-39
[14] Salakhutdinov R, Hinton G. An efficient learning procedure for deep Boltzmann machines. Neural Computation, 2012, 24(8): 1967-2006
[15] Robbins H, Monro S. A stochastic approximation method. The Annals of Mathematical Statistics, 1951, 22(3): 400-407
[16] You Z, Wang X R, Xu B. Exploring one pass learning for deep neural network training with averaged stochastic gradient descent. In: Proceedings of the 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing. Florence, Italy: IEEE, 2014. 6854-6858
[17] Klein S, Pluim J P W, Staring M, Viergever M A. Adaptive stochastic gradient descent optimisation for image registration. International Journal of Computer Vision, 2009, 81(3): 227-239
[18] Shapiro A, Wardi Y. Convergence analysis of gradient descent stochastic algorithms. Journal of Optimization Theory and Applications, 1996, 91(2): 439-454