[1] Hinton G. Where do features come from? Cognitive Science, 2014, 38(6): 1078-1101
[2] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436-444
[3] Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, Graves A, Riedmiller M, Fidjeland A K, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529-533
[4] Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks, 2015, 61: 85-117
[5] Gao Ying-Ying, Zhu Wei-Bin. Deep neural networks with visible intermediate layers. Acta Automatica Sinica, 2015, 41(9): 1627-1637 (in Chinese)
[6] Qiao Jun-Fei, Pan Guang-Yuan, Han Hong-Gui. Design and application of continuous deep belief network. Acta Automatica Sinica, 2015, 41(12): 2138-2146 (in Chinese)
[7] Yu D, Deng L. Deep learning and its applications to signal and information processing. IEEE Signal Processing Magazine, 2011, 28(1): 145-154
[8] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786): 504-507
[9] Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 2011, 12: 2121-2159
[10] Senior A, Heigold G, Ranzato M A, Yang K. An empirical study of learning rates in deep neural networks for speech recognition. In: Proceedings of the 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing. Vancouver, BC: IEEE, 2013. 6724-6728
[11] Hinton G E, Dayan P, Frey B J, Neal R M. The "wake-sleep" algorithm for unsupervised neural networks. Science, 1995, 268(5214): 1158-1161
[12] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7): 1527-1554
[13] Fischer A, Igel C. Training restricted Boltzmann machines: an introduction. Pattern Recognition, 2014, 47(1): 25-39
[14] Salakhutdinov R, Hinton G. An efficient learning procedure for deep Boltzmann machines. Neural Computation, 2012, 24(8): 1967-2006
[15] Robbins H, Monro S. A stochastic approximation method. The Annals of Mathematical Statistics, 1951, 22(3): 400-407
[16] You Z, Wang X R, Xu B. Exploring one pass learning for deep neural network training with averaged stochastic gradient descent. In: Proceedings of the 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing. Florence, Italy: IEEE, 2014. 6854-6858
[17] Klein S, Pluim J P W, Staring M, Viergever M A. Adaptive stochastic gradient descent optimisation for image registration. International Journal of Computer Vision, 2009, 81(3): 227-239
[18] Shapiro A, Wardi Y. Convergence analysis of gradient descent stochastic algorithms. Journal of Optimization Theory and Applications, 1996, 91(2): 439-454