[1] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786): 504-507 doi: 10.1126/science.1127647
[2] Le Roux N, Heess N, Shotton J, Winn J. Learning a generative model of images by factoring appearance and shape. Neural Computation, 2011, 23(3): 593-650 doi: 10.1162/NECO_a_00086
[3] Lee H, Grosse R, Ranganath R, Ng A Y. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In: Proceedings of the 26th Annual International Conference on Machine Learning (ICML). Montreal, Canada: ACM, 2009. 609-616
[4] Bengio Y. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2009, 2(1): 1-127 doi: 10.1561/2200000006
[5] Deng L, Abdel-Hamid O, Yu D. A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion. In: Proceedings of the 2013 International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, BC, Canada: IEEE, 2013. DOI: 10.1109/ICASSP.2013.6638952
[6] Deng L. Design and learning of output representations for speech recognition. In: Proceedings of the 2013 Neural Information Processing Systems (NIPS) Workshop on Learning Output Representations. South Lake Tahoe, United States: NIPS, 2013.
[7] Tan C C, Eswaran C. Reconstruction and recognition of face and digit images using autoencoders. Neural Computing and Application, 2010, 19(7): 1069-1079 doi: 10.1007/s00521-010-0378-4
[8] 郭潇逍, 李程, 梅俏竹.深度学习在游戏中的应用.自动化学报, 2016, 42(5): 676-684 http://www.aas.net.cn/CN/abstract/abstract18857.shtml

Guo Xiao-Xiao, Li Cheng, Mei Qiao-Zhu. Deep learning applied to games. Acta Automatica Sinica, 2016, 42(5): 676-684 http://www.aas.net.cn/CN/abstract/abstract18857.shtml
[9] 田渊栋.阿法狗围棋系统的简要分析.自动化学报, 2016, 42(5): 671-675 http://www.aas.net.cn/CN/abstract/abstract18856.shtml

Tian Yuan-Dong. A simple analysis of AlphaGo. Acta Automatica Sinica, 2016, 42(5): 671-675 http://www.aas.net.cn/CN/abstract/abstract18856.shtml
[10] 段艳杰, 吕宜生, 张杰, 赵学亮, 王飞跃.深度学习在控制领域的研究现状与展望.自动化学报, 2016, 42(5): 643-654 http://www.aas.net.cn/CN/abstract/abstract18852.shtml

Duan Yan-Jie, Lv Yi-Sheng, Zhang Jie, Zhao Xue-Liang, Wang Fei-Yue. Deep learning for control: the state of the art and prospects. Acta Automatica Sinica, 2016, 42(5): 643-654 http://www.aas.net.cn/CN/abstract/abstract18852.shtml
[11] 耿杰, 范剑超, 初佳兰, 王洪玉.基于深度协同稀疏编码网络的海洋浮筏SAR图像目标识别.自动化学报, 2016, 42(4): 593-604 http://www.aas.net.cn/CN/abstract/abstract18846.shtml

Geng Jie, Fan Jian-Chao, Chu Jia-Lan, Wang Hong-Yu. Research on marine floating raft aquaculture SAR image target recognition based on deep collaborative sparse coding network. Acta Automatica Sinica, 2016, 42(4): 593-604 http://www.aas.net.cn/CN/abstract/abstract18846.shtml
[12] Deng L, Hinton G, Kingsbury B. New types of deep neural network learning for speech recognition and related applications: an overview. In: Proceedings of the 2013 International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver, BC, Canada: IEEE, 2013. DOI: 10.1109/ICASSP.2013.6639344
[13] Erhan D, Courville A C, Bengio Y, Vincent P. Why does unsupervised pre-training help deep learning? In: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS). Sardinia, Italy: JMLR, 2010. 201-208
[14] Salakhutdinov R, Hinton G E. Deep Boltzmann machines. In: Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS). Florida, USA: JMLR, 2009. 448-455
[15] Swersky K, Chen B, Marlin B, De Freitas M. A tutorial on stochastic approximation algorithms for training restricted Boltzmann machines and deep belief nets. In: Proceedings of the 2010 Information Theory and Applications Workshop (ITA). La Jolla, California, USA: IEEE, 2010. 1-10
[16] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7): 1527-1554 doi: 10.1162/neco.2006.18.7.1527
[17] Fischer A, Igel C. Bounding the bias of contrastive divergence learning. Neural Computation, 2011, 23(3): 664-673 doi: 10.1162/NECO_a_00085
[18] Tieleman T. Training restricted Boltzmann machines using approximations to the likelihood gradient. In: Proceedings of the 25th International Conference on Machine Learning (ICML). Helsinki, Finland: ACM, 2008. 1064-1071
[19] Tieleman T, Hinton G E. Using fast weights to improve persistent contrastive divergence. In: Proceedings of the 26th International Conference on Machine Learning (ICML). Montréal, Canada: ACM, 2009. 1033-1040
[20] Sutskever I, Tieleman T. On the convergence properties of contrastive divergence. In: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS). Sardinia, Italy: JMLR, 2010. 789-795
[21] Fischer A, Igel C. Parallel tempering, importance sampling, and restricted Boltzmann machines. In: Proceedings of the 5th Workshop on Theory of Randomized Search Heuristics (ThRaSH). Copenhagen, Denmark: University of Copenhagen, 2011. 99-119
[22] Desjardins G, Courville A, Bengio Y. Adaptive parallel tempering for stochastic maximum likelihood learning of RBMs. In: Proceedings of NIPS 2010 Workshop on Deep Learning and Unsupervised Feature Learning. Whistler, Canada: Computer Science, 2010. arXiv: 1012.3476
[23] Cho K, Raiko T, Ilin A. Parallel tempering is efficient for learning restricted Boltzmann machines. In: Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN). Barcelona, Spain: IEEE, 2010. 3246-3253
[24] Brakel P, Dieleman S, Schrauwen B. Training restricted Boltzmann machines with multi-tempering: harnessing parallelization. In: Proceedings of the 22nd International Conference on Artificial Neural Networks. Berlin Heidelberg, Germany: Springer, 2012. 92-99
[25] Desjardins G, Courville A C, Bengio Y, Vincent P, Dellaleau O. Tempered Markov chain Monte Carlo for training of restricted Boltzmann machines. In: Proceedings of the 13th International Workshop on Artificial Intelligence and Statistics (AISTATS). Washington, DC, USA: IEEE, 2010. 45-152
[26] Fischer A, Igel C. Training restricted Boltzmann machines: an introduction. Pattern Recognition, 2014, 47(1): 25-39 doi: 10.1016/j.patcog.2013.05.025
[27] Bengio Y, Delalleau O. Justifying and generalizing contrastive divergence. Neural Computation, 2009, 21(6): 1601-1621 doi: 10.1162/neco.2008.11-07-647