Deep Neural Networks with Visible Intermediate Layers
Abstract: The intermediate layers of a deep neural network are hidden and unknown, which makes the learning process hard to trace and the learned results hard to interpret, and this has constrained the development of deep learning to some extent. This work introduces prior knowledge to give the intermediate layers definite meanings and explicit interrelationships, that is, to make them visible, so that the internal structure of the network can be partly supervised and the direction of learning constrained. Building on the deep stacking network (DSN), we propose two networks whose intermediate layers are partially visible: the input-layer visible deep stacking network (IVDSN) and the hidden-layer visible deep stacking network (HVDSN). The layers are made partially rather than fully visible in order to retain the ability to extract unknown information and a degree of fault tolerance. The proposed networks are evaluated on text-based detection of speech emotion. The results indicate that introducing prior knowledge helps improve the performance of deep neural networks, and that both networks achieve partial visibility of their intermediate layers; of the two, HVDSN has the simpler structure and the better performance.
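For readers unfamiliar with the base architecture, the following is a minimal sketch of DSN-style stacking and not the authors' implementation. It assumes each module is a single-hidden-layer network whose upper weights are solved in closed form, and that each later module receives the raw input concatenated with the previous module's prediction. The function names (`fit_module`, `fit_dsn`, `predict_dsn`), the ridge term, the fixed random lower-layer weights, and all sizes are illustrative assumptions. The prior-knowledge supervision that makes IVDSN and HVDSN partially visible is not reproduced here, since the abstract does not specify it.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_module(X, T, n_hidden, rng, ridge=1e-3):
    """Fit one module: sigmoid hidden layer, linear output layer.

    Assumption: the lower weights W are random and left fixed; only the
    upper weights U are learned, by ridge-regularised least squares.
    """
    W = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    H = sigmoid(X @ W)
    # Solve (H^T H + ridge * I) U = H^T T for U.
    U = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ T)
    return W, U


def fit_dsn(X, T, n_modules=3, n_hidden=16, seed=0):
    """Stack modules; each later module sees [raw input, previous prediction]."""
    rng = np.random.default_rng(seed)
    modules = []
    inp = X
    for _ in range(n_modules):
        W, U = fit_module(inp, T, n_hidden, rng)
        Y = sigmoid(inp @ W) @ U
        modules.append((W, U))
        inp = np.hstack([X, Y])  # input for the next module
    return modules


def predict_dsn(modules, X):
    """Run the stack forward and return the last module's prediction."""
    inp = X
    Y = None
    for W, U in modules:
        Y = sigmoid(inp @ W) @ U
        inp = np.hstack([X, Y])
    return Y
```

In this sketch every hidden unit is an unconstrained feature. A partially visible variant would tie some units, in the input of upper modules or in the hidden layer, to quantities with known meaning while leaving the rest free.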