Improving Semi-supervised Neural Machine Translation With Variational Information Bottleneck

Yu Zhi-Qiang, Yu Zheng-Tao, Huang Yu-Xin, Guo Jun-Jun, Gao Sheng-Xiang

Citation: Yu Zhi-Qiang, Yu Zheng-Tao, Huang Yu-Xin, Guo Jun-Jun, Gao Sheng-Xiang. Improving semi-supervised neural machine translation with variational information bottleneck. Acta Automatica Sinica, 2022, 48(7): 1678−1689 doi: 10.16383/j.aas.c190477

doi: 10.16383/j.aas.c190477
Funds: Supported by National Key Research and Development Program of China (2019QY1800), National Natural Science Foundation of China (61732005, 61672271, 61761026, 61762056, 61866020), Yunnan High-Tech Industry Development Project (201606), and Natural Science Foundation of Yunnan Province (2018FB104)
More Information
    Author Bio:

    YU Zhi-Qiang Ph.D. candidate at the Faculty of Information Engineering and Automation, Kunming University of Science and Technology. His main research interest is natural language processing. E-mail: yzqyt@hotmail.com

    YU Zheng-Tao Professor at the Faculty of Information Engineering and Automation, Kunming University of Science and Technology. His main research interest is natural language processing. Corresponding author of this paper. E-mail: ztyu@hotmail.com

    HUANG Yu-Xin Ph.D. candidate at the Faculty of Information Engineering and Automation, Kunming University of Science and Technology. His main research interest is natural language processing. E-mail: huangyuxin2004@163.com

    GUO Jun-Jun Lecturer at the Faculty of Information Engineering and Automation, Kunming University of Science and Technology. His main research interest is natural language processing. E-mail: guojjgb@163.com

    GAO Sheng-Xiang Associate professor at the Faculty of Information Engineering and Automation, Kunming University of Science and Technology. Her main research interest is natural language processing. E-mail: gaoshengxiang.yn@foxmail.com

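  • Abstract: Variational methods are effective in machine translation, but their performance depends heavily on the amount of training data. In low-resource settings, parallel corpora are scarce and cannot satisfy this data requirement, so variational translation models perform poorly. To address this problem, this paper proposes a semi-supervised neural machine translation method based on the variational information bottleneck, which works in three steps. First, a base translation model is trained on a small parallel corpus, using a cross-layer attention mechanism that makes full use of the features from every layer of the neural network. Second, the base model back-translates monolingual text into a large, noisy pseudo-parallel corpus, which is merged with the original parallel corpus into a combined corpus large enough to meet the data requirements of variational methods. Third, to reduce the noise in the combined corpus, a variational information bottleneck inserts an intermediate representation between the source and the target; training teaches this representation to let important information through and block unimportant information, which filters out the noise. Experiments on multiple datasets show that the proposed method significantly improves translation quality, making it a semi-supervised neural machine translation method well suited to low-resource scenarios.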
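The abstract describes the bottleneck only in words. The minimal sketch below assumes the standard deep VIB formulation of Alemi et al. [12]: the intermediate representation z is a diagonal Gaussian sampled with the reparameterization trick, its prior is a standard normal, and the training loss is the translation cross-entropy plus a weight λ times the KL term (that this λ is the parameter studied in Fig. 7 is an assumption). It uses plain Python floats instead of a deep learning framework; all function names are hypothetical, and this is not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def vib_loss(target_log_probs: Sequence[float], mu: Sequence[float],
             log_var: Sequence[float], lam: float) -> Tuple[float, float, float]:
    """Return (total, cross_entropy, kl) with total = cross_entropy + lam * kl.

    target_log_probs: log p(y_t | z, y_<t) for each reference target token,
    assumed to come from a decoder conditioned on a sample of z.
    """
    cross_entropy = -sum(target_log_probs)   # translation term
    kl = kl_to_standard_normal(mu, log_var)  # compression term: limits what z keeps
    return cross_entropy + lam * kl, cross_entropy, kl


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [1.0, 0.0], [0.0, 0.0]   # toy encoder outputs for one sentence
    z = reparameterize(mu, log_var, rng)    # in a real model, z would feed the decoder
    # Toy token log-probabilities stand in for the decoder's output here.
    total, ce, kl = vib_loss([math.log(0.5), math.log(0.25)], mu, log_var, lam=0.1)
    print(len(z), round(kl, 4), round(total, 4))  # 2 0.5 2.1294
```

A larger λ pushes z closer to the prior and therefore discards more information, which is the mechanism the abstract relies on to suppress noise from the pseudo-parallel data.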
  • Fig.  1  Model with the traditional attention mechanism based on top-layer merge

    Fig.  2  Model with the hierarchical attention mechanism based on inner-layer merge

    Fig.  3  Model with the hierarchical attention mechanism based on cross-layer merge

    Fig.  4  NMT model with the variational information bottleneck integrated

    Fig.  5  Visualization of translation effects

    Fig.  6  Translation length evaluation on the English-Vietnamese translation task

    Fig.  7  Influence of the parameter $ \lambda $ on the model

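    Table  1  Example structure of the combined corpus

    Corpus type | Source-language corpus | Target-language corpus
    Original corpus | $ {D}_{a} $ | $ {D}_{b} $
    Monolingual corpus | $ {D}_{x} $ | (none)
    Pseudo-parallel corpus | $ {D}_{x} $ | $ {D}_{y} $
    Combined corpus | $ {D}_{b}+{D}_{y} $ | $ {D}_{a}+{D}_{x} $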
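Table 1 can be read as a recipe for building the training data. The sketch below assumes, consistent with the "base translation model" and "monolingual corpus" columns of Tables 5 and 6, that the base model translates language a into language b, that the monolingual corpus $ {D}_{x} $ is in language a, and that the final model is trained in the opposite direction (b→a) so that authentic text sits on the target side. `build_combined_corpus` and `base_translate` are hypothetical names; the toy dictionary merely stands in for a trained base model.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def build_combined_corpus(original: List[Pair], monolingual: List[str],
                          base_translate: Callable[[str], str]) -> List[Pair]:
    """Assemble the combined corpus of Table 1.

    original:       (D_a, D_b) pairs used to train the base model a -> b.
    monolingual:    D_x, monolingual sentences in language a.
    base_translate: hypothetical stand-in for the trained base model (a -> b).
    Returns (source, target) pairs for the final b -> a model: the source
    side is D_b + D_y and the target side is D_a + D_x.
    """
    # Back-translation: the base model turns each D_x sentence into a noisy D_y.
    pseudo = [(x, base_translate(x)) for x in monolingual]
    # Swap both corpora so the authentic text (D_a, D_x) is on the target side.
    return [(b, a) for a, b in original] + [(y, x) for x, y in pseudo]


if __name__ == "__main__":
    toy_vi_en = {"xin chào": "hello", "cảm ơn": "thank you"}  # illustrative only
    combined = build_combined_corpus(
        original=[("tôi", "I")],
        monolingual=["xin chào", "cảm ơn"],
        base_translate=lambda s: toy_vi_en.get(s, "<unk>"),
    )
    print(combined)
    # [('I', 'tôi'), ('hello', 'xin chào'), ('thank you', 'cảm ơn')]
```

With a = vi and b = en this mirrors the en→vi row of Table 6 (base model vi→en, Vietnamese monolingual data). Any translation errors in $ {D}_{y} $ end up on the source side, which is the noise the variational information bottleneck is meant to absorb.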

    Table  2  Composition of the parallel corpora

    Corpus type | Dataset | Language pair | Training set | Validation set | Test set
    Small-scale parallel corpus | IWSLT15 | en↔vi | 133 K | 1553 | 1268
    Small-scale parallel corpus | IWSLT15 | en↔zh | 209 K | 887 | 1261
    Small-scale parallel corpus | IWSLT15 | en↔de | 172 K | 887 | 1565
    Large-scale parallel corpus | WMT14 | en↔de | 4.5 M | 3003 | 3000
    Note: en: English, vi: Vietnamese, zh: Chinese, de: German. Sizes are numbers of sentence pairs (K = thousand, M = million).

    Table  3  Composition of the monolingual corpora used in the experiments; the Vietnamese corpus was collected by the authors

    Translation task | Language | Dataset | Sentences (M)
    en↔vi | en | GIGAWORD | 22.3
    en↔vi | vi | None (self-collected) | 1
    en↔zh | en | GIGAWORD | 22.3
    en↔zh | zh | GIGAWORD | 18.7
    en↔de (IWSLT15) | en | WMT14 | 18
    en↔de (IWSLT15) | de | WMT14 | 17.3
    en↔de (WMT14) | en | WMT14 | 18
    en↔de (WMT14) | de | WMT14 | 17.3

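    Table  4  Evaluation results of BLEU (%)

    Model | en→vi | vi→en | en→zh | zh→en | en→de (IWSLT15) | de→en (IWSLT15) | en→de (WMT14) | de→en (WMT14)
    RNNSearch | 26.55 | 24.47 | 21.18 | 19.15 | 25.03 | 28.51 | 26.62 | 29.20
    RNNSearch+CA | 27.04 | 24.95 | 21.64 | 19.59 | 25.39 | 28.94 | 27.06 | 29.58
    RNNSearch+VIB | 27.35 | 25.12 | 21.94 | 19.84 | 25.77 | 29.31 | 27.27 | 29.89
    RNNSearch+CA+VIB | 27.83* | 25.61* | 22.39 | 20.27 | 26.14* | 29.66* | 27.61* | 30.22*
    $\triangle$ | +1.28 | +1.14 | +1.21 | +1.12 | +1.11 | +1.15 | +0.99 | +1.02
    Transformer | 29.20 | 26.73 | 23.69 | 21.61 | 27.48 | 30.66 | 28.74 | 31.29
    Transformer+CA | 29.53 | 27.00 | 23.95 | 21.82 | 27.74 | 30.98 | 28.93 | 31.51
    Transformer+VIB | 29.96 | 27.38 | 24.30 | 22.13 | 28.04 | 31.24 | 29.16 | 31.75
    Transformer+CA+VIB | 30.17* | 27.56* | 24.43 | 22.32 | 28.11* | 31.35* | 29.25* | 31.89*
    $\triangle$ | +0.97 | +0.83 | +0.74 | +0.71 | +0.63 | +0.69 | +0.51 | +0.60
    Note: $\triangle$ is the BLEU gain of +CA+VIB over the corresponding baseline (RNNSearch or Transformer); * indicates that the result passed a significance test using bootstrap resampling [37] ($ p<0.05 $). CA: cross-layer attention; VIB: variational information bottleneck.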
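The asterisks in Table 4 rest on paired bootstrap resampling [37]. A minimal sketch of that test follows, assuming index-aligned per-sentence statistics for a baseline (A) and an improved system (B) on the same test set. `metric` is any corpus-level scoring function; for BLEU it would sum per-sentence n-gram match counts and recompute corpus BLEU, which is not implemented here, so the usage example substitutes a simple mean of hypothetical sentence scores. The function name, sample count, and seed are illustrative choices, not values from the paper.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def paired_bootstrap_p_value(stats_a: Sequence[T], stats_b: Sequence[T],
                             metric: Callable[[List[T]], float],
                             n_samples: int = 1000, seed: int = 0) -> float:
    """Paired bootstrap resampling in the style of Koehn (2004).

    stats_a / stats_b: per-sentence statistics of the baseline (A) and the
    improved system (B), aligned by sentence index.
    metric: corpus-level score computed from a list of per-sentence statistics.
    Returns the fraction of resampled test sets on which B does NOT beat A,
    an estimate of the p-value for "B is better than A".
    """
    if len(stats_a) != len(stats_b) or not stats_a:
        raise ValueError("need two non-empty, index-aligned lists")
    n = len(stats_a)
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_samples):
        # Draw n indices with replacement; both systems use the SAME indices.
        idx = [rng.randrange(n) for _ in range(n)]
        if metric([stats_b[i] for i in idx]) > metric([stats_a[i] for i in idx]):
            wins += 1
    return 1.0 - wins / n_samples


if __name__ == "__main__":
    def mean(xs: List[float]) -> float:
        return sum(xs) / len(xs)

    # Hypothetical sentence-level scores; B beats A on every sentence here.
    baseline = [0.20, 0.30, 0.25, 0.40]
    improved = [0.30, 0.35, 0.30, 0.45]
    print(paired_bootstrap_p_value(baseline, improved, mean))  # 0.0
```

Pairing matters: because both systems are scored on the same resampled sentences, sentence difficulty cancels out and only the systems' relative quality drives the estimate.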

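    Table  6  Evaluation results of RIBES (%)

    Translation direction | Base translation model | Monolingual corpus | Baseline model | Cross-layer attention | Cross-layer attention + variational information bottleneck
    en→vi | vi→en | vi | 74.38 | 75.07 | 75.83
    vi→en | en→vi | en | 74.29 | 74.70 | 75.64
    en→zh | zh→en | zh | 72.87 | 73.33 | 73.83
    zh→en | en→zh | en | 71.81 | 72.25 | 72.55
    en→de (IWSLT15) | de→en | de | 79.81 | 80.14 | 80.96
    de→en (IWSLT15) | en→de | en | 78.48 | 78.88 | 79.61
    en→de (WMT14) | de→en | de | 80.15 | 80.40 | 81.29
    de→en (WMT14) | en→de | en | 79.33 | 79.52 | 80.07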

    Table  5  Comparison with other semi-supervised NMT approaches (en-de)

    Model | Translation direction | Base translation model | Monolingual corpus | BLEU
    Zhang et al. (2018) | en→de | de→en | de | 23.60
    Zhang et al. (2018) | de→en | en→de | en | 27.98
    This work | en→de | de→en | de | 24.73
    This work | de→en | en→de | en | 28.65

    Table  7  Chinese-English translation examples

    Source sentence: 火车被发现已经开走了
    Reference translation: It was found that the train had already left
    System outputs:
    [TA] Found that the the train had gone
    [CA] It was found that the the train had left away
    [CA+VIB] It was found that the train had left
  • [1] Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks. In: Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press, 2014. 3104−3112
    [2] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. In: Proceedings of the 3rd International Conference on Learning Representations (ICLR). San Diego, USA, 2015. 1−15
    [3] Jiang Hong-Fei, Li Sheng, Zhang Min, Zhao Tie-Jun, Yang Mu-Yun. Synchronous tree sequence substitution grammar for statistical machine translation. Acta Automatica Sinica, 2009, 35(10): 1317−1326 (in Chinese) doi: 10.3724/SP.J.1004.2009.01317
    [4] Li Ya-Chao, Xiong De-Yi, Zhang Min. A survey of neural machine translation. Chinese Journal of Computers, 2018, 41(12): 2734−2755 (in Chinese) doi: 10.11897/SP.J.1016.2018.02734
    [5] Kingma D P, Rezende D J, Mohamed S, Welling M. Semi-supervised learning with deep generative models. In: Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press, 2014. 3581−3589
    [6] Kingma D P, Welling M. Auto-encoding variational Bayes. In: Proceedings of the 2nd International Conference on Learning Representations (ICLR). Banff, Canada, 2014.
    [7] Zhang B, Xiong D Y, Su J S, Duan H, Zhang M. Variational neural machine translation. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016). Austin, USA: Association for Computational Linguistics, 2016. 521−530
    [8] Sennrich R, Haddow B, Birch A. Improving neural machine translation models with monolingual data. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, Germany: Association for Computational Linguistics, 2016. 86−96
    [9] Socher R, Pennington J, Huang E H, Ng A Y, Manning C D. Semi-supervised recursive autoencoders for predicting sentiment distributions. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP). Edinburgh, UK: Association for Computational Linguistics, 2011. 151−161
    [10] Ammar W, Dyer C, Smith N A. Conditional random field autoencoders for unsupervised structured prediction. In: Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press, 2014. 3311−3319
    [11] Belinkov Y, Durrani N, Dalvi F, Sajjad H, Glass J. What do neural machine translation models learn about morphology? In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Vancouver, Canada: Association for Computational Linguistics, 2017. 861−872
    [12] Alemi A A, Fischer I, Dillon J V, Murphy K. Deep variational information bottleneck. In: Proceedings of the 5th International Conference on Learning Representations (ICLR). Toulon, France: OpenReview.net, 2017.
    [13] Nguyen T T, Choi J. Layer-wise learning of stochastic neural networks with information bottleneck. arXiv: 1712.01272, 2017.
    [14] Yang Z C, Yang D Y, Dyer C, He X D, Smola A, Hovy E. Hierarchical attention networks for document classification. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego, USA: Association for Computational Linguistics, 2016. 1480−1489
    [15] Pappas N, Popescu-Belis A. Multilingual hierarchical attention networks for document classification. In: Proceedings of the 8th International Joint Conference on Natural Language Processing. Taipei, China: Asian Federation of Natural Language Processing, 2017. 1015−1025
    [16] Zhang Y, Wang Y H, Liao J Z, Xiao W D. A hierarchical attention Seq2seq model with CopyNet for text summarization. In: Proceedings of the 2018 International Conference on Robots and Intelligent System (ICRIS). Changsha, China: IEEE, 2018. 316−320
    [17] Miculicich L, Ram D, Pappas N, Henderson J. Document-level neural machine translation with hierarchical attention networks. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics, 2018. 2947−2954
    [18] Zhang B, Xiong D Y, Su J S. Neural machine translation with deep attention. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(1): 154−163 doi: 10.1109/TPAMI.2018.2876404
    [19] Ueffing N, Haffari G, Sarkar A. Semi-supervised model adaptation for statistical machine translation. Machine Translation, 2007, 21(2): 77−94 doi: 10.1007/s10590-008-9036-3
    [20] Bertoldi N, Federico M. Domain adaptation for statistical machine translation with monolingual resources. In: Proceedings of the 4th Workshop on Statistical Machine Translation. Athens, Greece: Association for Computational Linguistics, 2009. 182−189
    [21] Klementiev A, Irvine A, Callison-Burch C, Yarowsky D. Toward statistical machine translation without parallel corpora. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. Avignon, France: Association for Computational Linguistics, 2012. 130−140
    [22] Zhang J J, Zong C Q. Learning a phrase-based translation model from monolingual data with application to domain adaptation. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Sofia, Bulgaria: Association for Computational Linguistics, 2013. 1425−1434
    [23] Ravi S, Knight K. Deciphering foreign language. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Portland, USA: Association for Computational Linguistics, 2011. 12−21
    [24] Dou Q, Vaswani A, Knight K. Beyond parallel data: Joint word alignment and decipherment improves machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics, 2014. 557−565
    [25] Cheng Y, Xu W, He Z J, He W, Wu H, Sun M S, et al. Semi-supervised learning for neural machine translation. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, Germany: Association for Computational Linguistics, 2016. 1965−1974
    [26] Skorokhodov I, Rykachevskiy A, Emelyanenko D, Slotin S, Ponkratov A. Semi-supervised neural machine translation with language models. In: Proceedings of the 2018 AMTA Workshop on Technologies for MT of Low Resource Languages (LoResMT 2018). Boston, USA: Association for Machine Translation in the Americas, 2018. 37−44
    [27] Artetxe M, Labaka G, Agirre E, Cho K. Unsupervised neural machine translation. In: Proceedings of the 6th International Conference on Learning Representations (ICLR 2018). Vancouver, Canada: OpenReview.net, 2018.
    [28] Lample G, Ott M, Conneau A, Denoyer L, Ranzato M A. Phrase-based and neural unsupervised machine translation. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics, 2018. 5039−5049
    [29] Burlot F, Yvon F. Using monolingual data in neural machine translation: A systematic study. In: Proceedings of the 3rd Conference on Machine Translation: Research Papers. Brussels, Belgium: Association for Computational Linguistics, 2018. 144−155
    [30] Tishby N, Pereira F C, Bialek W. The information bottleneck method. arXiv: physics/0004057, 2000.
    [31] Eikema B, Aziz W. Auto-encoding variational neural machine translation. In: Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019). Florence, Italy: Association for Computational Linguistics, 2019. 124−141
    [32] Su J S, Wu S, Xiong D Y, Lu Y J, Han X P, Zhang B. Variational recurrent neural machine translation. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI, 2018. 5488−5495
    [33] Kingma D P, Ba J L. Adam: A method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015). San Diego, USA, 2015.
    [34] Sennrich R, Zhang B. Revisiting low-resource neural machine translation: A case study. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL2019). Florence, Italy: Association for Computational Linguistics, 2019. 211−221
    [35] Papineni K, Roukos S, Ward T, Zhu W J. BLEU: A method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL). Philadelphia, USA: Association for Computational Linguistics, 2002. 311−318
    [36] Isozaki H, Hirao T, Duh K, Sudoh K, Tsukada H. Automatic evaluation of translation quality for distant language pairs. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Cambridge, USA: Association for Computational Linguistics, 2010. 944−952
    [37] Koehn P. Statistical significance tests for machine translation evaluation. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP2004). Barcelona, Spain: Association for Computational Linguistics, 2004. 388−395
    [38] Zhang Z R, Liu S J, Li M, Zhou M, Chen E H. Joint training for neural machine translation models with monolingual data. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence and the 30th Innovative Applications of Artificial Intelligence Conference and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence. New Orleans, USA: AAAI Press, 2018. Article No. 69
Publication history
  • Received:  2019-06-24
  • Accepted:  2020-01-17
  • Published online:  2021-01-12
  • Issue date:  2022-07-01
