
基于变分信息瓶颈的半监督神经机器翻译

于志强 余正涛 黄于欣 郭军军 高盛祥

于志强, 余正涛, 黄于欣, 郭军军, 高盛祥. 基于变分信息瓶颈的半监督神经机器翻译. 自动化学报, 2020, 46(x): 1−12 doi: 10.16383/j.aas.c190477
Citation: Yu Zhi-Qiang, Yu Zheng-Tao, Huang Yu-Xin, Guo Jun-Jun, Gao Sheng-Xiang. Improving semi-supervised neural machine translation with variational information bottleneck. Acta Automatica Sinica, 2020, 46(x): 1−12 doi: 10.16383/j.aas.c190477


doi: 10.16383/j.aas.c190477
Funds: Supported by the National Key Research and Development Program of China (2019QY1800), the National Natural Science Foundation of China (61732005, 61672271, 61761026, 61762056, 61866020), the Yunnan High-Tech Industry Development Project (201606), and the Natural Science Foundation of Yunnan Province (2018FB104)
Article information
    About the authors:

    Yu Zhi-Qiang: Ph.D. candidate at the Faculty of Information Engineering and Automation, Kunming University of Science and Technology. Main research interest: natural language processing. E-mail: yzqyt@hotmail.com

    Yu Zheng-Tao: Professor at the Faculty of Information Engineering and Automation, Kunming University of Science and Technology. Main research interest: natural language processing. Corresponding author of this paper. E-mail: ztyu@hotmail.com

    Huang Yu-Xin: Ph.D. candidate at the Faculty of Information Engineering and Automation, Kunming University of Science and Technology. Main research interest: natural language processing. E-mail: huangyuxin2004@163.com

    Guo Jun-Jun: Lecturer at the Faculty of Information Engineering and Automation, Kunming University of Science and Technology. Main research interest: natural language processing. E-mail: guojjgb@163.com

    Gao Sheng-Xiang: Associate professor at the Faculty of Information Engineering and Automation, Kunming University of Science and Technology. Main research interest: natural language processing. E-mail: gaoshengxiang.yn@foxmail.com

Improving Semi-supervised Neural Machine Translation with Variational Information Bottleneck

  • Abstract: Variational methods are effective in machine translation, but their performance depends heavily on the scale of the training data. In low-resource settings, parallel corpora are scarce and cannot satisfy the data requirements of variational methods, so variational translation models perform poorly. To address this problem, we propose a semi-supervised neural machine translation method based on the variational information bottleneck. The method works as follows. First, a base translation model is trained on a small parallel corpus, with a cross-layer attention mechanism introduced to fully exploit the feature information of every layer of the network. Next, the base model is used for back-translation, generating a large but noisy pseudo-parallel corpus from monolingual data; this corpus is merged with the original parallel corpus into a combined corpus whose scale satisfies the data requirements of variational methods. Finally, to reduce the noise in the combined corpus, a variational information bottleneck inserts an intermediate representation between the source and the target; training endows this representation with the ability to pass important information while blocking unimportant information, thereby removing the noise. Experimental results on multiple datasets show that the proposed method significantly improves translation quality and is a semi-supervised neural machine translation method well suited to low-resource scenarios.
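To make the role of the variational information bottleneck concrete, the following is a minimal sketch of a VIB layer in the deep-VIB form of Alemi et al. [12]; the module, its dimensions, and the lambda-weighted loss in the comments are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a variational information bottleneck (VIB) layer in the
# deep-VIB form of Alemi et al. [12]; module and variable names are illustrative.
import torch
import torch.nn as nn

class VIBLayer(nn.Module):
    """Maps an encoder state h to a stochastic bottleneck sample z."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int):
        super().__init__()
        self.to_mu = nn.Linear(hidden_dim, bottleneck_dim)
        self.to_logvar = nn.Linear(hidden_dim, bottleneck_dim)

    def forward(self, h: torch.Tensor):
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        eps = torch.randn_like(mu)                      # reparameterization trick
        z = mu + torch.exp(0.5 * logvar) * eps          # sampled bottleneck representation
        # KL(N(mu, sigma^2) || N(0, I)): the compression term of the IB objective
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return z, kl

# The training loss would then take the form
#     cross_entropy(decoder(z), target) + lambda_ * kl.mean(),
# where lambda_ trades translation accuracy against compression of noisy information
# (the trade-off examined for the lambda parameter in Fig. 7).
```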
  • Fig. 1  Model with traditional attention mechanism based on top-layer merge

    Fig. 2  Model with hierarchical attention mechanism based on inner-layer merge

    Fig. 3  Model with hierarchical attention mechanism based on cross-layer merge

    Fig. 4  NMT model after integrating variational information bottleneck

    Fig. 5  Example of translation effects

    Fig. 6  Translation length evaluation of English-Vietnamese translation task

    Fig. 7  Influence of $ \lambda $ parameter on the model

    Table 1  Examples of the combined corpus structure

    Corpus type               Source-language corpus    Target-language corpus
    Original corpus           $D_a$                     $D_b$
    Monolingual corpus        $D_x$                     None
    Pseudo-parallel corpus    $D_x$                     $D_y$
    Combined corpus           $D_b+D_y$                 $D_a+D_x$
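As a rough illustration of how the combined corpus in Table 1 could be assembled through back-translation, the sketch below uses a hypothetical `base_model.translate` call standing in for the trained base translation model; the variable names mirror the table ($D_a$, $D_b$, $D_x$, $D_y$) and are not taken from the paper's code.

```python
# Illustrative sketch of assembling the combined corpus of Table 1 via back-translation.
# `base_model.translate` is a hypothetical stand-in for the base NMT model.

def build_combined_corpus(D_a, D_b, D_x, base_model):
    """D_a/D_b: source/target sides of the original parallel corpus;
    D_x: monolingual corpus (its target side is 'None' in Table 1)."""
    # Generate the noisy pseudo side D_y by translating every monolingual sentence.
    D_y = [base_model.translate(sentence) for sentence in D_x]

    # Merge as in the last row of Table 1: source side D_b + D_y, target side D_a + D_x.
    combined_source = list(D_b) + D_y
    combined_target = list(D_a) + list(D_x)
    return list(zip(combined_source, combined_target))
```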

    Table 2  The composition of the parallel corpora

    Corpus type                    Dataset    Language pair    Train    Valid    Test
    Small-scale parallel corpus    IWSLT15    en↔vi            133K     1553     1268
                                   IWSLT15    en↔zh            209K     887      1261
                                   IWSLT15    en↔de            172K     887      1565
    Large-scale parallel corpus    WMT14      en↔de            4.5M     3003     3000

    Table 3  The composition of the monolingual corpora; the Vietnamese (vi) corpus was collected by the authors

    Translation task     Language    Dataset     Sentences
    en↔vi                en          GIGAWORD    22.3M
                         vi          None        1M
    en↔zh                en          GIGAWORD    22.3M
                         zh          GIGAWORD    18.7M
    en↔de (IWSLT15)      en          WMT14       18M
                         de          WMT14       17.3M
    en↔de (WMT14)        en          WMT14       18M
                         de          WMT14       17.3M

    Table 4  BLEU evaluation results (%)

    Model                en→vi    vi→en    en→zh    zh→en    en→de (IWSLT15)    de→en (IWSLT15)    en→de (WMT14)    de→en (WMT14)
    RNNSearch            26.55    24.47    21.18    19.15    25.03              28.51              26.62            29.20
    RNNSearch+CA         27.04    24.95    21.64    19.59    25.39              28.94              27.06            29.58
    RNNSearch+VIB        27.35    25.12    21.94    19.84    25.77              29.31              27.27            29.89
    RNNSearch+CA+VIB     27.83*   25.61*   22.39    20.27    26.14*             29.66*             27.61*           30.22*
    Δ                    +1.28    +1.14    +1.21    +1.12    +1.11              +1.15              +0.99            +1.02
    Transformer          29.20    26.73    23.69    21.61    27.48              30.66              28.74            31.29
    Transformer+CA       29.53    27.00    23.95    21.82    27.74              30.98              28.93            31.51
    Transformer+VIB      29.96    27.38    24.30    22.13    28.04              31.24              29.16            31.75
    Transformer+CA+VIB   30.17*   27.56*   24.43    22.32    28.11*             31.35*             29.25*           31.89*
    Δ                    +0.97    +0.83    +0.74    +0.71    +0.63              +0.69              +0.51            +0.60
    Note: Δ is the BLEU improvement over the corresponding baseline after integrating CA+VIB; * indicates statistical significance under bootstrap resampling [38] ($ p<0.05 $)
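For reference, the asterisked entries rely on paired bootstrap resampling [38]; a minimal sketch of that test is given below, with `corpus_bleu(hypotheses, references)` as an assumed corpus-level BLEU scoring function rather than a specific library call.

```python
# Sketch of paired bootstrap resampling for the significance marks (*) in Table 4,
# following Koehn [38]; `corpus_bleu` is an assumed scoring function.
import random

def paired_bootstrap(sys_a, sys_b, refs, corpus_bleu, n_samples=1000):
    """Approximate p-value for the claim 'system A outperforms system B'."""
    n = len(refs)
    losses = 0
    for _ in range(n_samples):
        idx = [random.randrange(n) for _ in range(n)]          # resample sentences with replacement
        bleu_a = corpus_bleu([sys_a[i] for i in idx], [refs[i] for i in idx])
        bleu_b = corpus_bleu([sys_b[i] for i in idx], [refs[i] for i in idx])
        if bleu_a <= bleu_b:
            losses += 1
    return losses / n_samples                                   # * in Table 4: p < 0.05
```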

    Table 6  RIBES evaluation results (%)

    Translation direction    Base translation model    Monolingual corpus    Baseline    Cross-layer attention    Cross-layer attention + VIB
    en→vi                    vi→en                     vi                    74.38       75.07                    75.83
    vi→en                    en→vi                     en                    74.29       74.70                    75.64
    en→zh                    zh→en                     zh                    72.87       73.33                    73.83
    zh→en                    en→zh                     en                    71.81       72.25                    72.55
    en→de (IWSLT15)          de→en                     de                    79.81       80.14                    80.96
    de→en (IWSLT15)          en→de                     en                    78.48       78.88                    79.61
    en→de (WMT14)            de→en                     de                    80.15       80.40                    81.29
    de→en (WMT14)            en→de                     en                    79.33       79.52                    80.07

    Table 5  Comparison between this work and other semi-supervised NMT approaches (en-de)

    Model                  Translation direction    Base translation model    Monolingual corpus    BLEU
    Zhang et al. (2018)    en→de                    de→en                     de                    23.60
                           de→en                    en→de                     en                    27.98
    This work              en→de                    de→en                     de                    24.73
                           de→en                    en→de                     en                    28.65

    Table 7  Chinese-English translation examples

    Source sentence      火车被发现已经开走了
    Reference            It was found that the train had already left
    System outputs       [TA] Found that the the train had gone
                         [CA] It was found that the the train had left away
                         [CA+VIB] It was found that the train had left
  • [1] Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks[C] //Advances in Neural Information Processing Systems. Montreal, 2014: 3104–3112
    [2] Bahdanau D, Cho K, Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate[C]//Proceedings of the ICLR. San Diego, CA, 2015: 1–15
    [3] Jiang H F, Li S, Zhang M, Zhao T J, Yang M Y. Synchronous Tree Sequence Substitution Grammar for Statistical Machine Translation. Acta Automatica Sinica, 2009, 35(10): 1317−1326. doi: 10.3724/SP.J.1004.2009.01317
    [4] Li Y C, Xiong D Y, Zhang M. A Survey of Neural Machine Translation. Chinese Journal of Computers, 2018, 41(12): 2734−2755. doi: 10.11897/SP.J.1016.2018.02734 (in Chinese)
    [5] Kingma D P, Mohamed S, Rezende D J, Welling M. Semi-Supervised Learning with Deep Generative Models[C]// Advances in Neural Information Processing Systems. Montreal, 2014: 3581−3589
    [6] Kingma D P, Welling M. Auto-Encoding Variational Bayes[C]// International Conference on Learning Representations. Banff, Canada, 2014
    [7] Zhang B, Xiong D, Su J. Variational Neural Machine Translation[C]// Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016). Austin, USA, 2016: 521−530
    [8] Sennrich R, Haddow B, Birch A. Improving Neural Machine Translation Models with Monolingual Data[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, 2016: 1715−1725
    [9] Socher R, Pennington J, Huang E H, et al. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions[C]// Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP). Edinburgh, UK, 2011: 27−31
    [10] Ammar W, Dyer C, Smith N A. Conditional Random Field Autoencoders for Unsupervised Structured Prediction. Advances in Neural Information Processing Systems, 2014, 4: 3311−3319
    [11] Belinkov Y, Durrani N, Dalvi F, et al. What do Neural Machine Translation Models Learn about Morphology?[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Vancouver, Canada, 2017: 861–872
    [12] Alemi A A, Fischer I, Dillon J V, et al. Deep Variational Information Bottleneck[J]. arXiv preprint arXiv:1612.00410, 2016
    [13] Nguyen T T, Choi J. Layer-wise Learning of Stochastic Neural Networks with Information Bottleneck[J]. arXiv preprint, 2017
    [14] Yang Z, Yang D, Dyer C, et al. Hierarchical Attention Networks for Document Classification[C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego, 2016: 1480−1489
    [15] Pappas N, Popescu-Belis A. Multilingual Hierarchical Attention Networks for Document Classification[C]//Proceedings of the 8th International Joint Conference on Natural Language Processing. Taipei, China, 2017: 1015–1025
    [16] Zhang Y, Wang Y, Liao J, et al. A Hierarchical Attention Seq2seq Model with CopyNet for Text Summarization[C]//IEEE 2018 International Conference on Robots & Intelligent System (ICRIS). Changsha, China, 2018: 316−320
    [17] Miculicich L, Ram D, Pappas N, et al. Document-Level Neural Machine Translation with Hierarchical Attention Networks[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium, 2018: 2947–2954
    [18] Zhang B, Xiong D, Su J. Neural Machine Translation with Deep Attention. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018: 1−1
    [19] Ueffing N, Haffari G, Sarkar A. Semi-supervised model adaptation for statistical machine translation. Machine Translation, 2007, 21(2): 77−94 doi: 10.1007/s10590-008-9036-3
    [20] Bertoldi N, Federico M. Domain adaptation for statistical machine translation with monolingual resources[C]// Workshop on Statistical Machine Translation. Association for Computational Linguistics. Athens, Greece, 2009: 182−189
    [21] Klementiev A, Irvine A, Callison-Burch C, et al. Toward statistical machine translation without parallel corpora[C]// Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics. Avignon, France, 2012: 130−140
    [22] Zhang J, Zong C. Learning a Phrase-Based Translation Model from Monolingual Data with Application to Domain Adaptation[C]// Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Sofia, Bulgaria, 2013: 1425−1434
    [23] Ravi S, Knight K. Deciphering Foreign Language[C]// The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference. Portland, USA, 2011: 12−21
    [24] Dou Q, Vaswani A, Knight K. Beyond Parallel Data: Joint Word Alignment and Decipherment Improves Machine Translation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar. 2014: 557−565
    [25] Cheng Y, Xu W, He Z, et al. Semi-Supervised Learning for Neural Machine Translation[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, 2016: 1965−1974
    [26] Skorokhodov I, Rykachevskiy A, et al. Semi-Supervised Neural Machine Translation with Language Models[C]// Proceedings of the AMTA 2018 Workshop. Boston, 2018: 37−44
    [27] Artetxe M, Labaka G, Agirre E, et al. Unsupervised Neural Machine Translation[C]// Proceedings of the Sixth International Conference on Learning Representations (ICLR 2018). Vancouver, Canada, 2018
    [28] Lample G, Ott M, Conneau A, et al. Phrase-Based & Neural Unsupervised Machine Translation[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium, 2018
    [29] Burlot F, Yvon F. Using Monolingual Data in Neural Machine Translation: A Systematic Study[C]// Proceedings of the Third Conference on Machine Translation (WMT 2018). Brussels, Belgium, 2018: 144−155
    [30] Tishby N, Pereira F C, Bialek W. The Information Bottleneck Method[J]. arXiv preprint, 2000
    [31] Zhang B, Xiong D, Su J, et al. Variational Neural Machine Translation[C]// Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016). Austin, USA, 2016: 521−530
    [32] Eikema B, Aziz W. Auto-Encoding Variational Neural Machine Translation[J]. arXiv preprint, 2018: 35−43
    [33] Su J, Wu S, Xiong D, et al. Variational Recurrent Neural Machine Translation[C]// Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2018). 2018: 5488−5495
    [34] Kingma D, Ba J. Adam: A Method for Stochastic Optimization[C]// Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015). San Diego, 2015
    [35] Sennrich R, Zhang B. Revisiting Low-Resource Neural Machine Translation: A Case Study[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019). Florence, Italy, 2019
    [36] Papineni K, Roukos S, Ward T, Zhu W J. BLEU: A Method for Automatic Evaluation of Machine Translation[C]// Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL). Philadelphia, 2002: 311−318
    [37] Isozaki H, Hirao T, Duh K, et al. Automatic Evaluation of Translation Quality for Distant Language Pairs[C]// Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Massachusetts, 2010: 944−952
    [38] Koehn P. Statistical Significance Tests for Machine Translation Evaluation[C]// Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP 2004). Barcelona, Spain, 2004
    [39] Zhang Z, Liu S, Li M, et al. Joint Training for Neural Machine Translation Models with Monolingual Data[C]// Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2018). 2018: 555−562
Publication history
  • Received: 2019-06-24
  • Accepted: 2020-01-17
  • Published online: 2021-01-12
