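Semi-Supervised Neural Machine Translation with Data Selection Based on Sentence-Level BLEU
Published: 2018-05-23 21:09
Topics: semi-supervised + sentence-level bilingual evaluation understudy (BLEU); Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2017, Issue 10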
[Abstract]: Statistical machine translation can improve its performance on monolingual corpora by using a language model, whereas neural machine translation can hardly exploit monolingual corpora effectively in this way. To address this problem, the paper proposes a semi-supervised neural translation model that selects data by the sentence-level bilingual evaluation understudy (BLEU) metric. A statistical machine translation model and a neural machine translation model each generate candidate translations for the unlabeled data. Candidates are then selected by sentence-level BLEU and added to the labeled data set for semi-supervised joint training. Experiments show that the method makes efficient use of unlabeled monolingual corpora: on the NIST Chinese-English translation task, its BLEU score improves over a single system trained only on carefully annotated labeled data.
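The selection step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not say what the sentence-level BLEU score is computed against, so the sketch assumes the statistical system's output serves as a pseudo-reference for the neural system's output, and that a pair is kept when the two agree above a threshold. The function names, the add-one smoothing for higher-order n-grams, the choice to keep the neural candidate, and the threshold value of 0.5 are all hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Smoothed sentence-level BLEU of one token list against one reference.

    Assumption: add-one smoothing on n-gram precisions for n > 1, so a short
    sentence with no 4-gram match does not collapse to a score of zero.
    """
    if not hypothesis or not reference:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
        total = sum(hyp.values())
        if n == 1:
            if matched == 0:
                return 0.0  # no word overlap at all
            precision = matched / total
        else:
            precision = (matched + 1) / (total + 1)
        log_precision += math.log(precision) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    brevity = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return brevity * math.exp(log_precision)


def select_pseudo_parallel(sources, smt_hyps, nmt_hyps, threshold=0.5):
    """Keep (source, candidate) pairs whose two candidate translations agree.

    Assumption: the SMT output is used as a pseudo-reference for the NMT
    output, and the NMT candidate is the one kept for training.
    """
    selected = []
    for src, smt, nmt in zip(sources, smt_hyps, nmt_hyps):
        score = sentence_bleu(nmt.split(), smt.split())
        if score >= threshold:
            selected.append((src, nmt))
    return selected


if __name__ == "__main__":
    pairs = select_pseudo_parallel(
        ["source sentence 1", "source sentence 2"],
        ["the cat sat on the mat", "a dog barks loudly"],
        ["the cat sat on the mat", "completely different words here"],
    )
    print(pairs)
```

The selected pairs would then be appended to the labeled parallel data before joint training. How the paper weights or mixes the two data sources is not stated in the abstract.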
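[Affiliation]: National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China
[Funding]: Supported by the National Key Research and Development Program of China (No. 2016YFB1001303)
[Classification number]: H085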
Article ID: 1926412
Article link: http://sikaile.net/wenyilunwen/yuyanyishu/1926412.html