Mongolian-Chinese Neural Machine Translation Incorporating Statistical Machine Translation Features
[Abstract]: As machine translation has matured, statistical machine translation (SMT) has reached a bottleneck where further gains are hard to obtain, and researchers have increasingly turned to neural machine translation (NMT). NMT achieves good results on large-scale corpora, but it has been little studied on small-scale corpora. As a relatively new method, it also has several limitations: (1) to reduce training complexity, NMT usually restricts the vocabulary to a fixed size, which causes a serious out-of-vocabulary (OOV) word problem that degrades translation quality; (2) NMT decoding lacks a mechanism to ensure that every source-language word is translated, so it tends to produce translations that are too short; (3) NMT cannot make good use of a language model. For these reasons, this thesis implements NMT on a small-scale Mongolian-Chinese parallel corpus and proposes using SMT features to address these problems. First, it builds an attention-based Mongolian-Chinese NMT system. Second, it extracts three SMT features (the translation model, word feedback information, and the language model) and defines a feature function for each. Third, it uses GIZA to build a Mongolian-Chinese alignment dictionary from the parallel corpus and uses IRSTLM to build a Chinese language model.
Fourth, it integrates the alignment dictionary, the language model, and the word feedback information into the decoding of the attention-based NMT system to address the limitations above. Finally, it proposes two methods for handling OOV words in NMT, which greatly reduce their number. Experimental results show that Mongolian-Chinese NMT combined with SMT features significantly improves translation quality: the BLEU score rises to 30.66, and the sentence length increases from 16.7 words to 19.1 words. In addition, 86% of the OOV words in NMT are eliminated.
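The abstract does not give the feature functions or how they enter decoding, so the following is only a minimal sketch of one common way to fold SMT features into NMT scoring: a log-linear combination applied per target word. Everything in it is an assumption made for illustration: the function names `word_score` and `hypothesis_score`, the weight layout, the probability floor, and the reading of "word feedback information" as a constant per-word bonus. It is not the thesis's actual formulation.

```python
import math

def word_score(nmt_logprob, tm_prob, lm_prob, weights, floor=1e-6):
    """Log-linear score for one candidate target word (hypothetical form).

    nmt_logprob: log-probability from the NMT decoder
    tm_prob:     lexical translation probability, e.g. from an alignment dictionary
    lm_prob:     language-model probability of the word given its history
    weights:     (w_nmt, w_tm, w_lm, w_bonus), assumed to be tuned on held-out data
    floor:       lower bound that keeps log() finite for unseen events
    """
    w_nmt, w_tm, w_lm, w_bonus = weights
    return (w_nmt * nmt_logprob
            + w_tm * math.log(max(tm_prob, floor))
            + w_lm * math.log(max(lm_prob, floor))
            + w_bonus)  # assumed per-word bonus, counteracts the bias toward short output

def hypothesis_score(steps, weights):
    """Sum the per-word scores over a whole hypothesis.

    steps: list of (nmt_logprob, tm_prob, lm_prob) tuples, one per target word.
    """
    return sum(word_score(n, t, l, weights) for n, t, l in steps)

# Two hypotheses that differ only in length.
short_hyp = [(-0.5, 0.5, 0.5)] * 2
long_hyp = [(-0.5, 0.5, 0.5)] * 3

nmt_only = (1.0, 0.0, 0.0, 0.0)    # plain NMT scoring
with_bonus = (1.0, 0.0, 0.0, 1.0)  # NMT plus the per-word bonus

prefers_short = hypothesis_score(short_hyp, nmt_only) > hypothesis_score(long_hyp, nmt_only)
prefers_long = hypothesis_score(long_hyp, with_bonus) > hypothesis_score(short_hyp, with_bonus)
```

Under plain NMT scoring every extra word adds a negative log-probability, so the shorter hypothesis wins. With the per-word bonus the longer one wins, which is the kind of effect the reported increase in sentence length suggests.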
[Degree-granting institution]: Inner Mongolia University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.2