
Mongolian-Chinese Neural Machine Translation Incorporating Statistical Machine Translation Features

Published: 2019-01-25 21:12
[Abstract]: As machine translation has developed, statistical machine translation (SMT) has reached a bottleneck where further gains are hard to achieve, so researchers have gradually turned their attention to neural machine translation (NMT). NMT has achieved good results on large-scale corpora, but there has been little research on NMT for small-scale corpora. As a new machine translation method, NMT also has several limitations: (1) to reduce training complexity, NMT usually restricts the vocabulary to a fixed size, which causes a serious out-of-vocabulary (OOV) word problem that badly degrades translation quality; (2) NMT decoding lacks a mechanism to guarantee that every source-language word is translated, so it tends to produce short translations; (3) NMT cannot make good use of a language model. For these reasons, this thesis implements NMT on a small-scale Mongolian-Chinese parallel corpus and proposes using SMT features to alleviate these problems. First, it builds an attention-based Mongolian-Chinese NMT system. Second, it extracts three SMT features (the translation model, word feedback information, and the language model) and defines their feature functions. Third, it runs GIZA++ on the Mongolian-Chinese parallel corpus to build a Mongolian-Chinese alignment dictionary, and uses IRSTLM to build a language model for Chinese. Fourth, it integrates the alignment dictionary, the language model, and the word feedback information into the decoding of the attention-based NMT system through a log-linear model, in order to address the limitations listed above. Finally, for the OOV problem, it proposes two handling methods, one applied during translation and one applied as post-processing after translation, which greatly reduce the number of OOV words. Experimental results show that Mongolian-Chinese NMT with SMT features clearly improves translation quality: the BLEU score rises to 30.66, sentence length increases from 16.7 words to 19.1 words, and 86% of the OOV words in the NMT output are handled.
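The integration of the alignment dictionary, language model, and word feedback into decoding can be pictured with a small sketch. The abstract does not give the thesis's actual feature functions or weights, so everything below is an assumption made for illustration: the feature names `tm`, `lm`, and `word_reward`, the weight values, and the toy probabilities are all hypothetical. The sketch only shows the general log-linear idea of adding weighted feature values to the NMT log-probability when ranking candidate target words at one decoding step.

```python
import math


def log_linear_score(nmt_prob, features, weights):
    """Add weighted feature values to the NMT log-probability.

    `features` maps a feature name to its value (already in log space
    where that applies); `weights` maps the same names to weights.
    Both the names and the weights are illustrative assumptions.
    """
    score = math.log(nmt_prob)
    for name, value in features.items():
        score += weights.get(name, 0.0) * value
    return score


def rescore_candidates(candidates, weights):
    """Return (word, score) pairs sorted by log-linear score, best first."""
    scored = [
        (log_linear_score(c["nmt_prob"], c["features"], weights), c["word"])
        for c in candidates
    ]
    scored.sort(reverse=True)
    return [(word, score) for score, word in scored]


# Hypothetical weights; in practice they would be tuned on held-out data.
weights = {"tm": 0.5, "lm": 0.5, "word_reward": 0.1}

# Toy candidates for one decoding step. ASCII strings stand in for
# Chinese target tokens; the probabilities are made up.
candidates = [
    {
        "word": "<unk>",
        "nmt_prob": 0.5,
        "features": {"tm": math.log(0.01), "lm": math.log(0.01), "word_reward": 1.0},
    },
    {
        "word": "grassland",
        "nmt_prob": 0.3,
        "features": {"tm": math.log(0.6), "lm": math.log(0.2), "word_reward": 1.0},
    },
]

ranked = rescore_candidates(candidates, weights)
for word, score in ranked:
    print(f"{word}\t{score:.3f}")
```

In this toy setup the NMT model alone prefers `<unk>`, but the dictionary and language model features push the in-vocabulary candidate to the top, which is the kind of effect the abstract attributes to the added SMT features.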
【Degree-granting institution】: Inner Mongolia University
【Degree level】: Master's
【Year degree granted】: 2017
【Classification number】: TP391.2


