A Machine Translation System Integrating Internet Translation Engines
Posted: 2018-03-12 14:56
Topic: machine translation; Focus: system combination; Source: Inner Mongolia University, 2017 master's thesis; Thesis type: degree thesis
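【Abstract】: Machine translation has developed over several decades since it first appeared and has achieved remarkable results, with many methods proposed along the way. The current mainstream approaches are statistical machine translation and, more recently, neural machine translation. Each approach has its own distinctive strengths, so system combination has been proposed to let systems offset one another's weaknesses and improve the final translation. Industrial use of machine translation is now mature: Baidu, Youdao, Google and others all offer online Internet translation services. This study combines these Internet translation engines with a system trained using the Moses statistical machine translation toolkit.

By the basic unit they operate on, system combination methods fall into three types: sentence-level, phrase-level and word-level. This study applies three combination approaches on a Chinese-to-English translation task: sentence-level, word-level, and MEMT-based. Sentence-level combination uses minimum Bayes risk (MBR) decoding with several different loss functions. Using TER as the loss function gives the best result, 0.24 BLEU points above the best single system before combination. Word-level combination builds a confusion network and decodes it to obtain the output. The study ran several groups of comparative experiments covering two factors: the word alignment method used to build the confusion network and the features added during decoding. The results show that TER-based word alignment with stem matching, together with several effective decoding features, improves combination performance. This configuration gives the best result of the study: 0.78 BLEU points above the best system before combination and 3.01 points above the worst. MEMT-based combination performs only moderately, gaining 0.48 BLEU points over the best pre-combination result.

These experiments show that a machine translation system that combines Internet translation engines can improve translation quality. Finally, the study implements a browser/server (B/S) system that integrates the Internet translation engines and uses word-level system combination.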
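Sentence-level combination with minimum Bayes risk picks one complete system output per sentence. Each candidate e' is scored by its expected loss against all system outputs, which are treated as pseudo-references: risk(e') = sum over e of P(e) * L(e', e). The candidate with the lowest risk is selected. The Python sketch that follows is only a minimal illustration under stated assumptions and is not the thesis's implementation. It assumes uniform system weights unless weights are given. It also substitutes a simplified TER without block shifts (word edit distance divided by reference length) for full TER. The names `edit_distance`, `approx_ter` and `mbr_select` are hypothetical.

```python
from typing import List, Optional, Sequence


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(prev[j] + 1,              # drop hyp word h
                          curr[j - 1] + 1,          # add ref word r
                          prev[j - 1] + (h != r))   # match or substitute
        prev = curr
    return prev[-1]


def approx_ter(hyp: str, ref: str) -> float:
    """Simplified TER: word edits / reference length, WITHOUT TER's block shifts."""
    h, r = hyp.split(), ref.split()
    return edit_distance(h, r) / max(len(r), 1)


def mbr_select(hyps: List[str], weights: Optional[List[float]] = None) -> int:
    """Index of the hypothesis with minimum expected loss, using every system
    output as a pseudo-reference weighted by (assumed) system posteriors."""
    if weights is None:
        weights = [1.0 / len(hyps)] * len(hyps)  # assumption: uniform system weights
    # loss of a candidate against itself is 0, so including it is harmless
    risks = [sum(w * approx_ter(cand, ref) for ref, w in zip(hyps, weights))
             for cand in hyps]
    return min(range(len(hyps)), key=risks.__getitem__)


if __name__ == "__main__":
    outputs = [  # toy outputs from four hypothetical systems for one sentence
        "the cat sat on the mat",
        "the cat sits on the mat",
        "a cat sat on mat",
        "the cat sat on the mat today",
    ]
    print(outputs[mbr_select(outputs)])  # → the cat sat on the mat
```

You can compare other loss functions by replacing `approx_ter` with a different sentence-level loss function. The abstract reports that TER worked best, and an exact TER implementation would also search for block shifts.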
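Word-level combination builds a confusion network instead of selecting a whole sentence. One hypothesis serves as the backbone (skeleton), and every system output is aligned to it word by word. Each position then collects the competing words, or an empty arc, together with their votes. The Python sketch that follows is a simplified illustration under explicit assumptions and is not the thesis's implementation:

- Alignment is plain edit distance without TER's block shifts.
- Stem matching uses a hypothetical crude suffix stripper (`crude_stem`), with an assumed cost of 0.5 for stem matches.
- Multi-word insertions are kept as single phrase arcs.
- Decoding uses weighted votes only, whereas the thesis adds further features during decoding.

All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

EPS = "<eps>"  # empty (null) arc in the confusion network


def crude_stem(word: str) -> str:
    """Hypothetical suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def sub_cost(a: str, b: str) -> float:
    """0 for an exact match, 0.5 (assumed) for a stem match, 1 otherwise."""
    if a == b:
        return 0.0
    return 0.5 if crude_stem(a) == crude_stem(b) else 1.0


def align(backbone: Sequence[str], hyp: Sequence[str]) -> List[Tuple[Optional[int], Optional[str]]]:
    """Edit-distance alignment (no block shifts) of hyp to the backbone.
    Returns (backbone index or None, hyp word or None) pairs in order."""
    n, m = len(backbone), len(hyp)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = float(i)
    for j in range(m + 1):
        d[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + sub_cost(backbone[i - 1], hyp[j - 1]))
    pairs: List[Tuple[Optional[int], Optional[str]]] = []
    i, j = n, m
    while i > 0 or j > 0:  # backtrace; costs are multiples of 0.5, so == is exact
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + sub_cost(backbone[i - 1], hyp[j - 1]):
            pairs.append((i - 1, hyp[j - 1]))  # match, stem match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((i - 1, None))  # hyp has nothing for this backbone word
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # hyp inserts a word
            j -= 1
    return pairs[::-1]


def build_confusion_network(hyps: List[List[str]], backbone_idx: int,
                            weights: List[float]) -> List[Dict[str, float]]:
    """Slot 2k holds words inserted before backbone word k (k == n: at the end);
    slot 2k+1 holds words aligned to backbone word k. Values are summed weights."""
    backbone = hyps[backbone_idx]
    n = len(backbone)
    slots: List[Dict[str, float]] = [defaultdict(float) for _ in range(2 * n + 1)]
    for hyp, w in zip(hyps, weights):
        inserted: Dict[int, List[str]] = defaultdict(list)
        aligned: Dict[int, str] = {}
        next_k = 0  # backbone position that the next insertion precedes
        for b_idx, word in align(backbone, hyp):
            if b_idx is None:
                inserted[next_k].append(word)
            else:
                aligned[b_idx] = word if word is not None else EPS
                next_k = b_idx + 1
        for k in range(n + 1):
            slots[2 * k][" ".join(inserted[k]) or EPS] += w  # multi-word insertion = one arc
        for k in range(n):
            slots[2 * k + 1][aligned[k]] += w
    return slots


def decode(slots: List[Dict[str, float]]) -> str:
    """Voting-only decoding: take the highest-weight arc per slot, drop empty arcs."""
    words = []
    for slot in slots:
        best = max(slot, key=slot.__getitem__)
        if best != EPS:
            words.append(best)
    return " ".join(words)


if __name__ == "__main__":
    outputs = [s.split() for s in (
        "he visit beijing yesterday",      # backbone, containing an error
        "he visited beijing",
        "she visited beijing yesterday",
    )]
    network = build_confusion_network(outputs, backbone_idx=0, weights=[1.0, 1.0, 1.0])
    print(decode(network))  # → he visited beijing yesterday
```

In the toy run, two systems agree on "visited", so they outvote the backbone's "visit". The combined output therefore differs from every single input. A feature-based decoder would typically replace the per-slot `max` with a search that also scores additional features, such as a language model and a word penalty.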
【Degree-granting institution】: Inner Mongolia University
【Degree level】: Master's
【Year awarded】: 2017
【Classification number】: TP391.2
Article ID: 1602062
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1602062.html