Research on Topic-Integrated Chinese-Vietnamese Machine Translation Methods

Published: 2018-05-17 22:43

  Topic of this paper: Chinese-Vietnamese + statistical machine translation; Reference: Kunming University of Science and Technology, 2017 master's thesis


[Abstract]: Vietnam borders Yunnan and Guangxi in China, and, driven by national development strategy, exchanges between Vietnam and China are close. Chinese-Vietnamese machine translation can promote cooperation between the two countries in tourism, e-commerce, science and technology, and other areas. Traditional statistical translation models mainly compute phrase translation probabilities and lexical translation probabilities between the source and target languages. These probabilities, however, cannot accurately measure the semantic similarity between source and target, so a translation may not be semantically equivalent to the original text and may even contain serious semantic errors. To address these problems, this thesis carries out a series of studies based on the tree-to-tree translation model. The main results are as follows.

(1) A tree-to-tree translation model that integrates phrase topics. Because of the complexity of natural language, current Chinese-Vietnamese machine translation handles domain-ambiguous words poorly. To improve translation quality, this thesis proposes a phrase-topic semantic translation model. During tree-to-tree decoding, it replaces the original phrase translation probability feature function and uses the distributional relationship between a phrase and the topic of the sentence containing it to constrain phrase selection. This approach achieves domain adaptation to a certain extent. Comparative experiments show that, given a domain corpus of a certain size, Chinese-Vietnamese machine translation with phrase topics significantly improves the translation of domain-ambiguous words.

(2) A tree-to-tree translation model that integrates a sentence coherence model. Current Chinese-Vietnamese machine translation mostly models translation one sentence at a time, ignoring the rich information available at the discourse level, which is inconsistent with how humans translate. To address the missing cross-sentence discourse structure information in Chinese-Vietnamese document translation, this thesis models translation at the sentence level and proposes a sentence coherence translation model. The model represents sentence coherence as a smooth transition of topics, which makes coherence quantitatively describable and computable. A tool is used to build the coherence chain of the source-language document, the chain is mapped to the target side, and the mapped chain then constrains translation selection. Experiments show that integrating coherence substantially improves the coherence of translated documents.

(3) A prototype topic-integrated Chinese-Vietnamese statistical machine translation system. Building on the open-source machine translation system Niutrans, and following the log-linear model, the phrase topic model and the sentence coherence model are integrated into a Chinese-Vietnamese tree-to-tree translation system. The prototype was developed on Linux with existing open-source tools as a Java web application: the front end uses JSP for the presentation layer, the framework uses lightweight servlets, and the back end calls the machine translation interface.
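
The abstract does not give the feature definitions, so the following is only a minimal sketch of the general idea, not the thesis's actual model. It assumes that topics are represented as probability distributions, that phrase-to-sentence topic agreement is measured with one minus the Hellinger distance, that coherence is the average similarity between adjacent sentences' topic distributions, and that features are combined in a log-linear score. All function names, feature names, weights, and the toy data are hypothetical.

```python
import math


def topic_similarity(p, q):
    """Similarity in [0, 1] between two topic distributions.

    Assumption for this sketch: 1 - Hellinger distance.
    """
    if len(p) != len(q):
        raise ValueError("topic distributions must have the same length")
    # Bhattacharyya coefficient; equals 1.0 for identical distributions.
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 1.0 - math.sqrt(max(0.0, 1.0 - bc))


def coherence_score(sentence_topics):
    """Average topic similarity between adjacent sentences.

    A smooth topic transition across a document yields a high score.
    """
    if len(sentence_topics) < 2:
        return 1.0
    pairs = zip(sentence_topics, sentence_topics[1:])
    sims = [topic_similarity(a, b) for a, b in pairs]
    return sum(sims) / len(sims)


def log_linear_score(features, weights):
    """Log-linear model: weighted sum of log-domain feature values."""
    return sum(weights[name] * value for name, value in features.items())


def pick_translation(candidates, sentence_topic, weights):
    """Choose the candidate with the highest log-linear score."""
    def score(cand):
        sim = topic_similarity(cand["topic"], sentence_topic)
        features = {
            "lexical": cand["lex_logprob"],
            # Hypothetical topic feature standing in for the phrase
            # translation probability; epsilon avoids log(0).
            "topic": math.log(sim + 1e-9),
        }
        return log_linear_score(features, weights)

    return max(candidates, key=score)


# Toy data (hypothetical): two topics, [food, technology].
candidates = [
    {"target": "quả táo", "topic": [0.9, 0.1], "lex_logprob": math.log(0.6)},
    {"target": "Apple", "topic": [0.1, 0.9], "lex_logprob": math.log(0.4)},
]
sentence_topic = [0.1, 0.9]  # the source sentence is mostly about technology
weights = {"lexical": 1.0, "topic": 1.0}

best = pick_translation(candidates, sentence_topic, weights)
smooth = coherence_score([[0.8, 0.2], [0.7, 0.3], [0.6, 0.4]])
abrupt = coherence_score([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]])
```

In this toy setup the topic feature outweighs the higher lexical probability of the fruit sense, so the technology-sense candidate is selected, and the document whose topics drift gradually scores as more coherent than the one whose topics flip between sentences. In a real system the weights would be tuned on held-out data rather than set by hand.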
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.2


Article ID: 1903227


Article link: http://sikaile.net/jingjilunwen/dianzishangwulunwen/1903227.html

