Research on Word Alignment Methods Based on a Chinese-Vietnamese Bilingual Parallel Corpus

Published: 2019-02-23 20:41
[Abstract]: In recent years, machine translation has gradually become an important means of overcoming the language barriers people face when communicating. Bilingual word alignment is a basic step in automatically acquiring translation knowledge, and in machine translation in particular, word-aligned corpora are a highly valuable source of translation knowledge. Such corpora support natural language processing research including Chinese-Vietnamese dictionary compilation, machine translation, speech recognition, information retrieval, semantic disambiguation and bilingual sentence alignment systems, which has made the importance of obtaining bilingual word-aligned corpora increasingly clear. Improving the quality of Chinese-Vietnamese word alignment on the basis of prior work, and building a large-scale Chinese-Vietnamese word-aligned corpus, therefore has academic value. Word alignment for major language pairs such as Chinese-English and French-English already achieves good results, but research on word alignment between Chinese and Vietnamese remains scarce. This thesis examines the factors that affect the quality of Chinese-Vietnamese word alignment and analyzes the problems that arise during alignment. Building on the linguistic characteristics of Vietnamese and on existing research, it makes the following contributions: (1) A chunk-based Chinese-Vietnamese word alignment method. To improve alignment accuracy and to alleviate the asymmetry problem in Chinese-Vietnamese word alignment, a moderately sized Chinese-Vietnamese chunk-aligned corpus is constructed. On top of the chunk alignments, and taking the linguistic characteristics of both languages into account, a conditional random field (CRF) model performs word alignment within each chunk.
(2) A Chinese-Vietnamese word alignment algorithm that incorporates semantic information. Because low-frequency words have a high alignment error rate, a lexical similarity model is built: a neural network model is trained on monolingual corpora to learn word similarities, these similarities are used to extend the IBM word alignment models, and GIZA++ extended with the lexical similarity model then aligns Chinese and Vietnamese words. (3) Following the idea of ensemble learning, three word alignment models (the semantic-information model, a word2vec-based word alignment model and the chunk-based model) are treated as independent alignment classifiers and combined by simple voting and weighted voting to further improve word alignment quality, and the three word alignment methods are evaluated.
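As a rough illustration of the voting fusion described in item (3), the Python sketch below merges the alignment-link sets that several aligners propose for one sentence pair. It is a sketch under stated assumptions, not the thesis's implementation: the function name `vote_alignments`, the (Chinese token index, Vietnamese token index) link representation, the default rule of keeping a link only when it receives more than half of the total vote weight, and the example weights are all hypothetical, because the abstract does not give these details.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical link representation: (Chinese token index, Vietnamese token index).
Link = Tuple[int, int]


def vote_alignments(
    alignments: List[Set[Link]],
    weights: Optional[List[float]] = None,
    threshold: Optional[float] = None,
) -> Set[Link]:
    """Merge the link sets proposed by several aligners for one sentence pair.

    weights=None gives every aligner one vote (simple voting); otherwise each
    aligner votes with its own weight (weighted voting). By default a link is
    kept only if its vote total exceeds half of the total weight (an assumed
    majority rule, not one stated in the thesis abstract).
    """
    if weights is None:
        weights = [1.0] * len(alignments)
    if len(weights) != len(alignments):
        raise ValueError("expected one weight per aligner")
    if threshold is None:
        threshold = sum(weights) / 2.0

    # Accumulate the (weighted) votes that each candidate link receives.
    scores: Dict[Link, float] = defaultdict(float)
    for links, weight in zip(alignments, weights):
        for link in links:
            scores[link] += weight

    return {link for link, score in scores.items() if score > threshold}


if __name__ == "__main__":
    # Toy outputs of three hypothetical aligners for one sentence pair.
    chunk_based = {(0, 0), (1, 1), (2, 3)}
    semantic = {(0, 0), (1, 1), (2, 2)}
    word2vec_based = {(0, 0), (2, 2), (3, 3)}
    aligners = [chunk_based, semantic, word2vec_based]

    print(sorted(vote_alignments(aligners)))
    # [(0, 0), (1, 1), (2, 2)]
    print(sorted(vote_alignments(aligners, weights=[4.0, 1.0, 1.0])))
    # [(0, 0), (1, 1), (2, 3)]
```

With equal votes, the chunk-based link (2, 3) is outvoted by the link (2, 2) that the other two aligners share; giving the chunk-based aligner a weight of 4.0 lets its link through instead. In practice the weights would presumably be tuned on held-out data, for example from each aligner's alignment F-score, but that choice is likewise an assumption rather than something the abstract specifies.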
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year awarded]: 2017
[CLC classification number]: TP391.1



