Research on Word Alignment Methods Based on a Chinese-Vietnamese Bilingual Parallel Corpus
[Abstract]: In recent years, machine translation has become an important means of overcoming the language barriers people face in communication. Bilingual word alignment is a basic step in the automatic acquisition of translation knowledge and, in machine translation especially, a valuable source of that knowledge. It provides important support for natural language processing research such as Chinese-Vietnamese dictionary compilation, machine translation, speech recognition, information retrieval, semantic disambiguation, and bilingual sentence alignment. As a result, the importance of acquiring bilingual word alignment data is increasingly recognized. Research on how to improve the quality of Chinese-Vietnamese word alignment beyond previous work, and on how to build a large-scale Chinese-Vietnamese word-aligned corpus, therefore has academic value. Word alignment for major language pairs such as Chinese-English and French-English has achieved good results, but work on Chinese-Vietnamese word alignment is rare. This thesis examines the factors that affect the quality of Chinese-Vietnamese word alignment and analyzes the problems that arise in the alignment process. Building on the linguistic characteristics of Vietnamese and on existing research, the main work is as follows:
(1) A chunk-based Chinese-Vietnamese bilingual word alignment method is proposed. To improve alignment accuracy and to alleviate the asymmetry problem in Chinese-Vietnamese word alignment, a Chinese-Vietnamese chunk-aligned corpus is constructed. On the basis of this corpus and the characteristics of the two languages, a conditional random field (CRF) model is used to perform word alignment within chunks.
(2) A Chinese-Vietnamese bilingual word alignment algorithm that incorporates semantic information is proposed. To address the high alignment error rate for low-frequency words, a lexical similarity model is proposed. A neural network model is trained on a monolingual corpus to obtain the word similarity model, which is then used to extend the IBM word alignment model. Finally, Chinese-Vietnamese word alignment is carried out with GIZA combined with the lexical similarity model.
(3) Following the idea of ensemble learning, a method that fuses multiple word alignment models is proposed. The word alignment model incorporating semantic information, the word2vec word alignment model, and the chunk-based word alignment model are treated as independent alignment classifiers and fused by simple voting and weighted voting to further improve word alignment quality. The three word alignment methods are also evaluated and compared.
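The voting-based fusion in (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `fuse_alignments`, the representation of an alignment as a set of (source index, target index) links, the 0.5 acceptance threshold, and the example weights are all assumptions made for illustration.

```python
from collections import defaultdict


def fuse_alignments(alignments, weights=None, threshold=0.5):
    """Fuse link sets from several aligners by simple or weighted voting.

    alignments: list of sets of (source_index, target_index) links,
                one set per aligner, all for the same sentence pair.
    weights:    optional per-aligner weights; None means simple voting.
    threshold:  a link is kept if its share of the total weight
                is strictly greater than this value (assumed 0.5).
    """
    if weights is None:
        weights = [1.0] * len(alignments)  # simple voting: equal weights
    if len(weights) != len(alignments):
        raise ValueError("one weight per aligner is required")

    total = sum(weights)
    votes = defaultdict(float)
    for links, weight in zip(alignments, weights):
        for link in links:
            votes[link] += weight  # each aligner votes for its own links

    return {link for link, v in votes.items() if v / total > threshold}


# Hypothetical outputs of three aligners for one sentence pair.
chunk_based = {(0, 0), (1, 1), (2, 2)}
semantic = {(0, 0), (1, 1), (2, 3)}
word2vec_based = {(0, 0), (1, 2), (2, 3)}
models = [chunk_based, semantic, word2vec_based]

simple = fuse_alignments(models)
weighted = fuse_alignments(models, weights=[0.6, 0.2, 0.2])
```

With equal weights a link survives only if at least two of the three aligners propose it. With the assumed weights, a link proposed by the first aligner alone is enough. In practice the weights would presumably be set from each model's measured alignment quality on held-out data.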
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.1