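Research on Chinese-Lao Bilingual Word Alignment and Dependency Treebank Construction Methods
Topic: Chinese + Lao; Source: Kunming University of Science and Technology, master's thesis, 2017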
[Abstract]: With the rapid development of technology and the economy and the steady deepening of cross-language communication, global interconnection has become an irresistible trend. The multilingual information on the Internet is enormous in volume and changes in real time, so processing it by human translation alone is unrealistic; the only practical solution is to use machine translation to provide automatic translation services, which has set off a wave of research in the field. Mutual understanding through language is the basis for economic and cultural exchange between countries, and China and Laos are no exception; in-depth study of the Chinese-Lao language pair also lays a foundation for building Chinese-Lao bilingual corpus resources. In natural language processing, bilingual word alignment is an important foundational task: it treats the relation between a pair of mutually translated words in a bilingual parallel corpus as a link, and these alignment links provide valuable reference knowledge for machine translation. Word alignment gives basic support to many applications, such as dependency treebank construction, bilingual dictionary compilation, machine translation, and bilingual information extraction. In-depth research on automatic Chinese-Lao word alignment, and the construction of a reasonably large bilingual parallel corpus on that basis, therefore plays a pivotal role in Chinese-Lao information processing. By analyzing the similarities and differences between the grammatical structures of Chinese and Lao, this thesis studies methods for automatic Chinese-Lao word alignment and for constructing a Lao dependency treebank from a Chinese-Lao word-aligned corpus. The distinctive contributions are as follows.
(1) The thesis first analyzes the differences in grammatical characteristics between Chinese and Lao. The analysis shows that modifiers and head words appear in different orders in Chinese and Lao sentence structure. Starting from this characteristic, a set of bilingual features is selected to constrain Chinese-Lao word alignment.
(2) Syntactic features are incorporated into the statistical word alignment algorithm to constrain automatic Chinese-Lao word alignment. Chinese and Lao differ greatly in grammar and syntactic structure, which makes automatic alignment difficult, so the thesis proposes a Chinese-Lao automatic word alignment method that fuses multiple syntactic features. It first analyzes and selects syntactic features of the two languages, then integrates them into a model built on the log-linear framework and trained with the minimum error rate training algorithm. Experiments use IBM Model 3 as the baseline, and the results show that the proposed method achieves good alignment quality, clearly better than the baseline.
(3) The thesis proposes a method for constructing a Lao dependency treebank from a Chinese-Lao word-aligned corpus. The preliminary literature survey found that research on Lao, in China and abroad, is relatively scarce and that no large-scale Lao dependency treebank has been built, while constructing one manually is very difficult. Given a Chinese-Lao word-aligned parallel corpus, the method first runs dependency parsing on the Chinese sentences; then, taking the linguistic characteristics of Lao into account and following dependency syntax rules, it maps the dependency relations of each Chinese sentence onto the Lao sentence through the word alignment links, producing a dependency tree for the Lao sentence. In experiments, the method is compared with traditional machine learning methods. The results show a clear improvement in accuracy. The method also simplifies the manual annotation and collection work involved in building a Lao dependency treebank, saves considerable labor and resources, and makes it possible to build a good-quality Lao dependency treebank quickly even when Lao corpora are scarce.
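To make contribution (2) concrete, the following is a minimal sketch of log-linear link scoring, not the thesis's actual model. The three features (lexical translation probability, relative position, part-of-speech match), the weight values, and the tokens `book_lo` and `red_lo` (placeholders standing in for Lao words) are all assumptions for illustration; in the thesis the weights would come from minimum error rate training rather than being set by hand.

```python
import math

def link_features(i, j, src, tgt, lex_prob, src_pos, tgt_pos):
    """Hypothetical feature vector for linking source word i to target word j."""
    return {
        # log lexical translation probability (small floor for unseen pairs)
        "log_lex": math.log(lex_prob.get((src[i], tgt[j]), 1e-6)),
        # penalty for links far from the sentence diagonal
        "rel_pos": -abs(i / len(src) - j / len(tgt)),
        # syntactic feature: do the part-of-speech tags agree?
        "pos_match": 1.0 if src_pos[i] == tgt_pos[j] else 0.0,
    }

def link_score(features, weights):
    """Log-linear score: weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())

def align(src, tgt, lex_prob, src_pos, tgt_pos, weights):
    """For each target word, pick the best-scoring source word."""
    links = []
    for j in range(len(tgt)):
        best_i = max(
            range(len(src)),
            key=lambda i: link_score(
                link_features(i, j, src, tgt, lex_prob, src_pos, tgt_pos), weights
            ),
        )
        links.append((best_i, j))
    return links

# Toy example: Chinese "red book" (modifier first) vs. Lao order (head first).
src, src_pos = ["红", "书"], ["ADJ", "NOUN"]
tgt, tgt_pos = ["book_lo", "red_lo"], ["NOUN", "ADJ"]   # placeholder Lao tokens
lex_prob = {("书", "book_lo"): 0.6, ("红", "red_lo"): 0.5,
            ("红", "book_lo"): 0.2, ("书", "red_lo"): 0.1}  # made-up values
weights = {"log_lex": 1.0, "rel_pos": 0.5, "pos_match": 1.0}  # hand-set, not tuned

links = align(src, tgt, lex_prob, src_pos, tgt_pos, weights)
print(links)
```

In this toy case the part-of-speech feature and the lexical probabilities together outweigh the position penalty, so the two links cross, which is the modifier-head reordering described in contribution (1).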
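The projection step in contribution (3) can be sketched as follows. This is an illustrative simplification under stated assumptions: it uses only the first alignment link per Chinese word, leaves unaligned Lao words without a head, and omits the Lao-specific rules the thesis applies. The function name `project_dependencies`, the Universal Dependencies style labels, and the placeholder Lao tokens are hypothetical.

```python
def project_dependencies(src_heads, src_labels, alignment, tgt_len):
    """Copy source dependency arcs onto the target sentence via alignment links.

    src_heads[i] is the index of the head of source token i (-1 for the root).
    alignment is a list of (src_index, tgt_index) links.
    Returns (tgt_heads, tgt_labels); entries stay None where nothing projects.
    """
    src_to_tgt = {}
    for s, t in alignment:
        src_to_tgt.setdefault(s, t)  # keep only the first link per source word

    tgt_heads = [None] * tgt_len
    tgt_labels = [None] * tgt_len
    for s, head in enumerate(src_heads):
        if s not in src_to_tgt:
            continue  # unaligned source word: nothing to project
        t = src_to_tgt[s]
        if head == -1:
            tgt_heads[t], tgt_labels[t] = -1, src_labels[s]
        elif head in src_to_tgt and src_to_tgt[head] != t:
            # both ends of the arc are aligned to different target words
            tgt_heads[t], tgt_labels[t] = src_to_tgt[head], src_labels[s]
    return tgt_heads, tgt_labels

# Chinese: 我 读 红 书 ("I read red book"), parsed with heads and labels.
src_heads = [1, -1, 3, 1]
src_labels = ["nsubj", "root", "amod", "obj"]
# Lao word order puts the noun before its modifier (placeholder tokens).
tgt_tokens = ["I_lo", "read_lo", "book_lo", "red_lo"]
alignment = [(0, 0), (1, 1), (2, 3), (3, 2)]

tgt_heads, tgt_labels = project_dependencies(
    src_heads, src_labels, alignment, len(tgt_tokens)
)
print(tgt_heads, tgt_labels)
```

The arcs keep their labels while the indices follow the alignment, so the modifier still attaches to its noun even though the two words have swapped positions in the Lao sentence.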
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1