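Research on Chinese-Vietnamese Machine Translation Methods for the Metallurgical Domain
Topics: machine translation + Chinese-Vietnamese; Source: Kunming University of Science and Technology, 2016 doctoral dissertation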
[Abstract]: Machine translation is the most effective means of cross-language information exchange, and with the implementation of China's national "Belt and Road" strategy, Chinese-Vietnamese machine translation is becoming increasingly important. China and Vietnam cooperate extensively in the metallurgical industry, which creates heavy demand for translating metallurgical texts, scientific literature, and industry information. Translating this material automatically is important for promoting bilateral cooperation and information exchange in the industry. Research on Chinese-Vietnamese machine translation is still relatively weak, and work on domain-specific translation is scarcer still, which seriously constrains industry-oriented cross-language information exchange. Chinese and Vietnamese differ substantially, and translation for a particular industry has many domain-specific characteristics, so traditional translation methods are not fully suited to Chinese-Vietnamese machine translation in the metallurgical domain. Three problems stand out: acquiring bilingual domain terminology, automatically annotating bilingual word alignments, and building translation models that fit both the Chinese-Vietnamese language differences and the characteristics of the domain. Taking both into account, this dissertation studies four key technologies: Chinese-Vietnamese bilingual term acquisition for metallurgy, Chinese-Vietnamese word alignment, tree-to-tree syntax-based statistical machine translation that incorporates language differences, and syntax-based statistical machine translation that incorporates domain characteristics. The main contributions are as follows.

(1) Because Chinese-Vietnamese domain corpora are scarce, bilingual terms are hard to obtain. A pivot-language method for automatically acquiring bilingual metallurgical terms is proposed. It draws on existing Chinese-English and English-Vietnamese parallel domain texts and scientific literature. A conditional random field (CRF) model first recognizes terms in the Chinese domain text on the source side. Then, following the phrase-based statistical machine translation approach, a Chinese-English phrase probability table and an English-Vietnamese phrase probability table are built and mapped through the English pivot into a Chinese-to-Vietnamese phrase probability table. Finally, the recognized Chinese domain terms are used to filter the Chinese-Vietnamese phrase table, yielding a Chinese-Vietnamese bilingual term bank for metallurgy. Experiments show that the method extracts terms well and effectively addresses the difficulty of bilingual term extraction when Chinese-Vietnamese aligned resources are scarce.

(2) For automatic annotation of Chinese-Vietnamese word alignments, a word alignment method is proposed that combines language-difference characteristics with deep learning. Based on how the two languages differ in attributive postposition, adverbial postposition, and the position of linguistic structures, a position transformation function and a structure adjustment function are defined. These functions serve as constraints that integrate the structural differences into the loss function of a bidirectional RNN, improving the performance and accuracy of alignment learning. Chinese-Vietnamese word alignment experiments show that the method performs well and that both the language characteristics and the bidirectional context information improve alignment quality.

(3) To address the differences between the two languages, a Chinese-Vietnamese tree-to-tree statistical machine translation method that incorporates language characteristics is proposed. The differences between Chinese and Vietnamese are analyzed and formalized as differentiation rules, and linguistic features are defined, including an attributive-postposition reward, a time-adverbial-postposition reward, and a place-adverbial-postposition reward. Using a word-aligned Chinese-Vietnamese corpus, these features are integrated into tree-to-tree translation rule extraction during template extraction. During decoding, the language-difference rules are used to prune and optimize candidate sentences and obtain the best translation sequence, improving the efficiency and accuracy of both template extraction and decoding. Chinese-Vietnamese sentence translation experiments show that the method performs well and that exploiting syntactic differences improves translation performance and accuracy.

(4) To improve the translation of domain text, a Chinese-Vietnamese syntax-based statistical machine translation method that incorporates domain characteristics is proposed. The characteristics of the domain and their influence on machine translation are analyzed. Using domain terms and corpora, three models are built: a bilingual term-topic distribution model, a paragraph-level domain topic coherence model, and a Freebase-based domain knowledge model. Within the tree-to-tree translation model that already incorporates language characteristics, the bilingual domain term bank, the bilingual term-topic probability distribution, paragraph-level domain coherence, and domain knowledge relations are applied to the selection, combination, and pruning of candidate translations during decoding, so that domain characteristics are exploited more effectively. Chinese-Vietnamese translation experiments in the metallurgical domain show that the method performs well and that domain topics, paragraph topic coherence, and domain knowledge each clearly improve the translation of domain text.
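The pivot step in contribution (1) can be illustrated with a small sketch. It is not the dissertation's implementation: it assumes the common triangulation formula p(vi | zh) = Σ_en p(en | zh) · p(vi | en), and the function names, the probability threshold, and the toy phrase tables are all hypothetical. The set of Chinese terms stands in for the output of the CRF term recognizer, which is not modeled here.

```python
from collections import defaultdict

def triangulate(zh_en, en_vi):
    """Build a zh->vi phrase table through an English pivot.

    Assumes p(vi | zh) = sum over en of p(en | zh) * p(vi | en),
    i.e. the Vietnamese phrase depends on the Chinese phrase only
    through the English pivot phrase (a simplifying assumption).
    """
    zh_vi = defaultdict(lambda: defaultdict(float))
    for zh, en_probs in zh_en.items():
        for en, p_en in en_probs.items():
            # Pivot phrases missing from the en->vi table contribute nothing.
            for vi, p_vi in en_vi.get(en, {}).items():
                zh_vi[zh][vi] += p_en * p_vi
    return {zh: dict(vis) for zh, vis in zh_vi.items()}

def build_term_bank(zh_vi, zh_terms, min_prob=0.05):
    """Keep only entries whose Chinese side is a recognized domain term.

    zh_terms stands in for the CRF recognizer's output; min_prob is a
    hypothetical cutoff for discarding unlikely translations.
    """
    bank = {}
    for zh in zh_terms:
        candidates = [(vi, p) for vi, p in zh_vi.get(zh, {}).items() if p >= min_prob]
        if candidates:
            bank[zh] = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    return bank

# Toy phrase tables, invented for illustration only (not data from the thesis).
ZH_EN = {
    "高爐": {"blast furnace": 0.9, "furnace": 0.1},
    "今天": {"today": 1.0},
}
EN_VI = {
    "blast furnace": {"lò cao": 1.0},
    "furnace": {"lò": 0.8, "lò nung": 0.2},
    "today": {"hôm nay": 1.0},
}

if __name__ == "__main__":
    table = triangulate(ZH_EN, EN_VI)
    # Only "高爐" is treated as a metallurgical term, so "今天" is filtered out.
    print(build_term_bank(table, {"高爐"}))
```

Filtering by the Chinese term set after triangulation mirrors the order described in the abstract: the full phrase table is built first, and the domain terms then select the entries that go into the term bank.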
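The reward features in contribution (3) can be pictured as extra terms in a candidate's score. The sketch below assumes a log-linear style combination, in which each reward adds a weighted count to a base model score, followed by beam pruning. The abstract does not give the actual scoring function, so the feature names, weights, scores, and the `Candidate` class are hypothetical.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical weights for the three reward features named in the abstract.
REWARD_WEIGHTS = {
    "attributive_postposed": 0.5,
    "time_adverbial_postposed": 0.3,
    "place_adverbial_postposed": 0.3,
}

@dataclass
class Candidate:
    text: str
    base_log_score: float  # score from the translation and language models
    feature_counts: dict = field(default_factory=dict)  # reward name -> times fired

def score(candidate, weights=REWARD_WEIGHTS):
    """Base score plus the weighted sum of fired reward features."""
    reward = sum(weights.get(name, 0.0) * count
                 for name, count in candidate.feature_counts.items())
    return candidate.base_log_score + reward

def prune(candidates, beam_size=2):
    """Keep the beam_size best candidates, highest score first."""
    return heapq.nlargest(beam_size, candidates, key=score)

# Toy candidates for Chinese "新設備" (new equipment). Vietnamese places the
# attributive after the noun, so "thiết bị mới" fires the attributive reward.
CANDIDATES = [
    Candidate("mới thiết bị", -2.0),
    Candidate("thiết bị mới", -2.2, {"attributive_postposed": 1}),
    Candidate("thiết bị", -3.0),
]

if __name__ == "__main__":
    for kept in prune(CANDIDATES):
        print(kept.text, round(score(kept), 2))
```

In this toy case the reward lets the correctly ordered candidate overtake one with a higher base score, which is the effect the abstract attributes to the language-difference features.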
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Doctoral
[Year degree conferred]: 2016
[Classification number]: H44; TF0
[Similar literature]
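Related doctoral dissertations (top 1)
1 Gao Shengxiang; Research on Chinese-Vietnamese Machine Translation Methods for the Metallurgical Domain [D]; Kunming University of Science and Technology; 2016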
Document ID: 2056183
Document link: http://sikaile.net/shoufeilunwen/rwkxbs/2056183.html