Research and Implementation of Methods for Acquiring Technical Term Translations from Web Pages and Online Chinese-English Dictionaries

Published: 2018-10-05 18:38

[Abstract]: Technical terms are the core concepts of a specific domain and carry rich domain information. Because such terms keep growing in number and changing, translating them has become one of the hard problems in machine translation and information retrieval, and both statistical and rule-based methods run into difficulties with terminology. This thesis treats the Web as a corpus and combines Web mining with knowledge acquisition to study the English translation of Chinese technical terms. The work helps address term translation and supports the semi-automatic construction of bilingual term dictionaries, and it should also benefit cross-language information retrieval and cross-language knowledge acquisition. The research covers the following aspects.
1) The thesis first analyzes the main problems, difficulties, and state of research in Web-based translation acquisition, discusses the shortcomings of earlier work, and then gives the basic workflow and approach for obtaining term translations from web pages.
2) Using Web-based information extraction and a semantic prediction principle, queries are built by combining a term with its partial translations, and a search engine returns web pages related to the term's translation. This addresses the difficulty of obtaining text in which a term and its translation co-occur, and the resulting high-quality corpus lays the foundation for the later extraction step.
3) Term translations are extracted from web pages with knowledge acquisition techniques, semi-structured text analysis, and information extraction that combines statistics with rules. Three extraction methods (template-based, dictionary-pattern, and position-pattern) are combined so that precision is raised as far as possible while recall is preserved.
4) To remove noise from the translation results, three verification methods are proposed on the basis of a manually compiled bilingual aligned term corpus: end-analogy alignment, bilingual alignment degree, and word formation. Together they serve as a sufficient-but-not-necessary check on candidate translations. This verification step safeguards translation accuracy and makes the system more practical and reliable.
5) Commonly used terms are translated with the help of online Chinese-English dictionaries, which keeps translation precision high while improving the efficiency of the acquisition system.
Experiments on terms from different domains show that the proposed method and system for acquiring term translations from web pages and online dictionaries achieve good accuracy, a significant improvement over earlier methods, with low running time and strong practicality.
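Steps 2) and 3) can be illustrated with a minimal Python sketch. It assumes search-engine snippets are already available as plain strings (no search engine is called), and it covers only query construction and template-style extraction. The function names, the three regular-expression templates, and the example term are illustrative assumptions, not the thesis's actual implementation.

```python
import re
from collections import Counter

# Assumed shape of an English phrase: letters, spaces, and hyphens,
# starting and ending with a letter.
ENGLISH = r"[A-Za-z][A-Za-z\- ]*[A-Za-z]"


def build_queries(term, partial_translations):
    """Pair the Chinese term with each known partial translation (step 2)."""
    queries = [f'"{term}"']
    queries += [f'"{term}" {part}' for part in partial_translations]
    return queries


def extract_candidates(term, snippet):
    """Return English phrases found next to the term (step 3, templates only)."""
    t = re.escape(term)
    templates = [
        rf"{t}\s*[(（]\s*({ENGLISH})\s*[)）]",  # term(English)
        rf"({ENGLISH})\s*[(（]\s*{t}\s*[)）]",  # English(term)
        rf"{t}\s*[:：]?\s*({ENGLISH})",         # term: English, or adjacent
    ]
    found = []
    for pattern in templates:
        for match in re.finditer(pattern, snippet):
            found.append(match.group(1).strip().lower())
    return found


def rank_candidates(term, snippets):
    """Count in how many snippets each candidate appears, most frequent first."""
    counts = Counter()
    for snippet in snippets:
        counts.update(set(extract_candidates(term, snippet)))
    return counts.most_common()


if __name__ == "__main__":
    snippets = [
        "机器翻译(Machine Translation)是自然语言处理的分支",
        "Machine translation（机器翻译）技术",
        "机器翻译: statistical machine translation 方法",
    ]
    print(build_queries("机器翻译", ["translation"]))
    print(rank_candidates("机器翻译", snippets))
```

Ranking by snippet frequency is only a stand-in for the verification stage in step 4); the alignment-based and word-formation checks described in the abstract are not modeled here.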
Degree-granting institution: Jiangsu University of Science and Technology
Degree level: Master's
Year degree awarded: 2012
Classification number: TP391.1

Article ID: 2254447

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2254447.html
