Research on the Recognition and Translation of Japanese-Chinese Numerical and Temporal Expressions

Published: 2018-10-18 09:01

[Abstract]: Named entity recognition and translation are fundamental tasks in natural language processing. Numerical and temporal expressions are a special class of named entity that carries key information, so recognizing and translating them has both theoretical and practical value. Recognition and analysis of these expressions underpin tasks such as information retrieval, event extraction, event detection and tracking, and question answering. In multilingual tasks such as machine translation in particular, the alignment and translation quality of numerical and temporal expressions are important factors in system performance. Research on their recognition and translation therefore matters both for improving machine translation systems and for advancing artificial intelligence. Starting from the characteristics of Japanese and Chinese numerical and temporal expressions, this thesis combines linguistic knowledge with statistical methods and, through extensive data analysis and experiments, studies methods for recognizing and translating these expressions in both languages and applies them to a machine translation system. The main work is as follows.

(1) Based on the latest TIMEX3 temporal annotation specification and a general classification of numerals, and taking into account the structurally parallel (isomorphic) and divergent (heterogeneous) cases between Japanese and Chinese, keyword knowledge bases of trigger words, boundary words, and similar cues are built separately for Japanese and for Chinese. Words expressing approximate quantities are included in the recognition scope, so that the recognized expressions carry richer meaning. The expressions are then recognized by regular-expression matching. Finally, this rule-based method is fused with a statistical recognition method to recognize the expressions in each language. Experiments show that the method performs well on both Japanese and Chinese.

(2) The alignment of bilingual numerical and temporal expressions is incorporated into a conventional word alignment method, and a bidirectional alignment algorithm that combines position constraints with a similarity measure is proposed. Experiments show that the algorithm effectively improves bilingual word alignment and helps the machine translation system train a better translation model.

(3) Based on how these expressions are translated between Japanese and Chinese, a translation rule base is built for translating them independently. The recognition results, the alignment information, and the rule base are integrated into an existing statistical machine translation system, which improves translation accuracy for numerical and temporal expressions and their neighboring words and thereby the overall translation quality, as verified by experiments.

In summary, the contributions of this thesis are: designing recognition and translation rules for time words on the basis of TIMEX3 annotation and the characteristics of Japanese and Chinese expressions, and bringing approximate-quantity words into the recognition scope; proposing a bidirectional alignment algorithm for numerical and temporal expressions based on position constraints and a similarity measure; and building a Japanese-Chinese translation rule base for these expressions. All three were applied to a machine translation system, and experiments confirm that they effectively improve its overall performance.
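Contribution (1) describes keyword knowledge bases combined with regular-expression matching. The abstract gives neither the keyword lists nor the patterns, so the following is only a minimal sketch of the general idea. The names `UNITS`, `APPROX_PREFIX`, `APPROX_SUFFIX`, `build_pattern`, and `find_expressions`, and every word in the lists, are illustrative assumptions. The statistical component and TIMEX3 typing are not modeled.

```python
import re

# ASSUMPTION: tiny illustrative keyword lists, not the thesis's knowledge bases.
DIGITS = "0-9０-９〇一二三四五六七八九十百千"  # contents of a character class

UNITS = {  # trigger words: time units that follow a number
    "ja": ["年", "月", "日", "時", "分", "秒"],
    "zh": ["年", "月", "日", "点", "时", "分", "秒"],
}
APPROX_PREFIX = {  # approximate-quantity words placed before the number
    "ja": ["およそ", "約"],
    "zh": ["大约", "约"],
}
APPROX_SUFFIX = {  # approximate-quantity words placed after the unit
    "ja": ["ごろ", "頃", "前後"],
    "zh": ["左右", "前后"],
}


def _alternation(words):
    """Join keywords into a regex alternation, longest first."""
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


def build_pattern(lang):
    """Optional approximator, one or more number+unit pairs, optional approximator."""
    core = rf"(?:[{DIGITS}]+(?:{_alternation(UNITS[lang])}))+"
    prefix = rf"(?:{_alternation(APPROX_PREFIX[lang])})?"
    suffix = rf"(?:{_alternation(APPROX_SUFFIX[lang])})?"
    return re.compile(prefix + core + suffix)


def find_expressions(text, lang):
    """Return the surface forms of all matches in reading order."""
    return [m.group(0) for m in build_pattern(lang).finditer(text)]


print(find_expressions("会議は約3時ごろに始まる", "ja"))  # ['約3時ごろ']
print(find_expressions("他在2017年10月左右毕业", "zh"))    # ['2017年10月左右']
```

Including the approximator in the match keeps "about" and "around" attached to the expression they modify. A purely rule-based matcher like this one will also fire on ordinary words that happen to contain numerals, which is one reason to combine it with a statistical model.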
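Contribution (2) names a bidirectional alignment algorithm that combines position constraints with a similarity measure, but the abstract does not define either component. The sketch below shows one plausible reading and should not be taken as the thesis's algorithm. It assumes each expression has already been normalized to a comparable value, uses relative sentence position as the constraint and a character-set Dice coefficient as the similarity, and keeps only pairs chosen in both directions. `Expr`, `similarity`, `best_match`, `align`, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    text: str       # surface form in the sentence
    value: str      # ASSUMPTION: normalized value produced upstream, e.g. "2017-10"
    rel_pos: float  # relative position in the sentence, in [0, 1]


def similarity(a, b):
    """Dice coefficient over the character sets of the normalized values."""
    sa, sb = set(a.value), set(b.value)
    if not sa or not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def best_match(item, candidates, max_shift, min_sim):
    """Index of the most similar candidate within the position window, or None."""
    best_idx, best_sim = None, 0.0
    for idx, cand in enumerate(candidates):
        if abs(item.rel_pos - cand.rel_pos) > max_shift:
            continue  # position constraint
        sim = similarity(item, cand)
        if sim >= min_sim and sim > best_sim:
            best_idx, best_sim = idx, sim
    return best_idx


def align(src, tgt, max_shift=0.5, min_sim=0.5):
    """Keep only pairs that are each other's best match in both directions."""
    forward = {i: best_match(s, tgt, max_shift, min_sim) for i, s in enumerate(src)}
    backward = {j: best_match(t, src, max_shift, min_sim) for j, t in enumerate(tgt)}
    return [(i, j) for i, j in forward.items() if j is not None and backward[j] == i]


ja = [Expr("2017年10月", "2017-10", 0.1), Expr("3時", "T03", 0.6)]
zh = [Expr("2017年10月", "2017-10", 0.2), Expr("3点", "T03", 0.7)]
print(align(ja, zh))  # [(0, 0), (1, 1)]
```

Requiring agreement in both directions trades recall for precision: a pair is dropped unless source and target each select the other. Links that survive could then be fixed as anchors before a conventional word aligner fills in the remaining words.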
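Contribution (3) is a rule base for translating the expressions independently of the surrounding sentence. The abstract does not list its rules, so the following sketch uses a handful of hypothetical Japanese-to-Chinese rewrite rules to illustrate the mechanism. `RULES` and `translate_expression` are assumed names, and the translation direction is also an assumption.

```python
import re

# ASSUMPTION: illustrative Japanese-to-Chinese rules, applied in order.
# The duration rule precedes the clock-time rule so "時間" is not split.
RULES = [
    (re.compile(r"(\d+)時間"), r"\1小时"),  # duration in hours
    (re.compile(r"(\d+)時"), r"\1点"),      # clock time
    (re.compile(r"ごろ|頃"), "左右"),        # trailing approximator
    (re.compile(r"およそ|約"), "大约"),      # leading approximator
]


def translate_expression(expr):
    """Apply each rewrite rule in turn to an already recognized expression."""
    for pattern, replacement in RULES:
        expr = pattern.sub(replacement, expr)
    return expr


print(translate_expression("約3時ごろ"))  # 大约3点左右
print(translate_expression("2時間"))      # 2小时
```

Translating the expression as a unit is what lets a statistical system treat it as a single pre-translated phrase, so that numerals and units are not reordered or dropped during decoding.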
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1
