Research on the Recognition and Translation of Japanese-Chinese Numerical and Temporal Expressions
[Abstract]: Named entity recognition and translation are fundamental tasks in natural language processing. Numerical and temporal expressions are a special kind of named entity that carries key information, so recognizing and translating them has both theoretical and practical value. Their recognition and analysis underpin tasks such as information retrieval, event extraction, event detection and tracking, and question answering. In multilingual tasks such as machine translation, the alignment and translation quality of these expressions also affect system performance. Based on the characteristics of numerical and temporal expressions in Japanese and Chinese, this thesis combines linguistic knowledge with statistical methods and, through extensive data analysis and experiments, studies how to recognize and translate such expressions and applies the results to machine translation systems. The main work is as follows.

(1) Following the latest TIMEX3 temporal annotation specification and a general classification of numerals, and taking into account the features that Japanese and Chinese share and those in which they differ, keyword knowledge bases (trigger words, boundary words, and so on) are built separately for Japanese and for Chinese. Words that express an approximate number are included in the recognition scope, which gives the recognized expressions richer meaning. The expressions are then recognized by regular-expression matching. Finally, recognition for both languages is achieved by combining this rule-based method with statistical recognition methods. The experimental results show that the method performs well on both Japanese and Chinese.

(2) Alignment of bilingual numerical and temporal expressions is incorporated into the traditional word alignment method. A bidirectional alignment algorithm for these expressions, based on position constraints and a similarity measure, is proposed. The experimental results show that the algorithm effectively improves bilingual word alignment, which helps the machine translation system train a better translation model.

(3) Based on how numerical and temporal expressions are translated between Japanese and Chinese, a translation rule base is built and used to translate these expressions independently. The recognition and alignment information and the translation rules are integrated into an existing statistical machine translation system to improve the translation accuracy of the expressions and of the words adjacent to them, and thereby the overall translation quality; this is verified by experiments.

In summary, the contributions of this thesis are: recognition and translation rules for time words designed on the basis of TIMEX3 annotation and the characteristics of Japanese and Chinese, with approximate-number ("estimate") words brought into the recognition scope; a bidirectional alignment algorithm based on position constraints and a similarity measure; and a translation rule base for Japanese and Chinese numerical and temporal expressions. These three components are applied to a machine translation system, and the experimental results show that they effectively improve its overall performance.
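The rule-based step described in (1), matching numerals followed by trigger words and optionally wrapped by approximate-number words, can be pictured with a minimal sketch. It is not the thesis's implementation: the word lists, the pattern, and the names `UNIT_TRIGGERS`, `APPROX_PREFIXES`, `APPROX_SUFFIXES`, and `find_time_expressions` are all hypothetical, and the statistical component is not modeled.

```python
import re

# Hypothetical mini knowledge base; the thesis's actual word lists are not
# given in the abstract, so these entries are illustrative assumptions.
NUMERAL = r"[0-9０-９〇一二三四五六七八九十百千万]+"
UNIT_TRIGGERS = ["年", "月", "日", "時", "时", "点", "分", "秒"]
APPROX_PREFIXES = ["大约", "约", "約", "およそ"]
APPROX_SUFFIXES = ["左右", "ごろ", "頃", "ほど"]


def _alternation(words):
    # Longest entries first, so that "大约" is tried before "约".
    ordered = sorted(words, key=len, reverse=True)
    return "|".join(re.escape(w) for w in ordered)


# Optional approximate prefix, one or more numeral+trigger pairs,
# optional approximate suffix.
PATTERN = re.compile(
    rf"(?P<pre>{_alternation(APPROX_PREFIXES)})?"
    rf"(?P<core>(?:{NUMERAL}(?:{_alternation(UNIT_TRIGGERS)}))+)"
    rf"(?P<suf>{_alternation(APPROX_SUFFIXES)})?"
)


def find_time_expressions(text):
    """Return (matched_text, is_approximate) pairs in order of appearance."""
    results = []
    for m in PATTERN.finditer(text):
        is_approx = bool(m.group("pre") or m.group("suf"))
        results.append((m.group(0), is_approx))
    return results


if __name__ == "__main__":
    print(find_time_expressions("出発は2017年3月5日の8時ごろです"))
    print(find_time_expressions("会议在2017年3月5日下午3点左右开始"))
```

Keeping the approximate-number words inside the matched span, rather than discarding them, is what lets a later translation step treat "8時ごろ" and "8時" differently. A pattern this small will produce false positives (for example on duration words or lexicalized compounds), which is one reason a rule-based pass is typically paired with a statistical one.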
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1