Research on Sentence-Level Entity Relation Extraction Incorporating Thai Language Features

Published: 2018-05-15 10:56

Topics: Thai sentence segmentation + named entity recognition. Source: Master's thesis, Kunming University of Science and Technology, 2017.


[Abstract]: Entity relation extraction from Thai sentences is an important task in Thai natural language processing, and its performance directly affects higher-level applications such as event extraction, knowledge base construction, and search engines. However, several characteristics of Thai make automatic processing difficult: word formation is complex, modal particles are used frequently, and writers do not habitually use punctuation, which leaves sentence boundaries blurred. Combining Thai linguistic features with statistical machine learning models, this thesis studies Thai sentence segmentation, named entity recognition in Thai sentences, and extraction of subordinate entity relations from Thai sentences, with the following three results.

(1) In written Thai, sentences are usually separated only by a space at the end of each sentence, yet Thai text also contains many spaces that do not end a sentence, so sentence boundaries are ambiguous. The thesis first analyzes and summarizes practical grammar rules related to Thai sentence boundaries, then recasts sentence segmentation as the classification of the space characters in Thai text, using the maximum entropy classifier from statistical machine learning. A maximum entropy model is trained on the contextual features of each space and used to classify every space in the input; the constructed grammar rule base then corrects the model's classification results. Compared with purely rule-based methods, this approach avoids writing rules for a large body of complex Thai grammar, since rules are built only for the main knowledge relevant to sentence boundary detection, and the maximum entropy model makes better use of the context of spaces within input chunks or paragraphs. The method performs well and stably on Thai sentence segmentation and lays the groundwork for named entity recognition in Thai sentences.

(2) Named entity recognition in Thai sentences is cast as sequence labeling over the word sequence of each sentence. Using the contextual linguistic features of words, the thesis builds a hidden Markov model and a conditional random field model on a Thai entity recognition training corpus and evaluates each sequence labeling model on a Thai test corpus. The experimental results confirm that sequence labeling is effective for Thai named entity recognition and lay the groundwork for entity relation extraction from Thai sentences.

(3) Building on Thai named entity recognition, extraction of subordinate entity relations is cast as the classification of entity relation triples in Thai sentences. Because no corpus of Thai subordinate entity relations was available, a Thai entity relation corpus is first constructed from sentence-aligned Chinese-Thai parallel sentence pairs and a Chinese-Thai dictionary. A maximum entropy classifier is then trained on the contextual features around Thai entity words and used to identify the subordinate relation type of each candidate entity relation triple in a Thai sentence, thereby extracting subordinate entity relations. Experiments verify the effectiveness of the method for extracting subordinate entity relations from Thai sentences.
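The segmentation step in (1), treating every space as a candidate boundary, classifying it with a maximum entropy model, and then correcting the output with grammar rules, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the two-class maximum entropy model is written as its equivalent, logistic regression trained by stochastic gradient ascent; the features (the chunks on both sides of the space plus a cue for a few sentence-final particles), the toy labeled spaces in `TOY_SPACES`, and the single correction rule (no break before a chunk beginning with the conjunction และ "and" or แต่ "but") are all hypothetical stand-ins for the thesis's feature set and rule base.

```python
import math
from collections import defaultdict

# Hypothetical cue lists for illustration; not the thesis's rule base.
FINAL_PARTICLES = ("ครับ", "ค่ะ", "นะ")  # polite / sentence-final particles
CONJUNCTIONS = ("และ", "แต่")             # "and", "but"

# Toy labeled spaces: (left chunk, right chunk, 1 = sentence boundary). Illustrative only.
TOY_SPACES = [
    ("ฉันชอบ", "แมวสีขาว", 0),
    ("กินข้าวแล้วครับ", "พรุ่งนี้ไปตลาด", 1),
    ("ที่กรุงเทพ", "มีรถติด", 0),
    ("เมื่อวาน", "ฉันไปทะเล", 0),
    ("ไปโรงเรียนค่ะ", "วันนี้ฝนตก", 1),
    ("ราคา", "ห้าสิบบาท", 0),
    ("ร้านอาหาร", "ใกล้บ้าน", 0),
    ("อากาศร้อนมากนะ", "เขาอยู่บ้าน", 1),
]


def space_features(left, right):
    """Context features for the space between two space-separated chunks."""
    feats = ["bias", "L=" + left, "R=" + right]
    if left.endswith(FINAL_PARTICLES):
        feats.append("L_ends_with_particle")
    return feats


class BinaryMaxEnt:
    """Two-class maximum entropy model, i.e. logistic regression."""

    def __init__(self):
        self.w = defaultdict(float)

    def prob_boundary(self, feats):
        z = sum(self.w[f] for f in feats)
        # Numerically stable sigmoid.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def train(self, data, epochs=200, lr=0.5, l2=0.01):
        # Stochastic gradient ascent on the L2-regularized log-likelihood.
        for _ in range(epochs):
            for feats, y in data:  # y = 1: sentence boundary, 0: inner space
                g = y - self.prob_boundary(feats)
                for f in feats:
                    self.w[f] += lr * (g - l2 * self.w[f])


def segment(chunks, model, threshold=0.5):
    """Join space-separated chunks back into sentences."""
    if not chunks:
        return []
    sentences, current = [], [chunks[0]]
    for left, right in zip(chunks, chunks[1:]):
        boundary = model.prob_boundary(space_features(left, right)) >= threshold
        # Rule-based correction (hypothetical rule): no break before a conjunction.
        if right.startswith(CONJUNCTIONS):
            boundary = False
        if boundary:
            sentences.append(" ".join(current))
            current = [right]
        else:
            current.append(right)
    sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    model = BinaryMaxEnt()
    model.train([(space_features(l, r), y) for l, r, y in TOY_SPACES])
    print(segment(["กินข้าวแล้วครับ", "พรุ่งนี้ไปตลาด", "และซื้อผลไม้"], model))
```

Because Thai does not put spaces between words, each "chunk" here is the run of text between two spaces rather than a single word; this sketch uses whole chunks as features and does not look inside them.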
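For the sequence labeling formulation in (2), the hidden Markov model half can be sketched as a first-order HMM estimated by counting, with add-k smoothing, and decoded with the Viterbi algorithm. The BIO tag set, the smoothing constant `k=0.1`, the single extra vocabulary slot reserved for unseen words, and the two-sentence `TOY_CORPUS` are assumptions for illustration; the conditional random field model from the thesis is not shown.

```python
import math
from collections import Counter, defaultdict

# Toy training corpus of (word, BIO tag) pairs; illustrative, not the thesis's data.
TOY_CORPUS = [
    [("สมชาย", "B-PER"), ("ไป", "O"), ("เชียงใหม่", "B-LOC")],
    [("สมหญิง", "B-PER"), ("อยู่", "O"), ("ที่", "O"), ("กรุงเทพ", "B-LOC")],
]


def train_hmm(corpus, k=0.1):
    """Estimate add-k smoothed start, transition and emission log-probabilities."""
    tags = sorted({t for sent in corpus for _, t in sent})
    n_words = len({w for sent in corpus for w, _ in sent}) + 1  # +1 slot for unseen words
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sent in corpus:
        start[sent[0][1]] += 1
        for (_, t1), (_, t2) in zip(sent, sent[1:]):
            trans[t1][t2] += 1
        for w, t in sent:
            emit[t][w] += 1

    def log_start(t):
        return math.log((start[t] + k) / (len(corpus) + k * len(tags)))

    def log_trans(t1, t2):
        return math.log((trans[t1][t2] + k) / (sum(trans[t1].values()) + k * len(tags)))

    def log_emit(t, w):
        return math.log((emit[t][w] + k) / (sum(emit[t].values()) + k * n_words))

    return tags, log_start, log_trans, log_emit


def viterbi(words, model):
    """Return the most probable tag sequence for `words` under the HMM."""
    if not words:
        return []
    tags, log_start, log_trans, log_emit = model
    # best[t]: log-probability of the best path ending in tag t at the current word
    best = {t: log_start(t) + log_emit(t, words[0]) for t in tags}
    backpointers = []
    for w in words[1:]:
        pointers, new_best = {}, {}
        for t2 in tags:
            prev = max(tags, key=lambda t1: best[t1] + log_trans(t1, t2))
            pointers[t2] = prev
            new_best[t2] = best[prev] + log_trans(prev, t2) + log_emit(t2, w)
        backpointers.append(pointers)
        best = new_best
    # Trace the highest-scoring path backwards from the best final tag.
    path = [max(tags, key=best.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(viterbi(["สมศักดิ์", "ไป", "กรุงเทพ"], train_hmm(TOY_CORPUS)))
```

In the demo, the unseen name สมศักดิ์ receives only smoothed emission scores, so its tag is decided mostly by the start and transition probabilities learned from the two training sentences.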
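The abstract does not specify how the parallel sentence pairs and the dictionary are combined to build the Thai relation corpus in (3). One plausible reading, sketched below purely as an assumption, is annotation projection: a relation triple already labeled on the Chinese side of an aligned pair is mapped onto the Thai sentence by looking up each Chinese entity in a Chinese-Thai dictionary and locating a matching Thai token. `ZH_TH_DICT`, `project_triple`, `project_corpus`, and the relation label `located_in` are hypothetical.

```python
# Hypothetical Chinese-Thai dictionary entries, for illustration only.
ZH_TH_DICT = {
    "清迈大学": ["มหาวิทยาลัยเชียงใหม่"],
    "清迈府": ["จังหวัดเชียงใหม่", "เชียงใหม่"],
}


def project_triple(zh_triple, th_tokens, dictionary):
    """Project a Chinese (head, relation, tail) annotation onto the aligned Thai sentence.

    Returns (head_index, relation, tail_index) over th_tokens, or None when either
    entity has no dictionary translation that occurs in the Thai sentence.
    """
    head, relation, tail = zh_triple

    def locate(zh_entity):
        # Take the first listed translation that appears as a Thai token.
        for th_entity in dictionary.get(zh_entity, []):
            if th_entity in th_tokens:
                return th_tokens.index(th_entity)
        return None

    head_idx, tail_idx = locate(head), locate(tail)
    if head_idx is None or tail_idx is None:
        return None
    return (head_idx, relation, tail_idx)


def project_corpus(parallel_pairs, dictionary):
    """Keep only successful projections; each pair is (zh_triple, th_tokens)."""
    corpus = []
    for zh_triple, th_tokens in parallel_pairs:
        projected = project_triple(zh_triple, th_tokens, dictionary)
        if projected is not None:
            corpus.append((th_tokens, projected))
    return corpus


if __name__ == "__main__":
    th = ["มหาวิทยาลัยเชียงใหม่", "อยู่", "ใน", "จังหวัดเชียงใหม่"]
    print(project_corpus([(("清迈大学", "located_in", "清迈府"), th)], ZH_TH_DICT))
```

Discarding pairs whose entities cannot be found keeps the projected corpus small but avoids guessing alignments the dictionary does not support.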
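The classification formulation in (3) can be sketched as follows: every ordered pair of recognized entities in a sentence becomes a candidate triple, context features are collected around the two entities, and a multiclass maximum entropy (softmax regression) model assigns either a relation type or `NONE`. The feature templates, the relation inventory in `LABELS`, and the toy corpus are hypothetical; the thesis's actual subordinate relation types and features are not given in the abstract.

```python
import math
from collections import defaultdict
from itertools import permutations

LABELS = ["NONE", "member_of", "located_in"]  # hypothetical relation inventory

# Toy corpus: (tokens, entity spans (start, end, type), {(i, j): label}). Illustrative only.
TOY_RELATIONS = [
    (["สมชาย", "ทำงาน", "ที่", "ธนาคารกรุงไทย"],
     [(0, 1, "PER"), (3, 4, "ORG")], {(0, 1): "member_of"}),
    (["มหาวิทยาลัยเชียงใหม่", "อยู่", "ใน", "จังหวัดเชียงใหม่"],
     [(0, 1, "ORG"), (3, 4, "LOC")], {(0, 1): "located_in"}),
    (["สมหญิง", "เป็น", "อาจารย์", "ของ", "มหาวิทยาลัยมหิดล"],
     [(0, 1, "PER"), (4, 5, "ORG")], {(0, 1): "member_of"}),
]


def pair_features(tokens, e1, e2):
    """Context features for the candidate triple (e1, ?, e2)."""
    first, second = sorted((e1, e2))
    feats = ["bias", f"types={e1[2]}>{e2[2]}",
             "order=" + ("fwd" if e1[0] < e2[0] else "rev")]
    feats += ["between=" + w for w in tokens[first[1]:second[0]]]
    if first[0] > 0:
        feats.append("before=" + tokens[first[0] - 1])
    if second[1] < len(tokens):
        feats.append("after=" + tokens[second[1]])
    return feats


class MaxEnt:
    """Multiclass maximum entropy (softmax regression) classifier."""

    def __init__(self, labels):
        self.labels = labels
        self.w = defaultdict(float)  # keyed by (feature, label)

    def probs(self, feats):
        scores = [sum(self.w[(f, y)] for f in feats) for y in self.labels]
        top = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return {y: e / z for y, e in zip(self.labels, exps)}

    def train(self, data, epochs=100, lr=0.5, l2=0.01):
        # Stochastic gradient ascent on the L2-regularized log-likelihood.
        for _ in range(epochs):
            for feats, gold in data:
                p = self.probs(feats)
                for y in self.labels:
                    g = float(y == gold) - p[y]
                    for f in feats:
                        self.w[(f, y)] += lr * (g - l2 * self.w[(f, y)])

    def predict(self, feats):
        p = self.probs(feats)
        return max(self.labels, key=p.get)


def training_examples(corpus):
    """Label every ordered entity pair; pairs without an annotation get NONE."""
    data = []
    for tokens, entities, relations in corpus:
        for i, j in permutations(range(len(entities)), 2):
            data.append((pair_features(tokens, entities[i], entities[j]),
                         relations.get((i, j), "NONE")))
    return data


def extract(tokens, entities, model):
    """Return (head, relation, tail) triples predicted for one sentence."""
    triples = []
    for e1, e2 in permutations(entities, 2):
        label = model.predict(pair_features(tokens, e1, e2))
        if label != "NONE":
            # Thai is written without spaces between words, so join with "".
            triples.append(("".join(tokens[e1[0]:e1[1]]), label,
                            "".join(tokens[e2[0]:e2[1]])))
    return triples


if __name__ == "__main__":
    model = MaxEnt(LABELS)
    model.train(training_examples(TOY_RELATIONS))
    print(extract(["วิชัย", "ทำงาน", "ที่", "บริษัทปตท"], [(0, 1, "PER"), (3, 4, "ORG")], model))
```

The entity strings themselves are not features, so the relation predicted for a new pair depends only on the entity types, their order, and the surrounding words.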
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1
