
Research on Entity Relation Extraction for Chinese News Text

Published: 2018-07-31 17:15
[Abstract]: With the rapid development of Internet technology, the volume of text on the Internet is growing quickly, and extracting the knowledge people need from massive text, quickly and accurately, has become a research hotspot. Automatically extracting entity relations from text is a particularly important part of this problem. Existing work on entity relation extraction concentrates on English corpora and relies mainly on traditional machine learning algorithms. In addition, few studies consider how the large number of no-relation samples affects relation classification. This thesis therefore focuses on entity relation classification for Chinese news text, mainly using deep learning methods. To reduce the influence of no-relation samples, it divides the entity relation extraction task into two subtasks, which are studied separately: deciding whether an entity relation exists, and classifying the entity relation.
For the relation-existence subtask, the thesis first designs and implements a discrimination method that combines a bag-of-words model with logistic regression. Because this method suffers from a high-dimensional feature space and a long running time, the thesis further designs and implements a discrimination method based on a convolutional neural network (CNN). Word vectors are pre-trained on Sohu news data, and the words obtained by segmenting the ACE2005 Chinese entity relation extraction dataset are mapped to these vectors to form the CNN input. Experimental results on the ACE2005 Chinese entity relation extraction dataset show that the CNN-based method achieves better discrimination performance, with an F-score of 81.78%.
For the relation classification subtask, the thesis proposes a method based on a Bi-directional Long Short-Term Memory (BLSTM) model combined with feature fusion. First, word vectors are pre-trained on the corpus, and entity-related features such as entity type, entity length, and relative entity position are extracted. A custom rule base is then built by analyzing how entity types and their contexts in the corpus relate to relation types. Finally, the word vectors, entity-related features, and custom rule base are fused as the input to the BLSTM model to build the classifier. Experiments on the ACE2005 dataset show a relation classification F-score of 91.74%, which demonstrates the effectiveness of this work for entity relation classification in Chinese news text.
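The two-subtask split described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the thesis implementation: stage 1 uses the bag-of-words plus logistic regression baseline (the thesis reports better results with a CNN), stage 2 is a hypothetical stub standing in for the BLSTM classifier, and the toy English tokens, the label `SOME_RELATION_TYPE`, and all function names are invented for the example. Real input would be segmented Chinese words from the ACE2005 data.

```python
import math
from collections import Counter

def bag_of_words(tokens, vocab):
    """Map a token list to a count vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(x, w, b):
    """Linear score w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Batch gradient descent on the logistic loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(xi, w, b)) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def has_relation(tokens, vocab, w, b, threshold=0.5):
    """Stage 1: decide whether the sentence expresses any relation."""
    return sigmoid(score(bag_of_words(tokens, vocab), w, b)) >= threshold

def extract_relation(tokens, vocab, w, b, classify_relation):
    """Stage 2 runs only on samples that stage 1 accepts."""
    if not has_relation(tokens, vocab, w, b):
        return None
    return classify_relation(tokens)

def stub_classifier(tokens):
    """Hypothetical placeholder for the BLSTM relation classifier."""
    return "SOME_RELATION_TYPE"

# Invented toy data: label 1 = a relation exists, 0 = no relation.
train = [
    (["alice", "works", "for", "acme"], 1),
    (["bob", "works", "for", "globex"], 1),
    (["alice", "and", "bob", "smiled"], 0),
    (["acme", "and", "globex", "closed"], 0),
]
vocab = sorted({tok for tokens, _ in train for tok in tokens})
X = [bag_of_words(tokens, vocab) for tokens, _ in train]
y = [label for _, label in train]
w, b = train_logistic_regression(X, y)

print(extract_relation(["carol", "works", "for", "initech"], vocab, w, b, stub_classifier))
print(extract_relation(["carol", "and", "dave", "left"], vocab, w, b, stub_classifier))
```

Filtering first means the relation classifier never has to model the large no-relation class, which is the motivation the abstract gives for the split. The vocabulary-sized input vector also shows why the bag-of-words feature space grows large on a real corpus.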
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.1





Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2156286.html

