

Open Multi-Entity (N-ary) Relation Extraction for Chinese

Published: 2017-12-31 20:23

  Keywords: Chinese open multi-entity relation extraction. Source: Taiyuan University of Technology, master's thesis, 2017. Document type: degree thesis


  Related topics: open information extraction, entity relation extraction, machine learning, logistic regression classifier, support vector machine


[Abstract]: Information extraction means extracting multi-level semantic information of specified types from text, such as entity words, relation words, times, places, and events, and converting that information into a structured output format. With the exponential growth of online information and the rapid development of artificial intelligence, information extraction has become an active research area. Entity relation extraction is an important component of information extraction and an important task in its own right; it consists mainly of extracting entity relation types and entity relation values from text. It has theoretical and practical significance for knowledge graph construction and domain ontologies, question answering systems, text similarity computation, and deeper natural language processing problems such as semantic understanding and text summarization.

Research on entity relation extraction covers traditional and open entity relation extraction. Traditional entity relation extraction targets text from a restricted domain and restricted categories of entities and relations, and requires a language model built for that domain. Because Internet information grows exponentially and spans many domains, traditional extraction cannot meet the needs of web text. Open information extraction has therefore become an important research area. Its main task is to extract entities, relations, events, and other multi-level semantic information from large-scale, heterogeneous, cross-domain text and to output it in a structured format, so that web text can be processed across domains and at scale.

Open entity relation extraction for English text is mainly divided into two stages: a stage that extracts entity words first and a stage that extracts relation words first. Research on entity relation extraction for Chinese text has concentrated on binary relation extraction and on methods that use shallow semantic features. This thesis therefore proposes an open entity relation extraction method for Chinese text based on dependency analysis. The method can extract n-ary (multi-entity) relations, and it adds deep semantic features that improve extraction accuracy. An extraction system is designed and implemented on the basis of this method.

The proposed dependency-based open information extraction method for large-scale, heterogeneous Chinese web text works as follows. First, the web text is preprocessed: main-text extraction from web pages, Chinese word segmentation, Chinese part-of-speech tagging, and dependency parsing. Next, heuristic rules identify base noun phrases, and heuristic rules based on dependency relations between words produce candidate entity relation tuples. A trained machine learning classifier then filters the candidates to obtain the final entity relation tuples. Finally, the filtered tuples are normalized and stored in a database. The large set of extracted tuples can also be used for other natural language processing tasks.

The thesis uses Language Technology Platform-Cloud (LTP-Cloud) for text preprocessing and defines a set of part-of-speech combination rules for base noun phrases and a set of dependency-based rules for extracting entity relation tuples. In the filtering stage, a machine learning classifier is trained on features such as word count, part of speech, and distance between words, and it judges whether each candidate tuple is correct. The extraction experiment on the test corpus reached an accuracy of 81.25%. Finally, the proposed method was used to build a Chinese open multi-entity relation extraction system, which extracted a large number of entity relation tuples.
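The candidate-generation and feature steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's actual rule set: the `Token` class, the single verb-centred rule, the feature names, and the example sentence are all hypothetical, and the tag names (`SBV`, `VOB`, `IOB`, `nh`, `n`, and so on) are assumed to follow LTP-style conventions. The input is taken to be already segmented, tagged, and parsed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Token:
    idx: int    # 1-based position in the sentence
    word: str
    pos: str    # part-of-speech tag (LTP-style tags assumed)
    head: int   # position of the head token, 0 for the root
    rel: str    # dependency label (LTP-style labels assumed)


# Assumed noun-like tags; the thesis's real tag set is not given in the abstract.
NOUN_TAGS = {"n", "nh", "ni", "ns", "nz"}


def extract_candidates(tokens: List[Token]) -> List[Tuple[Token, List[Token]]]:
    """Hypothetical rule: each verb with a noun subject and at least one
    noun object yields one candidate tuple (relation word, arguments)."""
    candidates = []
    for verb in tokens:
        if verb.pos != "v":
            continue
        deps = [t for t in tokens if t.head == verb.idx and t.pos in NOUN_TAGS]
        subjects = [t for t in deps if t.rel == "SBV"]
        objects = [t for t in deps if t.rel in ("VOB", "IOB")]
        if subjects and objects:
            # More than two arguments gives an n-ary (multi-entity) tuple.
            candidates.append((verb, subjects + objects))
    return candidates


def candidate_features(verb: Token, args: List[Token]) -> Dict[str, object]:
    """Illustrative features of the kinds named in the abstract:
    counts, part of speech, and distance between words."""
    positions = [a.idx for a in args] + [verb.idx]
    return {
        "num_args": len(args),
        "span": max(positions) - min(positions),
        "max_dist_to_relation": max(abs(a.idx - verb.idx) for a in args),
        "arg_pos": tuple(a.pos for a in args),
    }


# Made-up example sentence: "小明 送 小红 礼物" (Xiao Ming gives Xiao Hong a gift).
EXAMPLE = [
    Token(1, "小明", "nh", 2, "SBV"),
    Token(2, "送", "v", 0, "HED"),
    Token(3, "小红", "nh", 2, "IOB"),
    Token(4, "礼物", "n", 2, "VOB"),
]
```

Calling `extract_candidates(EXAMPLE)` yields one ternary candidate with the relation word 送 and three arguments. In a full pipeline, feature dictionaries like the one returned by `candidate_features` would be vectorized and passed to a trained classifier (the page's keywords mention logistic regression and support vector machines) to keep or discard each candidate.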

[Degree-granting institution]: Taiyuan University of Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.1



Article ID: 1361332


Article link: http://sikaile.net/shoufeilunwen/xixikjs/1361332.html

