Research on Term Relation Extraction Methods Based on Syntactic Structure

Published: 2018-07-13 09:14
[Abstract]: Data on the Internet is growing exponentially, and converting this massive, content-rich, and varied data into knowledge that can be stored and represented effectively is of great importance. As natural language processing technology matures, it has also become possible to extract useful information from open-domain Web text and use it to build knowledge graphs. Terms are relatively fixed words or phrases used in a specific scientific field. They correctly label the things, phenomena, properties, relations, and processes of each specialized field, and they are powerful tools for scientific research and knowledge exchange. Term relations embody and express the core knowledge of a domain, so they have theoretical and practical value for understanding and learning domain knowledge and for predicting future trends. Term relations can also be applied widely in information retrieval, automatic question answering, and knowledge graph construction. Extracting term relations manually from large-scale corpora, however, is time-consuming and laborious, which has made automatic and semi-automatic term relation extraction an active research topic. This thesis studies the acquisition of open-domain term relations, proposes a term relation extraction method based on syntactic structure, and builds a medical-domain knowledge graph on that basis. The main contributions are as follows.

(1) A high-precision bootstrapping method for acquiring term relation templates. When templates are used for relation extraction, the quality of the templates directly determines the quality of the extraction results. We exploit the diversity of Web data in bootstrapping iterations to expand a small seed set of terms into a large-scale term relation base. We train word vectors with word2vec, compute semantic similarity, rank candidates by similarity, and select the most similar term relations as new seeds, which to some extent avoids the semantic drift problem of traditional bootstrapping.

(2) A term relation extraction method based on dependency syntactic structure. Using dependency parsing and semantic role labeling, the method prunes the dependency tree of a sentence down to a minimal subtree and extracts the verb-centered sentence backbone that carries the semantic dependency relations. The backbone covers the key information of the term relation while reducing the noise introduced by dependency parsing errors. The templates are then generalized, relation categories are labeled according to the core verb combined with discourse analysis, and the results are stored in a database in structured form for fast querying. Experiments show that the syntactic-structure-based method uses structural features effectively to capture semantic relations between terms.

(3) A method for constructing a knowledge graph from multiple types of term relations. A knowledge graph describes the concepts, entities, and events of the objective world, and the relations among them, in structured form, and it converts information into a form that matches how humans understand the world. Taking a medical knowledge graph as a case study, this thesis uses knowledge integration to address the dispersion, heterogeneity, redundancy, and fragmentation of knowledge in medical data, providing technical support for further machine understanding of natural language.

To verify the effectiveness of the proposed methods, a medical-domain knowledge graph instance was constructed. The experimental results show that the proposed syntactic-structure-based term relation extraction method is highly practical and achieves a degree of automation in term relation extraction and knowledge graph construction.
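The seed-selection step described in contribution (1) can be illustrated with a minimal sketch. The abstract only says that word2vec vectors are used to rank candidate term relations by semantic similarity; everything else here is an assumption made for illustration: the toy three-dimensional vectors stand in for trained word2vec embeddings, a relation is represented by averaging the vectors of its two terms, candidates are compared against the centroid of the current seeds using cosine similarity, and the names `pair_vector` and `select_new_seeds` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_vector(pair, vectors):
    """Assumption: represent a term relation by averaging its two term vectors."""
    head, tail = pair
    return [(a + b) / 2 for a, b in zip(vectors[head], vectors[tail])]


def centroid(vecs):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vecs)
    return [sum(column) / n for column in zip(*vecs)]


def select_new_seeds(seeds, candidates, vectors, top_k=1):
    """Rank candidate relations by similarity to the seed centroid; keep the top_k."""
    seed_centroid = centroid([pair_vector(p, vectors) for p in seeds])
    scored = [
        (cosine(pair_vector(c, vectors), seed_centroid), c)
        for c in candidates
        if c not in seeds
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [c for _, c in scored[:top_k]]


# Toy vectors (hypothetical; a real system would load trained word2vec embeddings).
vectors = {
    "aspirin": [0.9, 0.1, 0.0],
    "headache": [0.8, 0.2, 0.1],
    "ibuprofen": [0.85, 0.15, 0.05],
    "fever": [0.75, 0.25, 0.1],
    "guitar": [0.0, 0.1, 0.9],
    "concert": [0.1, 0.0, 0.95],
}
seeds = [("aspirin", "headache")]
candidates = [("ibuprofen", "fever"), ("guitar", "concert")]

new_seeds = select_new_seeds(seeds, candidates, vectors, top_k=1)
print(new_seeds)
```

Keeping only the highest-ranked candidates at each iteration is what limits semantic drift in this sketch: an off-topic pair such as ("guitar", "concert") scores low against the seed centroid and never becomes a seed.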
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1

Article ID: 2118878



Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2118878.html

