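Research and Implementation of a Semantics-Based Method for Extracting Event Information from Text
Published: 2018-05-14 21:21
Topic: event extraction + semantic processing; Source: Shanghai Jiao Tong University, master's thesis, 2012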
[Abstract]: Event extraction and tracking is an important research direction in natural language processing. How to extract the event information of interest accurately and efficiently from large volumes of complex, unordered information has long been a key problem in event extraction research. In general, event extraction means extracting events of interest to the user from unstructured documents and describing them in a structured form that supports querying and further tracking and analysis. Research on event extraction typically targets a single fixed domain or news text, which better matches what users expect of event extraction. The form of extraction is also relatively fixed and uniform: methods generally either extract structured text through template matching or classify text by analyzing its paragraphs. Set against the research background of a semantic search engine built on temporal and spatial elements, this thesis proposes a semantics-based method for extracting event information from text. Its novelty lies in applying several kinds of semantic knowledge together with statistical methods and in emphasizing the role of temporal and spatial elements in locating events for tracking; information is extracted and merged, and the events in the text are finally described. The texts handled in this project are of diverse types, with complex structures and writing styles, so traditional methods do not achieve satisfactory results, and this situation is very common in practical applications. The thesis has a clear objective, and its method is effective without being cumbersome; by combining semantic knowledge with statistical learning, it offers clear advantages in handling complex corpora and large-scale data. In addition, the thesis touches on concepts and algorithms from many areas of natural language processing; through this work, the author gained a deeper understanding of natural language processing research, and of information extraction in particular.
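The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes: using temporal and spatial elements as anchors to locate events, then merging mentions that share the same anchors into one structured record. Everything here is an assumption made for illustration and is not the thesis's method: the ISO-style date pattern, the tiny `PLACES` gazetteer, the function name `extract_events`, and the rule of grouping by exact (time, place) match.

```python
import re
from collections import defaultdict

# Assumption: dates appear in ISO form (YYYY-MM-DD).
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Assumption: a tiny hand-written place gazetteer stands in for real
# spatial-entity recognition.
PLACES = ("Shanghai", "Beijing", "Chengdu")
PLACE_RE = re.compile(r"\b(" + "|".join(PLACES) + r")\b")


def extract_events(text):
    """Group sentences that share the same (time, place) anchor.

    Returns a list of dicts with keys "time", "place", and "mentions".
    Sentences lacking either anchor are skipped.
    """
    events = defaultdict(list)
    # Naive sentence split on whitespace following ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        date = DATE_RE.search(sentence)
        place = PLACE_RE.search(sentence)
        if date and place:
            events[(date.group(), place.group())].append(sentence)
    return [
        {"time": time, "place": place, "mentions": mentions}
        for (time, place), mentions in events.items()
    ]
```

Calling `extract_events` on a short passage returns one record per distinct (time, place) pair, with every sentence that mentions that pair collected under `mentions`. A realistic system would replace the regular expressions with proper temporal and spatial entity recognition and a softer merging rule.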
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.1
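[Citing literature]
Related master's theses: 1 entry
1 Xing Xiaoran (幸小然); Design and Implementation of an Ontology-Based NFC Smart Application System for Cinemas [D]; University of Electronic Science and Technology of China; 2013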
Article ID: 1889463
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1889463.html