
Research on Entity and Relation Extraction for a Toponym Ontology

Published: 2018-08-01 13:54
[Abstract]: Emergencies have occurred frequently in recent years, and emergency management has become increasingly important. Emergency management involves fusing data from many sources, so providing relevant data quickly and accurately is a pressing research problem. With the development of the Internet, online data is growing exponentially, and much of it contains information needed for emergency management. Toponym (place-name) information is the core anchor of emergency information. This thesis studies entity and relation extraction for a toponym ontology, extracting place-name-related entities and the relations between them, which lays a core foundation for extracting emergency data and giving it semantic structure.

Entity and relation extraction correspond to named entity recognition and relation extraction in natural language processing. The mainstream approaches today are rule-based methods and machine-learning-based methods. Depending on the characteristics of the entities and relations in the source text, this thesis applies rule-based or machine-learning methods as appropriate.

Because no established corpus exists for extraction in the toponym domain, the thesis first defines the entity scheme and relation scheme for toponym ontology extraction. It then builds the corpora needed for entity extraction and relation extraction according to the features the extraction relies on, and describes the corpus construction process in detail. Toponym ontology entities are classified by the patterns in which they appear in the source text and are handled either with rules or with maximum-entropy machine learning. Extraction rules are summarized for four classes of entities. For the remaining classes, the features used in learning are analyzed, and a maximum entropy model trained on the annotated corpus performs the extraction. For relation extraction, the characteristics of the relations are analyzed first, and a feature-vector-based method using SVM is applied. Based on the characteristics of the corpus, a rule-based method for extracting toponym ontology relations is also proposed. In addition, rules are formulated from the characteristics of the relations to derive implicit relations from existing ones, further enriching the toponym ontology relation base.

Finally, a platform for toponym ontology entity and relation extraction is designed and implemented, and the extracted data is applied in a real semantic toponym search engine. In practice, the extracted entity and relation data substantially improved the user experience and helped users obtain place-name-related data more conveniently, quickly, and accurately.
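The abstract mentions deriving implicit relations from existing ones by rule, but the thesis's actual rules are not given on this page. As a minimal sketch only, assume a single hypothetical rule: a `located_in` relation is transitive. The relation name, the function `infer_located_in`, and the sample triples are all illustrative assumptions, not taken from the thesis.

```python
from collections import defaultdict


def infer_located_in(triples):
    """Return implied (place, "located_in", region) triples not already present.

    Assumption: "located_in" is transitive. This rule is hypothetical and
    stands in for the (unspecified) inference rules of the thesis.
    """
    # Map each place to the regions it is directly located in.
    parents = defaultdict(set)
    for head, rel, tail in triples:
        if rel == "located_in":
            parents[head].add(tail)

    def ancestors(place, seen):
        # Depth-first walk up the containment chain; `seen` guards against cycles.
        for parent in parents.get(place, ()):
            if parent not in seen:
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    existing = set(triples)
    inferred = set()
    for place in list(parents):
        for region in ancestors(place, set()):
            triple = (place, "located_in", region)
            if region != place and triple not in existing:
                inferred.add(triple)
    return inferred


# Illustrative input: two explicitly extracted relations.
known = [
    ("Nankai District", "located_in", "Tianjin"),
    ("Tianjin", "located_in", "China"),
]
new_triples = infer_located_in(known)
```

Here `new_triples` contains the one relation that was implied but never stated, linking Nankai District to China. A real relation base would likely need more rule types (for example inverse or adjacency relations), which this sketch does not attempt.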
[Degree-granting institution]: Tianjin University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.1


Article ID: 2157790


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2157790.html

