Research and Implementation of Key Technologies for Person Name Disambiguation

Published: 2018-03-29 08:16

Topic: person name disambiguation　Focus: organization name recognition　Source: Harbin Institute of Technology, master's thesis, 2012

[Abstract]: With the arrival of the mobile Internet era, network access has become more convenient and the number of terminals keeps growing, so information is published faster and its volume is increasing rapidly. Finding information about a specific person is one of the main reasons users search the Internet, yet the prevalence of shared names makes person name ambiguity a serious problem in Internet text. General-purpose search engines do not organize their results in a way that addresses this ambiguity, so users spend a great deal of time picking out the person they are interested in from among many namesakes, and they risk missing important information. How to resolve these ambiguities effectively and present information to users in an organized form has therefore become an important problem. To this end, this thesis makes four contributions. First, it examines how humans annotate corpora containing ambiguous person names and proposes a two-stage disambiguation strategy, based on adaptive resonance theory, that imitates this process: in the first stage, categories representing individual persons are constructed and documents are assigned to them; in the second stage, similar categories are merged by hierarchical agglomerative clustering. By imitating human behavior, the system builds the set of target concepts automatically and resolves the ambiguity. Experiments confirm the effectiveness of the strategy: on two sets of person name recognition results, the two-stage method outperforms the traditional method by 0.92% and 5.00%, respectively. Second, it implements a human-machine collaborative system that assists in building recognition rules and several kinds of knowledge dictionary resources, and uses these resources and rules to build an organization name recognition system. A comparison with two other named entity recognition tools, ISLEX and LTP, shows that the rule-based method delivers relatively high performance and efficiency under the recognition requirements of the person name disambiguation task and is suitable for practical use in a person name disambiguation system. Third, it annotates the Sogou full-web news corpus, yielding a real-world web corpus that can be used for research on Internet person name disambiguation; analyzes the importance of person attributes for Internet corpora and the characteristics of each attribute; designs and implements a person attribute extraction system for unstructured information on the web; and verifies the effectiveness of person attribute features through experiments on the real-world web corpus. Fourth, it analyzes the tasks and functions of a person name disambiguation system, designs and implements a knowledge-resource-based person name disambiguation module, completes the page crawling, page analysis, knowledge-resource-based person name disambiguation, and data storage modules, implements an intuitive ranking algorithm for disambiguation results, and builds a disambiguation system for news search results.
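The two-stage strategy described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes each document is a bag of tokens, uses cosine similarity as the match score, and replaces a full adaptive resonance network with a simple vigilance test. The function names and the thresholds `vigilance` and `merge_threshold` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def stage_one(docs, vigilance):
    """ART-style pass: join the best-matching category or open a new one."""
    clusters = []  # each cluster: {"proto": Counter, "members": [doc indices]}
    for i, doc in enumerate(docs):
        vec = Counter(doc)
        best = max(clusters, key=lambda c: cosine(vec, c["proto"]), default=None)
        if best is not None and cosine(vec, best["proto"]) >= vigilance:
            best["proto"].update(vec)  # adapt the prototype to the new member
            best["members"].append(i)
        else:
            clusters.append({"proto": vec, "members": [i]})
    return clusters


def stage_two(clusters, merge_threshold):
    """Agglomerative pass: repeatedly merge the most similar pair of categories."""
    clusters = [{"proto": Counter(c["proto"]), "members": list(c["members"])}
                for c in clusters]
    while len(clusters) > 1:
        pairs = [(cosine(clusters[i]["proto"], clusters[j]["proto"]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        sim, i, j = max(pairs)
        if sim < merge_threshold:
            break  # no remaining pair is similar enough to merge
        clusters[i]["proto"].update(clusters[j]["proto"])
        clusters[i]["members"].extend(clusters[j]["members"])
        del clusters[j]
    return clusters


def disambiguate(docs, vigilance=0.5, merge_threshold=0.3):
    """Run both stages and return groups of document indices, one per person."""
    merged = stage_two(stage_one(docs, vigilance), merge_threshold)
    return sorted(sorted(c["members"]) for c in merged)


# Toy documents that all mention the same (hypothetical) ambiguous name.
docs = [
    ["professor", "university", "research", "paper"],
    ["professor", "university", "lecture"],
    ["striker", "goal", "league", "club"],
    ["club", "league", "transfer"],
    ["research", "paper", "conference"],
]
print(disambiguate(docs))  # [[0, 1, 4], [2, 3]]
```

In this toy run the first stage leaves document 4 in its own category because it falls below the stricter vigilance threshold, and the second stage merges it with the academic category under the looser merge threshold. That is the role the abstract gives the second stage: consolidating categories that the first pass split too finely.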
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP391.3

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1680297.html
