

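Research and Implementation of Key Technologies for Person Name Disambiguation

Published: 2018-03-29 08:16

  Topic: person name disambiguation. Focus: organization name recognition. Source: Harbin Institute of Technology, master's thesis, 2012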


[Abstract]: With the arrival of the mobile Internet era, network access has become more convenient and the number of terminals keeps growing, so information is published faster and its volume is increasing rapidly. Searching for information about a specific person is one of the main purposes of Internet search, yet the prevalence of shared names causes serious person name ambiguity in Internet text. The results returned by general-purpose search engines are not organized with this ambiguity in mind, so users spend a great deal of time picking out the person they are interested in from many namesakes, and they risk missing important information. How to resolve this ambiguity effectively and present information to users in an organized form has therefore become an important problem. To this end, this thesis carries out work in the following four areas.

First, the thesis examines how humans annotate corpora for person name ambiguity and proposes a two-stage disambiguation strategy, based on adaptive resonance theory, that imitates this process. In the first stage, categories representing persons are built and documents are assigned to them; in the second stage, similar categories are merged by hierarchical agglomerative clustering. By imitating human behavior, the system automatically constructs the set of target concepts and resolves the ambiguity. Experiments confirm the effectiveness of the two-stage strategy: on two kinds of person name recognition results, the two-stage method outperforms the traditional method by 0.92% and 5.00%.

Second, the thesis implements a human-machine cooperative system that assists in building recognition rules and several knowledge dictionary resources, and uses these resources and rules to build an organization name recognition system. A comparison with two other named entity recognition tools, ISLEX and LTP, shows that the rule-based method achieves high performance and efficiency for the recognition requirements of the person name disambiguation task and is suitable for practical use in a person name disambiguation system.

Third, the thesis annotates the Sogou full-web news corpus, yielding a real web corpus resource that can be used for research on Internet person name disambiguation. It analyzes the importance of person attributes for Internet corpora and the characteristics of each attribute, designs and implements a person attribute extraction system for unstructured information on the web, and verifies the effectiveness of person attribute features through experiments on the real web corpus.

Fourth, the thesis analyzes the tasks and functions of a person name disambiguation system, and designs and implements a person name disambiguation module based on knowledge resources. It completes the modules for page crawling, page analysis, knowledge-resource-based person name disambiguation, and data storage, implements an intuitive ranking algorithm for disambiguation results, and builds a disambiguation system for news retrieval results.
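The two-stage strategy summarized in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract gives no algorithmic details, so everything here is an assumption made for illustration. Documents are reduced to bags of words, the adaptive-resonance stage is simplified to a single pass with a vigilance threshold on cosine similarity, and the names `stage_one`, `stage_two`, `vigilance`, and `merge_threshold`, together with the threshold values and the toy documents, are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def stage_one(docs, vigilance):
    """ART-style pass (simplified, assumed): a document joins its best-matching
    category if the similarity reaches `vigilance`; otherwise it opens a new one."""
    categories = []  # each category: {"proto": Counter, "members": [doc index]}
    for i, doc in enumerate(docs):
        vec = Counter(doc)
        best, best_sim = None, 0.0
        for cat in categories:
            sim = cosine(vec, cat["proto"])
            if sim > best_sim:
                best, best_sim = cat, sim
        if best is not None and best_sim >= vigilance:
            best["proto"].update(vec)  # the prototype absorbs the document
            best["members"].append(i)
        else:
            categories.append({"proto": vec, "members": [i]})
    return categories


def stage_two(categories, merge_threshold):
    """Hierarchical agglomerative merge: repeatedly merge the most similar pair
    of categories until no pair reaches `merge_threshold`."""
    cats = [{"proto": Counter(c["proto"]), "members": list(c["members"])}
            for c in categories]
    while len(cats) > 1:
        best_pair, best_sim = None, merge_threshold
        for x in range(len(cats)):
            for y in range(x + 1, len(cats)):
                sim = cosine(cats[x]["proto"], cats[y]["proto"])
                if sim >= best_sim:
                    best_pair, best_sim = (x, y), sim
        if best_pair is None:
            break
        x, y = best_pair
        cats[x]["proto"].update(cats[y]["proto"])
        cats[x]["members"].extend(cats[y]["members"])
        del cats[y]
    return [sorted(c["members"]) for c in cats]


# Toy documents (hypothetical) that mention the same ambiguous name.
docs = [
    ["professor", "harbin", "nlp"],
    ["professor", "harbin", "nlp", "lab"],
    ["striker", "football", "club"],
    ["football", "club", "goal"],
    ["nlp", "lab", "paper"],
]
stage_one_categories = stage_one(docs, vigilance=0.7)
clusters = stage_two(stage_one_categories, merge_threshold=0.4)
print(clusters)
```

With a strict `vigilance`, the first stage tends to over-split (here it produces four categories), and the looser `merge_threshold` in the second stage joins categories that plausibly describe the same person, leaving two clusters. In the thesis, the features would presumably come from the organization names and person attributes described in the abstract rather than raw tokens.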
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.3

