Design and Implementation of Person-Relationship Construction Based on a Knowledge Graph
Keywords: OrientDB; knowledge graph; ontology. Source: Chongqing University, master's thesis, 2016. Document type: degree thesis
[Abstract]: Public security intelligence work centers on the relationships among persons, organizations, and accounts. In practice, an analyst often needs to start from a single name and retrieve everything related to it: whom the person has contacted recently, which activities the person has taken part in, which social accounts the person has used, and so on. Answering such questions usually means searching a massive volume of information by hand. This thesis therefore proposes building a knowledge graph that describes the associations between persons, so that querying person relationships in the graph returns a person's basic information, activity trajectory, and related persons.
Applying a knowledge graph to person-relationship queries is convenient for intelligence work, but designing and building the graph is the hard part. Much existing research assumes that the raw data has already been cleaned, that person relationships have already been converted into triples, or even that the knowledge graph already exists, and concentrates on analysis methods and application scenarios. The main work of this thesis instead covers the process from raw data to a finished person knowledge graph; on the application side, the graph only needs to support person-relationship queries.
Designing and building a knowledge graph of person relationships raises three main difficulties:
1. The raw data is very large and its sources have entirely different structures. How can the objects of interest (persons, organizations, accounts, and so on) be extracted from it, and how can one decide whether two persons are related?
2. When the knowledge graph is updated, how can one decide whether a newly added person already exists in the graph, and, if so, how should the information about that person be merged?
3. Person relationships span more than a thousand kinds of relationship, including person-person, person-organization, person-website, and person-account. How should the data model for each kind of object be designed so that it describes both the object's basic information and its relationships to other objects?
The main contributions of the thesis are:
(1) A person-relationship modeling method built on ontology modeling. Starting from the definitions of domain, class, attribute, and entity, the thesis designs these four data structures in detail, uses them to guide the creation of a person attribute set and a person relationship set, and verifies the feasibility of the modeling scheme in practice.
(2) A person entity extraction technique that combines natural-language word segmentation with multiple regular expressions. Experiments compare the Chinese word segmentation results of the Chinese Academy of Sciences segmenter and the Harbin Institute of Technology segmenter and analyze how the two differ. The experiments also show that adding multiple regular expressions improves entity extraction, and that the approach is especially well suited to recognizing account-type entities.
(3) Three application schemes based on the knowledge graph, namely person-relationship search, semantic search, and scenario-based search, together with a comparison of the scenarios each one suits.
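The idea behind contribution (2), combining segmenter output with several regular expressions so that account-type entities are not missed, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not list the thesis's actual expressions, so the patterns, the `Entity` type, and the function names below are hypothetical, and the word segmenter is represented simply by a list of entities it is assumed to have already produced.

```python
import re
from typing import Iterable, List, NamedTuple


class Entity(NamedTuple):
    kind: str   # e.g. "person", "email", "mobile", "qq"
    text: str   # surface string of the entity
    start: int  # character offset in the source text


# Hypothetical patterns for account-type entities (assumptions, not the
# expressions used in the thesis).
ACCOUNT_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 11-digit mainland China mobile number, not embedded in a longer number.
    "mobile": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    # A numeric ID introduced by the literal label "QQ"; group 1 is the ID.
    "qq": re.compile(r"QQ[:：]?\s*([1-9]\d{4,10})(?!\d)", re.IGNORECASE),
}


def extract_accounts(text: str) -> List[Entity]:
    """Run every account pattern over the text and collect the matches."""
    found = []
    for kind, pattern in ACCOUNT_PATTERNS.items():
        for match in pattern.finditer(text):
            # Use the capture group when the pattern defines one.
            group = 1 if pattern.groups else 0
            found.append(Entity(kind, match.group(group), match.start(group)))
    return sorted(found, key=lambda e: e.start)


def merge_entities(segmenter_entities: Iterable[Entity], text: str) -> List[Entity]:
    """Combine segmenter entities with regex matches, dropping duplicates."""
    combined = sorted(
        list(segmenter_entities) + extract_accounts(text),
        key=lambda e: e.start,
    )
    seen = set()
    merged = []
    for entity in combined:
        key = (entity.kind, entity.text)
        if key not in seen:
            seen.add(key)
            merged.append(entity)
    return merged
```

In this sketch a segmenter would supply person and organization names, while the regular expressions contribute the identifiers that segmenters tend to split or overlook. Keeping the patterns in a dictionary makes it easy to add one expression per account type.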
[Degree-granting institution]: Chongqing University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: D035.3