Information Extraction and Visual Analysis Based on Personal Files
[Abstract]: With the spread of the Internet, the amount of online information has exploded, and beyond the growth in volume, the types of information have become more diverse. One of these data types can be called "personal files": resumes, personal home pages, biographical pages in online encyclopedias, and so on. Social relationships among people can be inferred from such files. For example, if two people studied at the same university during overlapping periods, they are likely to be classmates. A social network obtained through this kind of analysis is valuable and can be applied to a number of problems, including the most common tasks in social network analysis. This paper introduces a system for information extraction and visual analysis of personal file data and describes the main algorithms it uses. The system has two main functions. The first extracts information from personal files to build an entity-based association network and predict social relationships among people. The second uses this network for a preliminary PageRank-based analysis of people's importance or influence. Building the network takes two steps. The first step builds an association network of entities of various types, which can be regarded as a domain-specific heterogeneous information network. This step involves structuring the personal file data, including entity recognition and event extraction. For event extraction, we use clustering based on syntactic parse tree similarity, combined with rule-based extraction. The second step uses path analysis on the association network to establish relationships between person nodes. Before this, the relationships between nodes of other types must be supplemented so that more complete path information is available.
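The classmate example amounts to a path query over the association network: person → university → person, plus a check that the two study periods overlap. The sketch below is minimal and assumes education events have already been extracted into a hypothetical `Enrollment` record (person, university, start date, end date). The names, the toy data, and the one-day overlap threshold are illustrative, not taken from the thesis.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Enrollment:
    """Hypothetical extracted event: `person` studied at `university` from `start` to `end`."""
    person: str
    university: str
    start: date
    end: date


def overlap_days(a: Enrollment, b: Enrollment) -> int:
    """Number of days shared by two closed date intervals (0 if they are disjoint)."""
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    return max(0, (end - start).days + 1)


def likely_classmates(records: Iterable[Enrollment],
                      min_overlap_days: int = 1) -> Set[FrozenSet[str]]:
    """Return unordered person pairs connected by a person-university-person path
    whose study periods overlap by at least `min_overlap_days`."""
    # Group by the intermediate university node so only people sharing it are compared.
    by_university = {}
    for rec in records:
        by_university.setdefault(rec.university, []).append(rec)
    pairs = set()
    for group in by_university.values():
        for a, b in combinations(group, 2):
            if a.person != b.person and overlap_days(a, b) >= min_overlap_days:
                pairs.add(frozenset((a.person, b.person)))
    return pairs


# Toy data (hypothetical): Alice/Bob and Bob/Carol overlap; Alice/Carol do not.
records = [
    Enrollment("Alice", "University A", date(2010, 9, 1), date(2014, 6, 30)),
    Enrollment("Bob", "University A", date(2012, 9, 1), date(2016, 6, 30)),
    Enrollment("Carol", "University A", date(2015, 9, 1), date(2019, 6, 30)),
    Enrollment("Dave", "University B", date(2012, 9, 1), date(2016, 6, 30)),
]
print(likely_classmates(records))
```

Grouping records by university keeps the pairwise check confined to people who share that intermediate node, which is exactly the two-hop path described above. An overlap is only evidence of a possible relationship, so a real system would treat these pairs as predictions rather than facts.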
Given the characteristics of heterogeneous networks, we use different methods to build the relationships between different types of nodes. Visual analysis of the information network relies mainly on computing each person's importance with PageRank. In a visual environment, human cognitive ability and the precision of display devices are limited, so we argue that the ranking of nodes matters more than their exact PageRank values. The PageRank computation can therefore stop early, once the relative order of the nodes no longer changes. Research on improving PageRank falls into two branches. One line of work accelerates the convergence of the traditional power method from a mathematical standpoint. The other approximates PageRank with Monte Carlo methods. Neither is well suited to approximate node ranking. The first aims to speed up convergence while preserving accuracy. The second is very efficient and good at identifying high-ranking nodes, but it does not order those nodes well. The second part of the thesis therefore proposes the Early-stop algorithm, which has two steps: Grouping and Parallel Updating. Grouping simulates random walks to determine the approximate range of each node's rank. Parallel Updating then uses parallel updates to adjust, within a small range, the order of nodes with nearby ranks. Experimental results show that the Early-stop algorithm effectively improves the accuracy of the order approximation for high-ranking nodes. The main contributions of this thesis are as follows. It proposes a system for extracting and analyzing personal file data that covers the whole process from information extraction to visual analysis.
It points out that visual analysis lowers the precision required of the computed results, and on that basis it proposes Early-stop, a fast approximate PageRank algorithm. Extensive experiments show that Early-stop ranks nodes more accurately than the latest stochastic simulation algorithm.
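The two PageRank ingredients discussed in the abstract can be sketched as follows, under stated assumptions. `pagerank_early_stop` is a plain power iteration with one possible order-based stop rule: it stops once the top-`top_k` order has been unchanged for `patience` consecutive iterations. `monte_carlo_pagerank` is a generic random-walk (stochastic simulation) estimator of the kind the abstract compares against. Neither function is the thesis's Early-stop algorithm, because the abstract does not specify its Grouping and Parallel Updating steps in enough detail. The function names, parameters, and toy graph are hypothetical.

```python
import random
from collections import Counter
from typing import Optional

import numpy as np


def pagerank_early_stop(adj: np.ndarray, damping: float = 0.85, top_k: Optional[int] = None,
                        patience: int = 3, max_iter: int = 100):
    """Power iteration that stops once the order of the top_k nodes (all nodes if None)
    has stayed identical for `patience` consecutive iterations.
    adj[i, j] = 1 means an edge i -> j; dangling nodes spread their score uniformly."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    r = np.full(n, 1.0 / n)
    prev_head, stable = None, 0
    for it in range(1, max_iter + 1):
        dangling_mass = r[out_deg == 0].sum()
        share = np.where(out_deg > 0, r / np.maximum(out_deg, 1), 0.0)
        r = damping * (adj.T @ share + dangling_mass / n) + (1.0 - damping) / n
        order = np.argsort(-r, kind="stable")  # descending score, ties broken by node index
        head = order if top_k is None else order[:top_k]
        if prev_head is not None and np.array_equal(head, prev_head):
            stable += 1
            if stable >= patience:
                break
        else:
            stable = 0
        prev_head = head
    return r, order, it


def monte_carlo_pagerank(out_links: dict, n: int, damping: float = 0.85,
                         walks_per_node: int = 200, seed: int = 0) -> list:
    """Estimate PageRank as the end-point distribution of random walks started
    walks_per_node times from every node; each step continues with probability `damping`."""
    rng = random.Random(seed)
    ends = Counter()
    for start in range(n):
        for _ in range(walks_per_node):
            node = start
            while rng.random() < damping:
                nbrs = out_links.get(node, [])
                # Dangling node: jump uniformly, matching the power-method convention above.
                node = rng.choice(nbrs) if nbrs else rng.randrange(n)
            ends[node] += 1
    total = n * walks_per_node
    return [ends[v] / total for v in range(n)]


# Toy graph (hypothetical): nodes 1-4 all link to node 0; node 0 links back to node 1.
edges = [(1, 0), (2, 0), (3, 0), (4, 0), (0, 1)]
n = 5
adj = np.zeros((n, n))
out_links = {}
for i, j in edges:
    adj[i, j] = 1.0
    out_links.setdefault(i, []).append(j)

scores, order, iters = pagerank_early_stop(adj, top_k=2)
estimate = monte_carlo_pagerank(out_links, n)
mc_order = sorted(range(n), key=lambda v: -estimate[v])
print(order[:2].tolist(), iters)
print(mc_order[:2])
```

In this toy graph, nodes 0 and 1 have close scores, and nodes 2 to 4 receive only the teleportation share. The Monte Carlo estimate therefore reliably separates the top two nodes from the rest. Their relative order, however, depends on sampling noise, because the standard error of each estimate shrinks only as one over the square root of the number of walks. This mirrors the abstract's remark that simulation-based methods identify high-ranking nodes well but order them less reliably. The power iteration's early iterates swap nodes 0 and 1 back and forth before settling, so the stop rule fires only once the order has stayed fixed for `patience` rounds.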
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP391.1
[Author]: Lü Chaoyang (呂朝陽)