Research on Algorithms for Discovering Special Persons in Email Communities
Published: 2018-08-02 09:07
[Abstract]: With the arrival of the information age, email has become a universal means of transmitting information. An email network is formed by people's communication behavior and contains rich information about the social relationships of its users. Social network analysis (SNA) therefore has great potential for mining email networks as a form of social relationship.
The main work of this thesis is to mine the special persons in an email network. Two kinds of special persons are studied: spam senders and key leaders.
The spam sender discovery algorithm is an improvement on a spam community mining algorithm. The email network is modeled as a directed weighted graph, which better reflects how messages actually flow through the network. Based on the characteristics of spam senders, the algorithm follows a strip-then-merge approach: using an average density function, betweenness centrality computed with Dijkstra's algorithm, and other evaluation functions, it identifies the spam senders.
The idea of link analysis is applied to finding important leaders in the email network. On a directed graph, the PageRank algorithm is first used to compute node importance from the sending and receiving relationships between nodes. The initial seed set is then selected by ranking importance, expanding the set, and computing similarity, and the detection and removal of one-way malicious link nodes is improved. The bidirectional contact degree of a node is added as the criterion for removing one-way malicious nodes, and the filtered node set becomes the input to the EHITS algorithm. With each node's PageRank value taken as its importance, EHITS computes an authority value and a hub value for every node. Nodes with high authority values are the important leaders being sought.
Finally, the approach is compared on a data set with degree centrality, betweenness centrality, HITS, and PageRank. A confusion degree is defined as the evaluation metric to assess the effectiveness and superiority of the algorithm.
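The leader-discovery pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the EHITS formulas or the exact definition of the bidirectional contact degree, so this sketch makes several assumptions: it substitutes plain HITS for EHITS, it defines a hypothetical reciprocity score (the share of a node's contacts with mail flowing in both directions) as a stand-in for the bidirectional contact degree, and the names find_leaders, top_k, and min_reciprocity are illustrative only. It also skips the similarity-based seed filtering step.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iters=100):
    """Plain PageRank over directed (sender, receiver) pairs."""
    nodes = sorted({v for e in edges for v in e})
    out = defaultdict(set)
    for s, r in edges:
        out[s].add(r)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes that never send mail is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping + damping * dangling) / n for v in nodes}
        for s in nodes:
            for r in out[s]:
                new[r] += damping * rank[s] / len(out[s])
        rank = new
    return rank


def reciprocity(edges):
    """Assumed stand-in for the 'bidirectional contact degree':
    fraction of a node's contacts with mail in both directions."""
    edge_set = set(edges)
    contacts = defaultdict(set)
    for s, r in edge_set:
        contacts[s].add(r)
        contacts[r].add(s)
    return {
        v: sum((v, c) in edge_set and (c, v) in edge_set for c in cs) / len(cs)
        for v, cs in contacts.items()
    }


def hits(edges, iters=100):
    """Plain HITS (not EHITS): returns (authority, hub) score dicts."""
    nodes = {v for e in edges for v in e}
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iters):
        auth = dict.fromkeys(nodes, 0.0)
        for s, r in edges:
            auth[r] += hub[s]
        hub = dict.fromkeys(nodes, 0.0)
        for s, r in edges:
            hub[s] += auth[r]
        # Normalize both score vectors to unit Euclidean length.
        for scores in (auth, hub):
            norm = sum(x * x for x in scores.values()) ** 0.5 or 1.0
            for v in scores:
                scores[v] /= norm
    return auth, hub


def find_leaders(edges, top_k=2, min_reciprocity=0.25):
    """Seed by PageRank, expand, drop one-way nodes, rank by authority."""
    edges = set(edges)
    rank = pagerank(edges)
    seeds = set(sorted(rank, key=rank.get, reverse=True)[:top_k])
    # Expand the seed set with every direct correspondent of a seed.
    base = seeds | {r for s, r in edges if s in seeds}
    base |= {s for s, r in edges if r in seeds}
    recip = reciprocity(edges)
    kept = {v for v in base if recip[v] >= min_reciprocity}
    sub = {(s, r) for s, r in edges if s in kept and r in kept}
    auth, _ = hits(sub)
    return sorted(auth, key=auth.get, reverse=True)


# Toy network: three staff exchange mail with "boss"; "spam" only sends.
mail = [("a", "boss"), ("b", "boss"), ("c", "boss"),
        ("boss", "a"), ("boss", "b"), ("boss", "c"),
        ("spam", "a"), ("spam", "b"), ("spam", "c"), ("spam", "boss")]
print(find_leaders(mail))
```

In this toy network the one-way sender has a reciprocity of zero and is filtered out before the authority scores are computed, which mirrors the role the abstract assigns to the bidirectional contact degree. The threshold of 0.25 is arbitrary and would need tuning on real data.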
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year conferred]: 2014
[Classification number]: TP393.08; TP393.098
Article ID: 2158943
Article link: http://sikaile.net/guanlilunwen/ydhl/2158943.html