Detecting Fake Accounts in Large-Scale Social Networks with Graph Computing
Topic: online social networks  Focus: network security  Source: Jilin University, 2017 master's thesis  Type: degree thesis
[Abstract]: In recent years, as online social networks have grown, attackers have used fake accounts to carry out malicious activity for profit, including spreading spam and stealing users' private information. This seriously threatens the security of social network users and hinders the development of social networks. Detecting fake accounts has therefore become one of the key research problems in network and information security. Existing detection methods fall into three broad classes: those based on user behavior features, those based on the content of user information, and those based on the structure of the social graph. Graph-structure-based methods offer high detection efficiency, are hard for attackers to evade by imitation, and need only simple feature extraction. However, as social networks keep growing, existing complex detection algorithms are difficult to scale to real large-scale networks. Traditional large-scale processing frameworks such as MapReduce also handle unstructured graph data poorly, so some work has turned to vertex-centric graph computing systems such as Pregel/Giraph to run detection on large-scale social networks. Yet current graph computing systems are designed to be overly general: they target arbitrary graphs rather than social graphs with their particular characteristics, and so they perform poorly on social graphs. Based on a study of social graph data, this thesis proposes a vertex-centric, single-machine graph computing framework optimized for the characteristics of social graphs and, after analyzing existing detection algorithms, implements two different classes of graph-structure-based fake account detection algorithms on it. The specific work is as follows:
1. At the system level, we analyze current social graphs and, taking into account today's social networks and the configuration of single-machine servers, design the detection framework with reference to single-machine graph computing systems that use out-of-core computation to improve scalability.
2. At the data level, considering the power-law distribution of social graphs, we split the graph by vertex degree into two disjoint sets, called the heavy set and the light set, and optimize the system by using different storage formats, execution modes, selective scheduling strategies, and caching strategies for the two sets.
3. At the algorithm level, we analyze existing graph-structure-based fake account detection algorithms and group the graph algorithms they involve into two classes: power-iteration algorithms based on random walks, such as SybilRank, and traversal algorithms based on community detection, such as COLOR/COLOR+. We propose dSybilRank and dCOLOR, which recast the two classes as vertex-centric, parallel, iterative graph algorithms to improve the efficiency of the original detection algorithms.
4. Finally, we evaluate the two classes of detection algorithms on our system. The experimental results show good performance: for example, on a single server our system processes a network of 50 million vertices in 459 seconds, whereas SybilRank needs a cluster of 11 m1.large machines running for 33 hours to process 160 million vertices. We also compare our system with existing graph computing systems; on social graphs its performance is 1.14 to 5.91 times theirs.
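The abstract does not give the details of the heavy/light split or of dSybilRank, so the following is only a minimal Python sketch of the two ideas it names: dividing vertices by degree into two disjoint sets, and a vertex-centric power iteration in the style of SybilRank as it is commonly described (each vertex splits its trust equally among its neighbors, the iteration is stopped early, and vertices are ranked by degree-normalized trust). All function names, the degree threshold, and the in-memory adjacency-list format are assumptions made for illustration, not the thesis's implementation.

```python
from collections import defaultdict


def split_by_degree(adj, threshold):
    """Split vertices into a heavy set (degree >= threshold) and a light set.

    `threshold` is a hypothetical tuning parameter; the abstract does not
    say how the cut-off is chosen.
    """
    heavy = {v for v, nbrs in adj.items() if len(nbrs) >= threshold}
    light = set(adj) - heavy
    return heavy, light


def propagate_trust(adj, seeds, total_trust, iterations):
    """Vertex-centric power iteration in the style of SybilRank.

    `adj` maps each vertex to a list of neighbors (undirected graph,
    stored symmetrically). Trust starts on the seed vertices only.
    """
    trust = {v: 0.0 for v in adj}
    for s in seeds:
        trust[s] = total_trust / len(seeds)
    for _ in range(iterations):
        inbox = defaultdict(float)
        # Scatter: every vertex sends an equal share of its trust to each neighbor.
        for v, nbrs in adj.items():
            if nbrs:
                share = trust[v] / len(nbrs)
                for u in nbrs:
                    inbox[u] += share
        # Gather: each vertex replaces its trust with the sum of the shares it received.
        trust = {v: inbox[v] for v in adj}
    return trust


def rank_by_normalized_trust(adj, trust):
    """Order vertices by trust / degree, lowest (most suspicious) first."""
    return sorted(adj, key=lambda v: trust[v] / max(len(adj[v]), 1))
```

On a toy graph in which a small, well-connected honest region is joined to a fake region by a single edge, a few iterations leave the fake vertices with the lowest degree-normalized trust, so they appear at the front of the ranking. A real system would stream edges from disk and treat the heavy and light sets differently; this sketch keeps everything in memory for clarity.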
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP393.08
Article number: 1571016
Article link: http://sikaile.net/guanlilunwen/ydhl/1571016.html