Research on Spam Filtering Technology in Social Networks
Published: 2018-09-18 19:48
[Abstract]: With the development of Web 2.0 technology, social media has become the largest and most active social platform, giving hundreds of millions of users a high-quality channel for communication. However, as people share, communicate, and interact on social networks, spam keeps growing as well. Techniques that clean up cyberspace and foster a healthy social ecosystem are urgently needed, so spam filtering in social networks has become a research hotspot.
Machine-learning-based classification is widely used to filter spam on social platforms because it offers high accuracy at low cost. Taking Sina Weibo as the research object, this thesis covers the following work.
First, starting from how information propagates in social network services, it analyzes how spam spreads on the Weibo platform, designs a machine-learning filtering technique that identifies suspicious accounts in the Sina Weibo network, and implements a spam filtering system based on three models: logistic regression, support vector machine, and random forest.
Second, it extracts a range of discriminative features from Weibo accounts and classifies them with machine learning models. Features of spam posts are extracted from both user behavior and content behavior, and a social network relationship graph is used to analyze how data flows and propagates on the platform. An information propagation graph built around Weibo messages describes the affinity between users. The performance of the whole filtering system is then evaluated through data analysis and experiments.
Third, with practical deployment in mind, it proposes filtering spam posts with online active learning, which reduces the amount of labeled data the system needs and lowers its time complexity while maintaining good filtering performance.
Finally, spammers are keen to hijack legitimate users' accounts and use them to inflate other accounts' follower counts and repost content on others' behalf. This thesis proposes a method based on the sequential probability ratio test (SPRT) to detect such zombie accounts, and the resulting detection system can effectively identify zombie accounts in social networks.
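The abstract does not specify the query strategy or base learner behind its online active learning. As a minimal sketch, assume uncertainty sampling on top of an online logistic regression: every incoming post is scored, and only posts whose spam probability lies close to 0.5 are sent to an annotator and used for one SGD update. The class `OnlineActiveLogReg`, its `margin` and `lr` parameters, and the one-feature toy stream are hypothetical illustrations, not the thesis's implementation.

```python
import math
from typing import Callable, List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineActiveLogReg:
    """Online logistic regression with uncertainty-sampling label queries (hypothetical sketch)."""

    def __init__(self, n_features: int, margin: float = 0.2, lr: float = 0.5) -> None:
        self.w: List[float] = [0.0] * n_features
        self.b = 0.0
        self.margin = margin  # query a label when |P(spam) - 0.5| < margin
        self.lr = lr          # SGD step size
        self.queries = 0      # number of labels requested from the annotator

    def predict_proba(self, x: Sequence[float]) -> float:
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def process(self, x: Sequence[float], oracle: Callable[[], int]) -> int:
        """Classify one incoming post (1 = spam); request its label only if uncertain."""
        p = self.predict_proba(x)
        if abs(p - 0.5) < self.margin:
            y = oracle()          # e.g. a human annotator labels this post
            self.queries += 1
            err = p - y           # gradient of the log loss w.r.t. the logit
            self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * err
        return int(p >= 0.5)      # prediction made before any update


# Toy stream with one hypothetical feature: +1.0 for spam posts, -1.0 for normal ones.
stream = [([1.0], 1), ([-1.0], 0)] * 5
model = OnlineActiveLogReg(n_features=1)
preds = [model.process(x, lambda y=y: y) for x, y in stream]
print(preds)          # [1, 1, 1, 0, 1, 0, 1, 0, 1, 0]
print(model.queries)  # 4
```

On this toy stream the learner stops requesting labels after four posts, once its predictions leave the uncertainty band. This illustrates how active learning can cut the amount of labeled data a filter needs; the actual savings depend on the features and data.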
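The abstract names the sequential probability ratio test but not its observations or parameters. As a minimal sketch, assume each action of an account is reduced to a binary flag (1 = spam-like, such as a bulk follow or repost), and that a normal account and a hijacked one produce flagged actions at hypothetical rates `p0` and `p1`. Wald's SPRT then sums the per-action log-likelihood ratio and stops as soon as the sum reaches log((1 - beta) / alpha), declaring a zombie, or falls to log(beta / (1 - alpha)), declaring a normal account. The function name and default values are illustrative assumptions.

```python
import math
from typing import Iterable, Tuple


def sprt_zombie_test(
    observations: Iterable[int],
    p0: float = 0.1,      # hypothetical spam-like action rate of a normal account (H0)
    p1: float = 0.6,      # hypothetical spam-like action rate of a zombie account (H1)
    alpha: float = 0.01,  # target false-positive rate (normal flagged as zombie)
    beta: float = 0.01,   # target false-negative rate (zombie passed as normal)
) -> Tuple[str, int]:
    """Wald's SPRT over a stream of 0/1 flags (1 = action looked spam-like).

    Returns (decision, observations_used); decision is "zombie", "normal",
    or "undecided" if the stream ends before either threshold is crossed.
    """
    if not 0.0 < p0 < p1 < 1.0:
        raise ValueError("expected 0 < p0 < p1 < 1")
    upper = math.log((1 - beta) / alpha)         # reaching it accepts H1 (zombie)
    lower = math.log(beta / (1 - alpha))         # reaching it accepts H0 (normal)
    step_spam = math.log(p1 / p0)                # LLR increment for a flagged action
    step_clean = math.log((1 - p1) / (1 - p0))   # LLR increment for a clean action

    llr, n = 0.0, 0
    for n, flagged in enumerate(observations, start=1):
        llr += step_spam if flagged else step_clean
        if llr >= upper:
            return "zombie", n
        if llr <= lower:
            return "normal", n
    return "undecided", n


print(sprt_zombie_test([1, 1, 1]))   # ('zombie', 3)
print(sprt_zombie_test([0] * 10))    # ('normal', 6)
```

The sequential design lets most accounts be decided after a handful of observations; `alpha` and `beta` bound the two error rates only approximately, because Wald's thresholds are themselves approximations.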
[Degree-granting institution]: Harbin University of Science and Technology
[Degree level]: Master's
[Year awarded]: 2014
[Classification number]: TP393.09; TP181
Article ID: 2248943
Article link: http://sikaile.net/guanlilunwen/ydhl/2248943.html