Identifying Weibo Advertisement Publishers Based on Cluster Analysis
Published: 2018-10-14 18:40
[Abstract]: Weibo contains a large volume of advertising content, which seriously degrades the experience of ordinary users and interferes with related research. Existing studies mostly handle advertising posts with classification algorithms such as support vector machines (SVM) or random forests, but manually labeling a large training set for classification is difficult. This paper therefore proposes a method for identifying Weibo advertisement publishers based on cluster analysis. At the user level, advertisement publishers often dilute their advertising content by posting large numbers of ordinary posts; to address this, the paper introduces the concept of core Weibo posts. The method extracts the core post topics and their corresponding post sequences, computes user features and the text features of the corresponding posts, and clusters these features with a clustering algorithm to identify advertisement publishers. Experimental results show a precision of 92%, a recall of 97%, and an F-measure of 95%, demonstrating that the method accurately identifies advertisement publishers even when advertising content has been deliberately diluted, and that it offers theoretical support and a practical approach for identifying and cleaning up spam on Weibo.
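The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or the exact features, so everything below is an assumption made for illustration: plain k-means with two clusters stands in for the unspecified algorithm, the three per-user features (ad-keyword ratio in core posts, share of posts containing a link, share of near-duplicate posts) are hypothetical, and the user names and values are made up.

```python
from math import dist
from statistics import fmean


def kmeans_two_clusters(points, iterations=20):
    """Plain k-means with k=2 and a deterministic farthest-point start.

    This stands in for the paper's unspecified clustering algorithm.
    """
    first = points[0]
    second = max(points, key=lambda p: dist(p, first))
    centroids = [first, second]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min((0, 1), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(fmean(col) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical per-user feature vectors (not from the paper):
# (ad-keyword ratio in core posts, share of posts with a link, near-duplicate share)
users = {
    "user_a": (0.82, 0.75, 0.60),
    "user_b": (0.05, 0.10, 0.02),
    "user_c": (0.78, 0.80, 0.55),
    "user_d": (0.08, 0.05, 0.04),
    "user_e": (0.10, 0.12, 0.03),
}

names = list(users)
labels, centroids = kmeans_two_clusters([users[n] for n in names])

# Assumed heuristic: the cluster whose centroid has the higher ad-keyword
# ratio is treated as the suspected advertiser cluster.
ad_cluster = max((0, 1), key=lambda c: centroids[c][0])
suspected = sorted(n for n, lab in zip(names, labels) if lab == ad_cluster)
print(suspected)  # ['user_a', 'user_c']
```

Because clustering needs no labeled training set, a sketch like this only requires a rule for deciding which cluster corresponds to advertisers; the centroid comparison used here is one simple possibility, not necessarily the one used in the paper.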
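[Author affiliation]: Software Institute, Nanjing University
[Funding]: Jiangsu Province Prospective Joint Research Project for Industry, Academia and Research (BY2015069-03)
[Classification number]: TP391.1
Article ID: 2271296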
Article link: http://sikaile.net/wenyilunwen/guanggaoshejilunwen/2271296.html