
Research on Weibo Rumor Detection Based on Semi-Supervised Learning

Published: 2019-04-29 06:16
【Abstract】: Weibo, a product of the information age, has grown rapidly, and the rumors that spread quickly along with it have become an increasingly prominent problem. Automatic rumor detection is a prerequisite for studying, monitoring, responding to, and governing rumors on social networks, so it is attracting growing attention, and work on identifying Weibo rumors is increasing. Researchers in China and abroad have studied the credibility of social networks and microblogs extensively, Twitter in particular. The mainstream approach extracts features from user characteristics, text content, and propagation patterns, then builds a classifier to detect rumors. Traditional machine learning algorithms, however, do not effectively address two problems in Weibo rumor detection: the high cost of labeling data, and the low detection accuracy caused by class imbalance.

This thesis takes Sina Weibo as its setting and Weibo rumors as its object of study. Working within the established framework that treats detection as a classification problem, it focuses on the high labeling cost of traditional supervised learning and introduces semi-supervised learning into Weibo rumor detection. Because rumors on Weibo are far fewer than non-rumors, and correctly identifying a rumor is more valuable than correctly identifying a non-rumor, the thesis also defines Weibo rumor detection as a binary classification problem on imbalanced data. Combining these considerations, it proposes a semi-supervised learning algorithm for imbalanced datasets and applies it to the rumor classification task.

The work has two main parts. First, for imbalanced classification, the thesis proposes ImCo-Forest (semi-supervised learning algorithm from imbalanced data based on Co-Forest), an improvement of the Co-Forest algorithm for imbalanced datasets. It uses the SMOTE algorithm and stratified sampling to balance the data distribution, and introduces cost-sensitive weighted voting to improve prediction accuracy on unlabeled samples. To verify the algorithm, comparative experiments were run on 10 UCI test datasets. Second, building on this study of imbalanced classification, the thesis brings the method into Weibo rumor detection and gives a flowchart of the detection process. Finally, two sets of empirical experiments on Weibo rumors demonstrate the effectiveness and superiority of the proposed method. Experiments on data collected from the Sina Weibo platform show that the method effectively addresses the high labeling cost and the low accuracy caused by class imbalance, and that it is suitable for analyzing large volumes of Weibo data and detecting rumors.
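The abstract names two building blocks, SMOTE-style oversampling and cost-sensitive weighted voting, without giving their details. The following is only a minimal sketch of those two ideas in plain Python, not the thesis's ImCo-Forest implementation. The function names, the `k` and `seed` parameters, the per-tree weights, and the `minority_cost` value are all assumptions introduced for illustration.

```python
import random
from typing import List, Sequence


def smote_like_oversample(minority: Sequence[Sequence[float]], n_new: int,
                          k: int = 3, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority points by interpolating between a
    sample and one of its k nearest minority neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j])),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def cost_sensitive_vote(votes: Sequence[int], tree_weights: Sequence[float],
                        minority_cost: float = 2.0) -> int:
    """Weighted vote over binary labels (1 = rumor/minority, 0 = non-rumor).
    Votes for the minority class are scaled by minority_cost (hypothetical)."""
    score_pos = sum(w * minority_cost
                    for v, w in zip(votes, tree_weights) if v == 1)
    score_neg = sum(w for v, w in zip(votes, tree_weights) if v == 0)
    return 1 if score_pos > score_neg else 0
```

In this sketch, raising `minority_cost` lets a small number of trees voting "rumor" outweigh a larger number voting "non-rumor", which is one simple way to reflect the abstract's point that missing a rumor costs more than misjudging a non-rumor. How the thesis actually sets the weights and costs is not stated on this page.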
【Degree-granting institution】: Shandong University
【Degree level】: Master's
【Year degree conferred】: 2015
【Classification number】: TP393.092



