Research on Comment Spam Detection Methods Based on Human Dynamics

Published: 2018-05-09 19:28

  Topics: comment spam detection + spam user detection; Source: Southwest Petroleum University, 2017 master's thesis


[Abstract]: With the continued emergence of e-commerce, the mobile Internet, and online social media platforms, people can shop, make friends, and find entertainment online, and the Internet has become an inseparable part of everyday life. The comment functions of these platforms let users express their views freely and have gradually turned them from pure consumers of online information into contributors, so that user-generated content now floods the web. Spam hidden in this content seriously affects people's daily lives. Getting computers to automatically and efficiently identify spam content and its producers within this huge volume of information is a highly challenging problem and one of the active research topics in text mining and natural language processing. Building on existing research and the needs of Internet public opinion analysis, this thesis takes news comments and user data from the NetEase news portal as its research object and proposes a comment spam detection method based on human dynamics.

The method builds the sample feature spaces used for model construction from two perspectives: the comment publisher and the comment itself. For publisher features, the thesis first analyzes how the behavior patterns of spam users and normal users on the site differ and, based on that analysis, computes statistics on each publisher's individual behavior. These cover basic behavioral data, such as the total numbers of comments, replies, favorites, and subscriptions and the daily average number of comments, as well as comment-posting patterns, such as the mean and variance of the time interval between comments. In addition, the thesis models four kinds of interaction between commenters (replying, following, commenting on the same news item, and posting similar comments) and, on the resulting network models, uses six network topology measures to extract commenters' interaction features. Finally, it computes the IV value of the comment text and combines it with comment-related attributes to build the comment feature space.

Using the publisher and comment feature spaces, the thesis designs four groups of experiments, trains models on different feature subsets with the GBDT and SVM machine learning algorithms, and compares the final results to obtain the optimal feature subset for comment spam detection. The experimental results show that the method based on human dynamics behavior patterns can effectively identify spam users on online platforms, with comparatively high accuracy in particular for spam users that exhibit machine-like behavior. Moreover, comment spam detection models that include user behavior features show clear improvements in both precision and recall.
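The posting-pattern features mentioned in the abstract (the mean and variance of the time interval between a user's comments) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `interval_features`, the dictionary keys, the use of population variance, and the sample timestamps are all assumptions made for illustration.

```python
from datetime import datetime
from statistics import mean, pvariance


def interval_features(timestamps):
    """Summarize the gaps between one user's consecutive comments.

    `timestamps` is a list of datetime objects in any order.
    Gaps are measured in seconds. The key names and the choice of
    population variance are illustrative assumptions, not the thesis's.
    """
    times = sorted(timestamps)
    # Pair each comment time with the next one to get the inter-comment gaps.
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
    if not gaps:
        # Fewer than two comments: no interval statistics are defined.
        return {"count": len(times), "mean_gap": 0.0, "var_gap": 0.0}
    return {
        "count": len(times),
        "mean_gap": mean(gaps),
        "var_gap": pvariance(gaps),
    }


# Hypothetical account that posts exactly every 10 seconds.
regular = [datetime(2017, 1, 1, 12, 0, s) for s in (0, 10, 20, 30)]
print(interval_features(regular))
```

In this made-up example every gap is identical, so the variance is zero. Features of this kind could be placed alongside the basic counts (comments, replies, favorites, subscriptions) in a per-user feature vector before training a classifier.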
[Degree-granting institution]: Southwest Petroleum University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1


Article ID: 1867121

Article link: http://sikaile.net/jingjilunwen/dianzishangwulunwen/1867121.html

