Research on Identifying Harmful Online Information Based on Rules and Statistics
[Abstract]: The rapid development of the Internet has profoundly influenced society and people's lives. As a carrier of information, the Internet has clear advantages over traditional print media: it provides a high-quality platform for disseminating information in fields such as politics, economics, and culture, and it gives people a new way to communicate with each other. Alongside this convenience, however, it also brings negative effects. In the virtual network environment, every user is reduced to a string of virtual symbols, and the reliability of the information and comments that users publish through personal web pages, Weibo, WeChat official accounts, forums, and similar channels cannot be guaranteed. Although many platforms apply pre-publication review and after-the-fact filtering, some users with concealed identities, weak moral awareness, and poor cultural attainment still produce large amounts of false, pornographic, politically sensitive, fraudulent, superstitious, and other harmful information. This information fills every corner of the Internet, corrupts the social atmosphere, misleads the public, and does great damage to people's physical and mental health. Weibo, a social media service with a very large user base, is a platform for sharing, disseminating, and obtaining information based on user relationships. Posts can be pushed promptly to followers through clients or the platform, so information spreads quickly and in real time. Followers can in turn interact with the author by commenting, and can repost, comment on, or bookmark a post, which shares the information further, widens its reach, and increases its influence. These same characteristics have also made Weibo a hiding place for harmful information, and Weibo has therefore become a research subject for many scholars.
To purify the network environment, protect minors from harmful information, and give Internet users a good search experience, the publication and dissemination of such information must be controlled, and appropriate measures must be taken to strengthen supervision and management. The purpose of this thesis is therefore to identify harmful information on the network, using existing Chinese text mining techniques in an experimental study. A crawler collects the text of users' comments on and reposts of a particular Weibo post to obtain the raw data. The raw data are then preprocessed by removing irrelevant symbols, segmenting words, tagging dependencies, and counting word frequencies, and the resulting data are used to extract a text feature set. To improve word segmentation accuracy, the thesis designs a harmful-word lexicon consisting of a basic word list, a synonym table, an abbreviation lexicon, and a word dependency table. A statistics-based feature extraction algorithm is combined with dependency analysis to extract text features effectively, and a text classification model is implemented with the naive Bayes algorithm. The model is then applied to classifying user comments on Weibo, and the classifier is evaluated experimentally. Compared with the model before improvement, the improved model shows clearly higher classification accuracy and recall. Finally, the thesis summarizes the research, identifies its innovations and shortcomings, and outlines improvements to be pursued in follow-up work.
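The lexicon-plus-classifier idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the variant-to-base-word mapping, the toy English tokens (standing in for segmented Chinese words), the labels, and the class and function names are all assumptions made for illustration. The dependency analysis step is omitted, and the classifier is a plain multinomial naive Bayes with add-one smoothing.

```python
import math
from collections import Counter, defaultdict

# Hypothetical lexicon: maps synonyms and abbreviations to a base word.
# The thesis's actual word lists are not given in the abstract.
VARIANT_TO_BASE = {
    "scam": "fraud",
    "swindle": "fraud",
    "frd": "fraud",
}


def normalize(tokens):
    """Replace each known variant with its base word; keep other tokens."""
    return [VARIANT_TO_BASE.get(t, t) for t in tokens]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(normalize(tokens))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        tokens = normalize(tokens)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented): already-segmented comments with labels.
train_docs = [
    ["win", "money", "scam"],
    ["fraud", "click", "link"],
    ["nice", "photo"],
    ["great", "photo", "thanks"],
]
train_labels = ["bad", "bad", "ok", "ok"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["swindle", "link"]))
print(model.predict(["nice", "thanks"]))
```

Mapping variants to a base word before counting lets the word frequencies of synonyms and abbreviations pool into a single feature, which is one plausible way a synonym table and abbreviation lexicon could help a statistics-based classifier on short comments.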
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1; TP393.092
Article ID: 2382415
Article link: http://sikaile.net/guanlilunwen/ydhl/2382415.html