
Research on Identifying Harmful Online Information Based on Rules and Statistics

Published: 2018-12-16 13:21
[Abstract]: The rapid development of the Internet has had a large and lasting effect on society and on daily life. As a carrier of information, the Internet has clear advantages over traditional print media: it offers a high-quality platform for disseminating information in fields such as politics, economics, and culture, and it gives people a new way to communicate. Alongside these conveniences it also brings negative effects. In a virtual network environment every user is reduced to a string of symbols, and the information and opinions that users publish through personal web pages, Weibo, WeChat official accounts, forums, and other online media carry a degree of uncertainty. Even though many platforms apply pre-publication review and post-publication filtering, some users with concealed identities and poor moral awareness or cultural literacy remain. As a result, large volumes of false, pornographic, politically sensitive, fraudulent, and superstitious content spread to every corner of the network, corrupting public morals, misleading people, and seriously harming physical and mental health. Weibo, a social media platform with a very large user base, is built on user relationships for sharing, spreading, and obtaining information. A post can be pushed to followers promptly through the client or the platform, which makes dissemination fast and real-time. Followers can interact with the author by commenting, and can repost, comment on, or bookmark a post, which widens its reach and strengthens its influence. These same features make Weibo a hiding place for harmful information, and it has therefore become a subject of study for many researchers.
To clean up the network environment, protect minors from harmful information, and give Internet users a good search experience, the publication and spread of such information must be controlled, with appropriate measures to strengthen supervision and management. This thesis therefore aims to identify harmful information online and carries out experiments using existing Chinese text mining techniques. A crawler collects the comments and reposts that Weibo users make on specific posts, yielding the raw data. The raw data is then cleaned of irrelevant symbols, segmented into words, annotated with dependency relations, and counted for word frequencies, and the results are used to extract a feature set for each text. To improve segmentation accuracy, the thesis designs a harmful-word lexicon that contains a basic word table for the harmful words themselves, a synonym table, an abbreviation table, and a table of dependency relations between words. A statistics-based feature extraction algorithm is combined with dependency analysis to extract text features effectively, and a text classification model is implemented with the naive Bayes algorithm. The model is then applied to classifying user comments on Weibo. Experiments testing the classifier show that classification accuracy and recall improve noticeably compared with the model before the improvement. Finally, the thesis summarizes the work, states its contributions and shortcomings, and notes that these will be addressed in follow-up research.
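The pipeline described in the abstract (symbol removal, lexicon lookup, naive Bayes classification, evaluation) can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: whitespace splitting stands in for Chinese word segmentation, the dependency-analysis step is omitted, and `LEXICON`, `TRAIN`, `tokenize`, `NaiveBayes`, and `precision_recall` are hypothetical names with invented toy data. The evaluation helper computes precision and recall, which may differ from the exact metrics the thesis reports.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical lexicon mapping synonyms and abbreviations to a base word.
# Entries are invented for illustration, not taken from the thesis.
LEXICON = {"scam": "fraud", "frd": "fraud", "swindle": "fraud"}


def tokenize(text):
    """Strip non-word symbols, split on whitespace, normalize via the lexicon."""
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return [LEXICON.get(w, w) for w in words]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in tokenize(text):
                if w in self.vocab:  # skip words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall(y_true, y_pred, positive="bad"):
    """Precision and recall for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy training comments, invented for illustration only.
TRAIN = [
    ("win money now this is not a scam", "bad"),
    ("send your password to claim the prize", "bad"),
    ("lovely weather in the park today", "ok"),
    ("great photo thanks for sharing", "ok"),
]
model = NaiveBayes().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])
print(model.predict("obvious frd, claim the prize now"))
```

Normalizing synonyms and abbreviations before counting means a variant spelling such as "frd" contributes to the same feature as its base word, which is one plausible reading of why a lexicon would help the classifier.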
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1; TP393.092


Article ID: 2382415



Article link: http://sikaile.net/guanlilunwen/ydhl/2382415.html

