
Research on a Topic Classification Model for Online Public Opinion Based on Bayesian Theory

發(fā)布時(shí)間:2018-09-02 06:40
【Abstract】: With the spread of the Internet, the number of Internet users keeps growing. Many people follow public opinion online, browse the opinion topics they are interested in, post comments, and vent their feelings. Online public opinion information is vast and heterogeneous, however, and users browse it somewhat blindly; the major portal sites and forums do organize public opinion topics, but the organization remains abstract to some degree. Classifying online public opinion by topic therefore not only makes it easier for users to browse opinion news, but also provides effective early warning and helps the relevant authorities guide public opinion correctly.

Several methods exist for Chinese text classification; the most common are Naive Bayes, K-nearest neighbors, and support vector machines. While studying topic classification of online public opinion with Naive Bayes, whose structure is simple and whose classification is efficient, this thesis finds that the conditional independence assumption of Naive Bayes limits its range of application and reduces classification accuracy; moreover, when faced with incremental public opinion information, the method must revise its prior information by relearning, and every round of learning requires all texts to participate, so it lacks flexibility.

To address these problems, this thesis optimizes the Naive Bayes classification method with an incremental learning mechanism and dynamic reduction and, combined with text mining techniques, proposes an optimized topic classification model for online public opinion. The main research points are the following:

1. Collection of online public opinion text. Information is gathered with a web crawler, then parsed and extracted with an HTML parser and web page cleaning techniques, and the texts are represented with an optimized feature weighting method to improve the accuracy of the text representation (a simplified sketch of this step follows the list).

2. Optimization of Naive Bayes with an incremental learning mechanism and (F-λ) generalized dynamic reduction to improve classification accuracy. (F-λ) generalized dynamic reduction introduces a dynamic reduction precision coefficient λ, which reduces the number of texts participating in attribute reduction, relaxes the conditional independence assumption, and lowers computational complexity. Incremental learning frees Naive Bayes from relearning all texts to revise its prior information when classifying incremental public opinion; during incremental learning, a class confidence measure keeps noisy classifications out of the original training set so that they do not degrade the classifier's accuracy (a sketch of this incremental step also follows the list).

3. Data experiments compare the non-incremental, non-dynamic-reduction algorithm, the incremental algorithm, the dynamic reduction algorithm, and the combined incremental and dynamic reduction algorithm to verify the effectiveness of the proposed optimized topic classification algorithm, and simulation experiments examine its feasibility.
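The text representation step in point 1 can be illustrated with a minimal sketch. It assumes jieba for Chinese word segmentation and scikit-learn's TfidfVectorizer; plain TF-IDF is only a stand-in for the thesis's optimized feature weighting method, and the sample documents are hypothetical.

```python
# Minimal sketch: turn cleaned public-opinion text into weighted feature vectors.
# Assumptions: jieba and scikit-learn are available; plain TF-IDF stands in for
# the optimized weighting scheme described in the thesis.
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer

def tokenize(text):
    """Segment Chinese text into words (TF-IDF works on token streams)."""
    return [w for w in jieba.cut(text) if w.strip()]

# cleaned_docs stands for text already extracted from crawled pages by an
# HTML parser and page-cleaning step (hypothetical examples).
cleaned_docs = ["网络舆情信息繁杂", "网民的数量越来越多"]
vectorizer = TfidfVectorizer(tokenizer=tokenize, token_pattern=None)
X = vectorizer.fit_transform(cleaned_docs)   # document-term weight matrix
print(X.shape)                               # rows = documents, columns = vocabulary terms
```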
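Point 2 rests on the standard Naive Bayes decision rule, which under the conditional independence assumption scores a document d for topic c as P(c|d) ∝ P(c)·∏P(w_i|c). The sketch below illustrates only the incremental side, assuming scikit-learn's MultinomialNB with partial_fit and a posterior-probability threshold as the class confidence filter; the topic names and the 0.9 threshold are illustrative, and the (F-λ) generalized dynamic reduction is not reproduced here.

```python
# Minimal sketch (not the thesis's exact algorithm): incremental Naive Bayes
# topic classification with a class-confidence filter, so that only documents
# classified with high confidence are folded back into the model and noisy
# classifications stay out of the training data.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

TOPICS = ["politics", "economy", "society", "entertainment"]   # hypothetical topic set
vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)  # stateless, so usable incrementally
clf = MultinomialNB()

def initial_train(texts, labels):
    """Fit the classifier on the initial labelled corpus."""
    X = vectorizer.transform(texts)
    clf.partial_fit(X, labels, classes=TOPICS)

def incremental_update(new_texts, confidence=0.9):
    """Classify newly crawled documents and learn only from high-confidence
    predictions (the class confidence filter)."""
    X = vectorizer.transform(new_texts)
    proba = clf.predict_proba(X)
    for i in range(len(new_texts)):
        best = proba[i].argmax()
        if proba[i][best] >= confidence:                  # confident enough to learn from
            clf.partial_fit(X[i], [clf.classes_[best]])
        # low-confidence documents are skipped rather than added to the model
```

Low-confidence documents are simply skipped here; in a fuller pipeline they would be queued for manual labelling before entering the training set.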
【學(xué)位授予單位】:江蘇科技大學(xué)
【學(xué)位級(jí)別】:碩士
【學(xué)位授予年份】:2014
【分類號(hào)】:TP393.09

【參考文獻(xiàn)】

相關(guān)期刊論文 前7條

1 呂嵐;;基于層次聚類算法的WEB文本挖掘技術(shù)研究[J];福建電腦;2011年03期

2 程克非;張聰;;基于特征加權(quán)的樸素貝葉斯分類器[J];計(jì)算機(jī)仿真;2006年10期

3 袁軍鵬;朱東華;李毅;李連宏;黃進(jìn);;文本挖掘技術(shù)研究進(jìn)展[J];計(jì)算機(jī)應(yīng)用研究;2006年02期

4 魏松;鐘義信;王翔英;;中文Web文本挖掘系統(tǒng)WebTextMiner開發(fā)[J];計(jì)算機(jī)應(yīng)用研究;2006年06期

5 孫玲芳;翟鵬博;;基于可變模糊集理論的輿情指標(biāo)預(yù)警模型研究[J];計(jì)算機(jī)與數(shù)字工程;2014年02期

6 劉毅;;基于三角模糊數(shù)的網(wǎng)絡(luò)輿情預(yù)警指標(biāo)體系構(gòu)建[J];統(tǒng)計(jì)與決策;2012年02期

7 戴媛;郝曉偉;郭巖;余智華;;我國網(wǎng)絡(luò)輿情安全評(píng)估指標(biāo)體系的構(gòu)建研究[J];信息網(wǎng)絡(luò)安全;2010年04期







版權(quán)申明:資料由用戶afdca***提供,本站僅收錄摘要或目錄,作者需要?jiǎng)h除請(qǐng)E-mail郵箱bigeng88@qq.com