
Research and Application of Key Text Filtering Technologies Based on the Cloud Service Model

Published: 2018-06-19 15:27

  Topic: text filtering + classification; Source: University of Electronic Science and Technology of China, master's thesis, 2014


[Abstract]: The rapid development of the Internet has made it one of the main ways people exchange information. Its openness, however, means the network carries a large amount of junk content, such as pornographic, violent, superstitious, and reactionary material, which seriously disrupts everyday Internet use. Many text filtering techniques already exist, but they must be continually improved as the external environment changes. At the same time, as living standards rise, more and more users access the Internet through mobile terminals. Ensuring that mobile users obtain healthy, useful information on their devices requires implementing text filtering on a cloud platform for mobile terminals, so that junk web pages can be filtered out. To meet this need, the thesis analyzes and discusses the existing key technologies of text filtering, improves the traditional text classification algorithm based on the vector space model as well as the naive Bayes classification algorithm, and uses the two improved algorithms to build a high-performance text filtering system. The system is then deployed on a cloud platform for mobile terminals to provide a text filtering service on that platform, so that mobile users can access normal, legitimate web pages through their devices. The main content of the thesis is as follows:

1. After analyzing the feature selection algorithms commonly used in text filtering, the thesis applies the idea of equal proportion to feature selection, so that the extracted text feature vector reflects the topic and category information of the text more accurately.

2. After analyzing and discussing the existing weight calculation methods in text filtering, the thesis improves the traditional weight calculation by taking into account the structure information, length information, and proportion information of each feature item, so that the weight better reflects how important the feature item is to web page classification.

3. Because a web page is a structured or semi-structured document, the thesis classifies web pages in a modular way. It applies the improved proportion-based weight calculation and the equal-proportion feature selection to the traditional vector space model classifier and to the naive Bayes classifier, uses the two improved classifiers to build a high-performance web page filtering system, and deploys that system on the cloud platform to provide a text filtering service.

The test results show that, compared with the traditional algorithms, the improved text classification algorithms achieve higher classification accuracy and precision and lower misjudgment and error rates, so the improved text filtering system performs better.
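The abstract does not give the formulas behind these ideas, so the following is only a minimal sketch of how equal-proportion feature selection and structure-aware weighting might feed a vector space model classifier. Everything specific in it is an assumption made for illustration: "equal proportion" is read as keeping the same fraction of top-ranked terms from every class, terms are ranked by raw frequency (a real system would more likely use a score such as chi-square or information gain), the zone weights in `ZONE_WEIGHT` are invented, and all function names are hypothetical rather than taken from the thesis.

```python
import math
from collections import Counter, defaultdict

# Hypothetical zone weights: the abstract does not publish the thesis's values.
ZONE_WEIGHT = {"title": 3.0, "body": 1.0}

def select_features(docs, ratio):
    """Keep the top `ratio` fraction of terms from each class (assumed reading
    of 'equal proportion'), ranked here by plain term frequency."""
    by_class = defaultdict(Counter)
    for label, zones in docs:
        for tokens in zones.values():
            by_class[label].update(tokens)
    selected = set()
    for counts in by_class.values():
        keep = max(1, round(len(counts) * ratio))
        # Sort by descending frequency, then alphabetically, for determinism.
        ranked = sorted(counts, key=lambda t: (-counts[t], t))
        selected.update(ranked[:keep])
    return selected

def vectorize(zones, features):
    """Build a term vector in which each occurrence counts by its page zone."""
    vec = Counter()
    for zone, tokens in zones.items():
        for tok in tokens:
            if tok in features:
                vec[tok] += ZONE_WEIGHT.get(zone, 1.0)
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def train(docs, ratio):
    """Select features, then sum the document vectors of each class."""
    features = select_features(docs, ratio)
    centroids = defaultdict(Counter)
    for label, zones in docs:
        centroids[label].update(vectorize(zones, features))
    return features, dict(centroids)

def classify(zones, features, centroids):
    """Return the class whose summed vector is most similar to the page."""
    vec = vectorize(zones, features)
    return max(sorted(centroids), key=lambda c: cosine(vec, centroids[c]))

# Toy training pages, each split into zones to mimic page structure.
docs = [
    ("spam", {"title": ["casino", "bonus"],
              "body": ["win", "casino", "cash", "now"]}),
    ("normal", {"title": ["weather", "report"],
                "body": ["rain", "weather", "today", "cloud"]}),
]
features, centroids = train(docs, ratio=0.4)
label = classify({"title": ["casino"], "body": ["bonus", "today"]},
                 features, centroids)
```

Selecting per class rather than taking one global top-k list keeps a large class from crowding out the vocabulary of a small one, which is one plausible motivation for the equal-proportion idea. The same feature set and weights could also be fed to a naive Bayes classifier in place of the cosine comparison.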
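[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.09; TP391.1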

