Research on Malicious Website Detection Technology Based on Data Mining
[Abstract]: As the Internet has developed, network security has drawn increasing attention. Frequent malicious website attacks cause heavy property losses for users and seriously threaten the security of individuals and even countries, so building a model that identifies and detects malicious websites is of great significance. Many researchers in China and abroad have improved feature selection methods, mostly by mining and refining host features and lexical features in greater depth, but accuracy and efficiency remain low. To address these problems, this thesis proposes establishing a list of vulnerable websites and a new feature extraction scheme based on weighted distance. On the data mining side, it improves the KNN model using an improved fuzzy C-means (FCM) clustering algorithm, which raises the efficiency of the model. The research work consists mainly of the following parts.
Data acquisition: data for normal websites and malicious websites are crawled, cleaned, standardized, and finally stored in a MySQL database.
Feature extraction: in contrast to the familiar concepts of a website whitelist and a website blacklist, this thesis surveys websites that are vulnerable to attack and proposes establishing a list of vulnerable websites. Malicious websites are usually derived from normal websites by a certain amount of modification. Different types of modification are assigned different weights, which gives the concept of weighted distance. For any input URL, the smallest weighted distance between that URL and the URLs in the list of vulnerable websites is calculated and used as a new feature.
Model improvement: both the KNN algorithm and the fuzzy C-means algorithm are improved. Because the initial cluster centers of FCM are uncertain and the algorithm easily falls into a local optimum, a coordinate density method is proposed to determine the initial cluster centers. Because the initial number of clusters in FCM is chosen at random, a method is proposed to determine the K value and the number of data sets. The cluster centers of the samples and the cluster to which each center belongs are then obtained, and the category of a test sample is determined by finding the cluster at the smallest distance from it.
Model verification: the LR model, the J48 model, and the improved KNN model are used to classify the data in WEKA. Data with the new feature and data with only the original features are compared under each data mining algorithm, and the classification results improve to a certain extent with the new feature. A comparison with other methods in the literature also indicates that the proposed feature gives better results.
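The weighted-distance feature described in the abstract can be illustrated with a weighted edit distance. The abstract does not give the change types or their weights, so the sketch below assumes three character-level change types (insertion, deletion, substitution) with hypothetical weights `w_ins`, `w_del`, and `w_sub`. The function names are also illustrative and are not taken from the thesis.

```python
from typing import Iterable


def weighted_distance(a: str, b: str, w_ins: float = 1.0,
                      w_del: float = 1.0, w_sub: float = 1.0) -> float:
    """Weighted edit distance: minimum total cost of turning a into b.

    The three weights are assumptions; the thesis assigns its own
    weights to its own change types.
    """
    # prev[j] = cost of turning the empty prefix of a into b[:j]
    prev = [j * w_ins for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        curr = [i * w_del]
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0.0 if ca == cb else w_sub)
            delete = prev[j] + w_del
            insert = curr[j - 1] + w_ins
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


def nearest_weighted_distance(url: str, vulnerable_sites: Iterable[str],
                              **weights: float) -> float:
    """Smallest weighted distance from any listed site to the input URL.

    Raises ValueError if the list is empty.
    """
    return min(weighted_distance(site, url, **weights)
               for site in vulnerable_sites)
```

Under the default weights, `nearest_weighted_distance("paypa1.com", ["example.org", "paypal.com"])` evaluates to `1.0`, because a single substitution turns `paypal.com` into `paypa1.com`. A small value would suggest that the input imitates a listed site.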
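The cluster-based classification step can be sketched in the same spirit. The code below uses standard fuzzy C-means, not the improved variant from the thesis. The initial centers are passed in by hand, because the coordinate density method is not specified in the abstract. Each cluster takes the majority label of its members, and a test sample takes the label of its nearest cluster center. The two-dimensional points and the labels are made up for illustration.

```python
import math
from collections import Counter


def memberships(points, centers, m=2.0, eps=1e-9):
    """Fuzzy membership of every point in every cluster (rows sum to 1)."""
    u = []
    for p in points:
        # eps avoids division by zero when a point sits on a center.
        inv = [max(math.dist(p, c), eps) ** (-2.0 / (m - 1.0))
               for c in centers]
        total = sum(inv)
        u.append([v / total for v in inv])
    return u


def fcm(points, init_centers, m=2.0, iters=50):
    """Standard fuzzy C-means with caller-supplied initial centers."""
    centers = [tuple(c) for c in init_centers]
    dims = range(len(points[0]))
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for k in range(len(centers)):
            w = [row[k] ** m for row in u]
            total = sum(w)
            # Each center is the membership-weighted mean of all points.
            new_centers.append(tuple(
                sum(wi * p[d] for wi, p in zip(w, points)) / total
                for d in dims))
        centers = new_centers
    return centers, memberships(points, centers, m)


def label_clusters(u, labels):
    """Give each cluster the majority label of its strongest members."""
    votes = [Counter() for _ in u[0]]
    for row, y in zip(u, labels):
        votes[row.index(max(row))][y] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


def classify(x, centers, cluster_labels):
    """Label a test sample by its nearest cluster center."""
    k = min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))
    return cluster_labels[k]


# Toy data: hypothetical 2-D feature vectors, not from the thesis.
train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["benign"] * 3 + ["malicious"] * 3
centers, u = fcm(train, init_centers=[(0, 0), (6, 5)])
cluster_labels = label_clusters(u, train_labels)
```

Comparing a test sample with a few cluster centers instead of every training sample is where the efficiency gain over plain KNN would come from in a scheme of this kind.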
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP393.092; TP311.13