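Design and Implementation of a Machine-Learning-Based Abnormal Traffic Detection System
Topic: traffic analysis. Focus: anomaly detection. Source: Beijing University of Posts and Telecommunications, master's thesis, 2017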
[Abstract]: As Internet technology continues to develop, people's lives and work depend more and more on Internet applications. However, because security awareness is lacking and attack techniques keep growing more complex and diverse, many network applications suffer from a wide range of attacks and security threats, and many network security vulnerabilities have been exposed. Abnormal traffic detection is the first step of attack defense and underpins the interception of attacks, so detecting abnormal traffic accurately is necessary to keep network applications available and secure.

After studying existing abnormal traffic detection techniques, this thesis introduces machine learning methods into the anomaly detection system and proposes and designs a machine-learning-based abnormal traffic detection model. The model has four main parts: (1) statistically analyzing the characteristics of abnormal traffic from a data mining perspective and building a malicious keyword library and a multidimensional feature library; (2) testing the effectiveness of the multidimensional feature library and optimizing the feature set; (3) selecting machine learning algorithms to learn from and validate on the training set, and evaluating the performance of the classification results; (4) deploying the system on a Hadoop and Spark cloud platform in practical use, so that parallel detection improves the efficiency of abnormal traffic detection.

To analyze the characteristics of abnormal traffic, the thesis combines feature-rule-based and statistical-analysis-based methods and treats abnormal traffic detection as a pattern recognition problem. It separates out what abnormal traffic has in common and how it differs from normal traffic, and generalizes these properties into feature fields that the machine learning algorithms then validate and evaluate. For feature optimization, the thesis proposes three algorithms: a Sigmoid-based feature selection algorithm, an information-gain-based feature ranking algorithm, and a time-feedback-based feature optimization algorithm. These correspond to three steps (filtering, ranking, and performance optimization) that extract the best feature subset from the multidimensional feature set. For the choice of learning algorithm, the thesis compares and evaluates three classifiers (decision tree, random forest, and GBDT), taking parallelization into account; the experiments show that GBDT has the advantage in accuracy and recall.

Finally, considering the big data environment the system faces in practice, the thesis designs and implements a distributed detection system. It uses the data processing strengths of the Hadoop and Spark distributed platforms and cloud storage to fully parallelize data preprocessing, feature parsing, and the machine learning process, which greatly improves the detection efficiency of the system.
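The abstract names an information-gain-based feature ranking step but does not give its details. The following is a minimal sketch of how such a ranking could work, assuming discrete (for example binary) features and binary labels; the feature names `has_malicious_keyword` and `long_url`, the toy data, and the function names are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(feature_values, labels):
    """Entropy of the labels minus the expected entropy after
    splitting the samples on one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: one row per request, label 1 = abnormal, 0 = normal.
    names = ["has_malicious_keyword", "long_url"]
    rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
    labels = [1, 1, 0, 0]
    for name, gain in rank_features(rows, labels, names):
        print(f"{name}: {gain:.3f}")
```

In this toy data the keyword feature separates the two classes perfectly and ranks first, while the URL-length feature carries no information about the label. Continuous features would need to be discretized before this calculation applies.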
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP181;TP393.06