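Research on DDoS Attack Detection Techniques Based on Machine Learning and Statistical Analysis
Published: 2018-06-03 05:36
Topics: multivariate dimensionality reduction analysis + random forest; Source: Beijing University of Posts and Telecommunications, doctoral dissertation, 2017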
[Abstract]: Computer and communication technologies have developed rapidly, and in the current "Internet Plus" era, cloud computing, the Internet of Things, the mobile Internet, big data and related information technologies have emerged and flourished. Against this background, distributed denial-of-service (DDoS) attacks have become one of the most destabilizing factors in the information network environment. At the same time, the prevalence of botnets in recent years has made the damage caused by DDoS attacks increasingly severe. Because DDoS attacks are highly harmful and each major incident affects a wide area, DDoS attack detection remains a very important research topic in information and network security. However, some existing work still has the following problems: 1) it secures metrics such as the detection rate (DR) at the cost of detection time and heavy resource consumption; 2) it does not adequately balance the DR, Accuracy, Precision and false positive rate (FPR) of attack detection. To address this, the thesis applies currently popular theories, methods and techniques from machine learning, data mining and statistical analysis. Guided by the characteristics of DDoS attacks, it extracts and analyzes the attribute features of the individual fields in attack traffic, with the goal of detecting high-volume DDoS attacks on the Internet in real time, efficiently and accurately.
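The abstract compares methods on DR, Accuracy, Precision, FPR and TNR without restating their definitions. As a minimal sketch, assuming the usual confusion-matrix definitions with "attack" as the positive class (so DR is the true positive rate), these metrics can be computed as below. The names `Confusion`, `confusion_from_labels` and `detection_metrics` are hypothetical and do not come from the thesis.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary detector (positive class = attack)."""
    tp: int  # attack flows flagged as attack
    fp: int  # normal flows flagged as attack (false alarms)
    tn: int  # normal flows passed as normal
    fn: int  # attack flows missed


def confusion_from_labels(y_true: Iterable[int], y_pred: Iterable[int]) -> Confusion:
    """Count outcomes from paired labels, where 1 = attack and 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def _ratio(num: int, den: int) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def detection_metrics(c: Confusion) -> Dict[str, float]:
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "DR": _ratio(c.tp, c.tp + c.fn),         # detection rate (true positive rate)
        "Accuracy": _ratio(c.tp + c.tn, total),
        "Precision": _ratio(c.tp, c.tp + c.fp),
        "FPR": _ratio(c.fp, c.fp + c.tn),        # false positive rate
        "TNR": _ratio(c.tn, c.tn + c.fp),        # true negative rate = 1 - FPR
    }


if __name__ == "__main__":
    m = detection_metrics(Confusion(tp=90, fp=5, tn=95, fn=10))
    for name, value in m.items():
        print(f"{name}: {value:.3f}")
```

For example, 90 detected attacks, 10 missed attacks, 5 false alarms and 95 correctly passed normal flows give DR = 0.90, Accuracy = 0.925, Precision ≈ 0.947, FPR = 0.05 and TNR = 0.95. Under these definitions TNR = 1 - FPR, so on a fixed test set an improvement in one is an improvement in the other.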
The main contributions and innovations of this thesis are as follows.
(1) Detection of high-volume attack behavior in the big-data era performs poorly, especially for real-time DDoS detection. To address this, we take multivariate statistical analysis and correlation analysis from statistics, together with principal component analysis (PCA) from machine learning, as the theoretical basis, and design a Real-time Attack Detection (RTAD) method based on a Multivariate Dimensionality Reduction Analysis (MDRA) algorithm. By reducing the dimensionality of the network traffic attribute fields and eliminating the correlation among them, the method aims to detect high-volume DDoS attacks on the Internet in real time. After data preprocessing and experimental validation, we conclude that RTAD outperforms an attack detection method based on the Multivariate Correlation Analysis (MCA) algorithm on two evaluation metrics, Precision and the true negative rate (TNR), and also has a clear advantage in CPU computation time and memory consumption.
(2) Traditional centralized and quasi-distributed DDoS detection methods cannot perform cooperative detection, scale poorly and are difficult to deploy. To address these problems, this thesis studies a Random Forest Distribution Detection (RFDD) model for DDoS attacks based on combined classifiers. The core of the model is ensemble learning, an approach very widely used in machine learning; specifically, it uses the random forest method of combined classifiers and couples the random forest algorithm with a distributed parallel computing framework. It denoises and decorrelates the different attribute fields in attack traffic in order to detect attacks accurately. The RFDD model scales well and can adapt to dynamic adjustment and deployment of anomaly monitoring in the network environment. Experiments show that RFDD outperforms the AdaBoost method in DR, Accuracy, Precision and FPR, and that it remains stable on all four metrics across different thresholds.
(3) Existing DDoS detection models based on homogeneous classifiers generalize poorly and are unstable. To address these problems, this thesis studies a Heterogeneous Multi-classifier Ensemble Learning (HMEL) detection model based on singular value decomposition (SVD) and the Rotation Forest ensemble strategy. The model consists of three modules: a dataset preprocessing module, a heterogeneous multi-classifier detection module and a classification result acquisition module. The HMEL model removes redundancy from, and eliminates correlation among, the different attribute fields of network traffic. Theoretical analysis shows that the model has stronger generalization ability and broader applicability. It was compared experimentally against homogeneous classification detectors built from well-known machine learning algorithms, namely random forest, k-NN and Bagging, each with and without SVD preprocessing. The results show that HMEL is close to random forest and Bagging in TNR, Accuracy and Precision and consistently outperforms k-NN, while the TNR, Accuracy and Precision of k-NN are unstable across different thresholds. The model therefore combines strong detection ability with good stability.
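Both RFDD and HMEL combine several base classifiers and are evaluated at different thresholds, but the abstract does not say how the votes are combined or what the threshold controls. The sketch below assumes a simple scheme: each base detector votes 1 (attack) or 0 (normal), and a flow is flagged when the fraction of attack votes reaches the threshold. It illustrates threshold-based voting only; it is not random forest, Rotation Forest or the thesis's HMEL model, and the three toy detectors and their features (packets per second, SYN ratio) are hypothetical.

```python
from typing import Callable, Sequence

# A base detector maps a feature vector to 1 (attack) or 0 (normal).
Detector = Callable[[Sequence[float]], int]


def vote_fraction(detectors: Sequence[Detector], x: Sequence[float]) -> float:
    """Fraction of base detectors that label x as an attack."""
    return sum(d(x) for d in detectors) / len(detectors)


def ensemble_predict(detectors: Sequence[Detector], x: Sequence[float],
                     threshold: float = 0.5) -> int:
    """Flag x as an attack when the attack-vote fraction reaches the threshold."""
    return 1 if vote_fraction(detectors, x) >= threshold else 0


# Hypothetical toy detectors over features x = (packets_per_second, syn_ratio).
def rate_rule(x: Sequence[float]) -> int:
    return int(x[0] > 1000.0)  # a high packet rate looks like flooding


def syn_rule(x: Sequence[float]) -> int:
    return int(x[1] > 0.8)  # mostly-SYN traffic looks like a SYN flood


def centroid_rule(x: Sequence[float],
                  normal=(200.0, 0.1), attack=(5000.0, 0.9)) -> int:
    # Nearest-centroid rule on unscaled features (the packet rate dominates).
    d_normal = sum((a - b) ** 2 for a, b in zip(x, normal))
    d_attack = sum((a - b) ** 2 for a, b in zip(x, attack))
    return int(d_attack < d_normal)


if __name__ == "__main__":
    detectors = [rate_rule, syn_rule, centroid_rule]
    flow = (1500.0, 0.3)  # only rate_rule votes "attack"
    for threshold in (0.3, 0.5, 0.9):
        print(threshold, ensemble_predict(detectors, flow, threshold))
```

Raising `threshold` demands more agreement before raising an alarm, trading detection rate for fewer false alarms. One plausible reading of the stability claims above is that a model's metrics change little as such a threshold is swept.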
In summary, building on theories and methods from machine learning and statistical analysis, and following three principles for network traffic attribute features, namely "redundancy removal", "noise reduction" and "correlation elimination", this thesis investigates real-time, distributed and accurate DDoS attack detection, as well as detection with a heterogeneous ensemble classification model that has strong generalization ability and stability. The experiments produced results that compare favorably with the baseline methods, providing a basis for further research on the related methods and for their future application in different scenarios.
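Contributions (1) and (3) both rely on reducing dimensionality and removing correlation among traffic features (through PCA and SVD respectively), which the summary lists as guiding principles. As a generic illustration, not the thesis's MDRA algorithm, the sketch below uses the textbook PCA residual approach: fit principal components to normal traffic with an SVD, score each new flow by its squared reconstruction error outside the top-k subspace, and flag flows whose error exceeds a quantile of the training scores. The class name `PCADetector`, the `k` and `quantile` parameters, the toy data and the use of NumPy are all assumptions.

```python
import numpy as np


class PCADetector:
    """Generic PCA residual detector (hypothetical; not the thesis's MDRA).

    Fit on normal traffic only. A flow is scored by the squared norm of the part
    of its centred feature vector that lies outside the top-k principal subspace.
    """

    def __init__(self, k: int, quantile: float = 0.99):
        self.k = k                # number of principal components kept
        self.quantile = quantile  # training-score quantile used as the alarm threshold

    def fit(self, X_normal: np.ndarray) -> "PCADetector":
        self.mean_ = X_normal.mean(axis=0)
        Xc = X_normal - self.mean_
        # SVD of the centred data: rows of Vt are principal directions,
        # ordered by decreasing singular value.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        self.components_ = Vt[: self.k]  # shape (k, n_features)
        self.threshold_ = float(np.quantile(self.score(X_normal), self.quantile))
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        # Coordinates in the reduced space; uncorrelated on the training data.
        return (X - self.mean_) @ self.components_.T

    def score(self, X: np.ndarray) -> np.ndarray:
        Xc = X - self.mean_
        reconstruction = (Xc @ self.components_.T) @ self.components_
        return np.sum((Xc - reconstruction) ** 2, axis=1)  # squared residual per flow

    def predict(self, X: np.ndarray) -> np.ndarray:
        return (self.score(X) > self.threshold_).astype(int)  # 1 = attack


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.normal(size=500)
    # Toy "normal" traffic: three strongly correlated features.
    X = np.column_stack([t, 2 * t + 0.01 * rng.normal(size=500),
                         -t + 0.01 * rng.normal(size=500)])
    det = PCADetector(k=1).fit(X)
    # A flow that breaks the usual correlation is flagged; one that follows it is not.
    print(det.predict(np.array([[1.0, -2.0, 1.0], [1.0, 2.0, -1.0]])))
```

The `transform` output illustrates the decorrelation step: projections of the centred training data onto distinct principal components are uncorrelated. Scoring a new flow needs only O(k·d) arithmetic for d features, which is one reason subspace methods of this kind are often considered for real-time use.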
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Doctorate
[Year awarded]: 2017
[Classification number]: TP393.08; TP181
Article ID: 1971664
Article link: http://sikaile.net/guanlilunwen/ydhl/1971664.html