Research on Intrusion Detection with Cascaded Classifiers Based on Resampling

Published: 2019-03-11 10:35

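Abstract: With the rapid development of information technology and the spread of networks, the Internet has become an important part of people's work and daily life. At the same time, malicious information theft, personal attacks, and illegal profiteering on the Internet have increased sharply, network security problems have become more prominent, and research on network security has grown in importance. Intrusion detection, the process of detecting behavior that violates the secure use of a computer network or system, is an active research topic in network security. As information technology develops, the complexity of computer systems is also growing exponentially, which makes intrusion detection very difficult.

By studying network intrusion detection methods, this thesis finds that commonly used methods concentrate on raising the overall detection rate while neglecting the detection rate of some important classes. As a result, detection rates are very low for R2L (unauthorized access from a remote host) and U2R (unauthorized access to local superuser privileges) attacks. A successful attack of either type can steal or destroy server resources, so improving their detection is urgent.

The thesis first analyzes why the main existing methods detect R2L and U2R attacks poorly. There are two main causes. First, the data distribution is imbalanced, which skews classification; this is an imbalanced classification problem (the class distribution in the training set is extremely uneven, with one or several classes having far more or far fewer samples than the others). Second, these two attack types are hard to distinguish from packet headers alone and require detailed information about packet contents. Common intrusion detection methods apply the same method to every class and therefore struggle to reach good results, whereas cascading several classifiers, each responsible for different classes, can effectively address the imbalanced data distribution in intrusion detection.

Because intrusion detection is a typical imbalanced classification problem, the thesis studies resampling and other imbalanced classification methods in depth. SMOTE produces noise and borderline samples when it resamples intrusion detection data sets, so the thesis introduces the NCL (neighborhood cleaning) filter and proposes an improved resampling method, SMOTE-NCL, which filters out the noise and borderline data. Given the advantages of cascaded classifiers for imbalanced classification and their good results in intrusion detection, the thesis uses a cascaded classifier for intrusion detection. Because the high feature dimensionality of intrusion detection data sets affects detection performance, it also introduces an improved CGFR feature selection method to select a separate feature subset for each classifier in the cascade. CGFR and SMOTE-NCL are then applied to the cascaded classifier, yielding a resampling-based cascaded-classifier intrusion detection model intended to address the poor detection of R2L and U2R attacks by existing methods.

Based on theoretical analysis and experiments, the classifiers chosen for the cascade are the decision tree algorithm (C4.5) and the naive Bayes (NB) algorithm. The first classifier in the cascade is trained on three classes: DoS (denial-of-service attack), Probe (port scanning), and Normal (normal data). The second classifier is trained on Normal, R2L, and U2R. During detection, the test set first passes through the first classifier; samples that it labels Normal are passed to the second classifier, so that the model finally classifies data into five classes: DoS, Probe, Normal, R2L, and U2R.

The experiments first compare the classification results of the cascaded classifier on feature subsets selected by various feature selection methods and by CGFR. They then compare the results of the cascaded classifier on the original data set, on data sets resampled by SMOTE at different sampling rates, and on the data set resampled by SMOTE-NCL. Finally, they compare SVM, KNN, NB, C4.5, and the cascaded classifier on the SMOTE-NCL-resampled data set. For both U2R and R2L attacks, the proposed cascaded-classifier intrusion detection model based on CGFR and SMOTE-NCL achieves higher AUC values than all the other configurations. Detection of R2L is nevertheless still unsatisfactory: R2L attacks are hard to distinguish by packet-header features and can only be identified from detailed packet-content features, and for a large number of R2L samples the header features are indistinguishable from those of Normal traffic. To address this further, the author proposes extracting some features from packet contents when the data are collected and dynamically regenerating the training and test sets, which is the next step of this work.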
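The resampling step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's SMOTE-NCL implementation, whose details the abstract does not give. It pairs basic SMOTE interpolation with a simplified neighborhood-cleaning pass that drops any sample outside the protected classes when most of its nearest neighbors carry a different label. The function names, parameter defaults, and toy data are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def k_nearest_indices(i, points, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote(minority, n_new, k=3, seed=0):
    """Basic SMOTE: interpolate between a minority sample and one of its
    k nearest minority neighbors to create n_new synthetic samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest_indices(i, minority, k))
        gap = rng.random()  # position along the segment between the two samples
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(minority[i], minority[j])))
    return synthetic


def neighborhood_clean(points, labels, protected, k=3):
    """Simplified cleaning pass (an assumption, not the thesis's filter):
    drop every sample outside the `protected` classes whose k nearest
    neighbors mostly carry a different label."""
    keep = []
    for i, label in enumerate(labels):
        votes = Counter(labels[j] for j in k_nearest_indices(i, points, k))
        majority_label = votes.most_common(1)[0][0]
        if label in protected or majority_label == label:
            keep.append(i)
    return [points[i] for i in keep], [labels[i] for i in keep]


# Toy two-feature data (invented): five Normal samples in one cluster,
# one noisy Normal sample inside the U2R cluster, and three U2R samples.
normal = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (1.04, 1.02)]
u2r = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]

synthetic = smote(u2r, n_new=3, k=2, seed=0)
points = normal + u2r + synthetic
labels = ["Normal"] * len(normal) + ["U2R"] * (len(u2r) + len(synthetic))

clean_points, clean_labels = neighborhood_clean(points, labels,
                                                protected={"U2R"}, k=3)
print(Counter(clean_labels))
```

In this toy run the oversampling doubles the U2R class, and the cleaning pass removes the single Normal sample that sits inside the U2R cluster while leaving the rest of the data untouched.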
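The two-stage detection flow described in the abstract can be sketched as follows. This illustrates the routing logic only: the two stages are passed in as arbitrary callables, and the rule-based stand-ins below (`toy_stage1`, `toy_stage2`, their thresholds, and the feature names) are hypothetical placeholders for the trained C4.5 and naive Bayes models, not the thesis's classifiers.

```python
from typing import Callable, Dict, List

Sample = Dict[str, float]
Classifier = Callable[[Sample], str]


def cascade_predict(stage1: Classifier, stage2: Classifier,
                    samples: List[Sample]) -> List[str]:
    """Label each sample with stage1, then re-examine with stage2 only
    the samples that stage1 labels Normal."""
    labels = []
    for sample in samples:
        label = stage1(sample)      # DoS, Probe, or Normal
        if label == "Normal":
            label = stage2(sample)  # Normal, R2L, or U2R
        labels.append(label)
    return labels


# Hypothetical stand-ins for trained models; thresholds and feature
# names are invented for this example.
def toy_stage1(s: Sample) -> str:
    if s["conn_per_sec"] > 100:
        return "DoS"
    if s["distinct_ports"] > 20:
        return "Probe"
    return "Normal"


def toy_stage2(s: Sample) -> str:
    if s["root_shells"] > 0:
        return "U2R"
    if s["failed_logins"] > 3:
        return "R2L"
    return "Normal"


traffic = [
    {"conn_per_sec": 500, "distinct_ports": 1, "root_shells": 0, "failed_logins": 0},
    {"conn_per_sec": 2, "distinct_ports": 60, "root_shells": 0, "failed_logins": 0},
    {"conn_per_sec": 1, "distinct_ports": 1, "root_shells": 1, "failed_logins": 0},
    {"conn_per_sec": 1, "distinct_ports": 1, "root_shells": 0, "failed_logins": 5},
    {"conn_per_sec": 1, "distinct_ports": 1, "root_shells": 0, "failed_logins": 0},
]
predictions = cascade_predict(toy_stage1, toy_stage2, traffic)
print(predictions)  # ['DoS', 'Probe', 'U2R', 'R2L', 'Normal']
```

Passing the stages as callables keeps the routing independent of any particular learning library, so each stage can be trained on its own feature subset and its own resampled data before being plugged in.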
Degree-granting institution: Southwest University
Degree level: Master's
Year degree granted: 2017
Classification number: TP393.08

Article ID: 2438217

Article link: http://sikaile.net/guanlilunwen/ydhl/2438217.html
