Comprehensive Data-Sample Processing for Network Intrusion Detection
Topic: intrusion detection + imbalanced data; Source: master's thesis, Zhejiang University of Technology, 2014
[Abstract]: Intrusion detection is an effective and important active security defense technology and has long been a focus of research. The composition and quality of the training data directly determine the effectiveness, accuracy, and scalability of the classification model, and therefore the performance of the whole intrusion detection system. Training data collected by monitoring a network is massive, imbalanced, and noisy, which challenges both the real-time performance and the accuracy of intrusion detection. Efficient, comprehensive preprocessing of the samples is therefore necessary before the intrusion detection classification model is built.
The network environment places special demands on preprocessing. Because network samples are generated continuously, a known class distribution cannot be applied directly when handling class imbalance in data mining; the sheer number of samples makes compression itself costly; and class imbalance within the samples greatly reduces the accuracy of compression. Preprocessing of network data must therefore combine these treatments.
This thesis preprocesses the samples in two ways. (1) The data set is partitioned using the K-S (Kolmogorov-Smirnov) statistic, which does not depend on the class distribution ratio, so that each subset is less imbalanced and class imbalance has less influence on the learned classification rules. Experiments show that this improves both the accuracy and the efficiency of imbalanced-data classification. (2) The Affinity Propagation clustering algorithm is improved: samples close to a cluster center are associated with it directly, which reduces the number of samples that must be clustered and lowers time and memory consumption. The model is then adjusted repeatedly based on the association results to refine the clustering. Experiments show that this method effectively reduces the time and space cost of clustering while still producing good data compression.
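A minimal sketch of the K-S split criterion in (1), assuming a single numeric feature and a binary label (1 = attack, 0 = normal). The function names `ks_best_split` and `ks_partition` are hypothetical, and this is not the thesis's own code. For each candidate threshold t it compares the empirical CDFs of the two classes and keeps the t that maximizes D = max_t |F_1(t) - F_0(t)|. Because each class's CDF is normalized by that class's own size, D does not change when one class is over- or under-represented, which is the sense in which the statistic is independent of the class ratio.

```python
from bisect import bisect_right


def ks_best_split(xs, ys):
    """Return (threshold, D): the cut maximizing |F_1(t) - F_0(t)| for labels 1/0."""
    pos = sorted(x for x, y in zip(xs, ys) if y == 1)  # attack (minority) class
    neg = sorted(x for x, y in zip(xs, ys) if y == 0)  # normal (majority) class
    if not pos or not neg:
        raise ValueError("both classes must be present")
    best_t, best_d = None, -1.0
    for t in sorted(set(xs)):
        # Empirical CDFs, each normalized by its own class size (hence ratio-free).
        f_pos = bisect_right(pos, t) / len(pos)
        f_neg = bisect_right(neg, t) / len(neg)
        d = abs(f_pos - f_neg)
        if d > best_d:
            best_t, best_d = t, d
    return best_t, best_d


def ks_partition(xs, ys):
    """Split the samples at the K-S threshold into (left, right) lists of (x, y)."""
    t, _ = ks_best_split(xs, ys)
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return left, right


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [0, 0, 1, 0, 0, 0, 1, 1]
    print(ks_best_split(xs, ys))  # threshold 6, D = 2/3
    # Duplicating every attack sample changes the class ratio but not the split.
    print(ks_best_split(xs + [3, 7, 8], ys + [1, 1, 1]))
    print(ks_partition(xs, ys))
```

Duplicating the attack samples leaves the threshold and D unchanged, as the class-ratio invariance predicts. A fuller splitter might evaluate every feature and recurse on each subset; how the thesis selects features and decides when to stop splitting is not stated in this abstract.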
Finally, the imbalanced-data processing and sample compression methods are combined into a preprocessing algorithm that is independent of the classification learner, and a lightweight network intrusion detection model is built on it. To test the model's validity, experiments are run on the KDD99 data set, and several classification methods are used to test its applicability. The results show that the proposed model improves both the detection time and the accuracy of intrusion detection under all three classifiers. The model also preprocesses large data sets with good time and space performance, and the classification method can be chosen to suit practical needs, so it is of practical use.
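The compression stage of the combined preprocessing, i.e. the improved Affinity Propagation step (2), can be sketched as follows. This is a hedged illustration, not the thesis's algorithm. `affinity_propagation` is plain AP (Frey and Dueck's responsibility/availability message passing on negative squared Euclidean similarities, with the median similarity as the default preference), while `compress`, `seed_size`, and `radius` are hypothetical. The assumed reading of "direct association" is: AP runs only on a seed batch; each later sample within `radius` of its nearest exemplar is attached to that exemplar without re-clustering; the remaining samples are re-clustered together with the current exemplars as a refinement pass. The output is a list of (representative, weight) pairs that could stand in for the raw samples when a classifier is trained.

```python
from statistics import median


def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def affinity_propagation(points, damping=0.7, iters=300, preference=None):
    """Plain Affinity Propagation; returns (exemplar indices, label per point)."""
    n = len(points)
    if n == 1:
        return [0], [0]
    # Similarity = negative squared Euclidean distance.
    S = [[-_sqdist(p, q) for q in points] for p in points]
    if preference is None:  # common default: median off-diagonal similarity
        preference = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        for i in range(n):
            # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
            v = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=v.__getitem__)
            second = max(v[k] for k in range(n) if k != best)
            for k in range(n):
                r_new = S[i][k] - (second if k == best else v[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * r_new
        for k in range(n):
            # a(k,k) = sum_{i' != k} max(0, r(i',k))
            # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
            pos = sum(max(0.0, R[i][k]) for i in range(n) if i != k)
            for i in range(n):
                a_new = pos if i == k else min(0.0, R[k][k] + pos - max(0.0, R[i][k]))
                A[i][k] = damping * A[i][k] + (1 - damping) * a_new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:  # no positive evidence: fall back to the strongest candidate
        exemplars = [max(range(n), key=lambda k: A[k][k] + R[k][k])]
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


def compress(points, seed_size=50, radius=1.0, **ap_kwargs):
    """HYPOTHETICAL direct-association compression -> [(representative, weight)]."""
    if not points:
        return []
    seed = points[:seed_size]
    ex, labels = affinity_propagation(seed, **ap_kwargs)
    reps = [seed[k] for k in ex]
    weights = [labels.count(k) for k in ex]
    deferred = []
    for p in points[seed_size:]:
        j = min(range(len(reps)), key=lambda idx: _sqdist(p, reps[idx]))
        if _sqdist(p, reps[j]) <= radius ** 2:
            weights[j] += 1  # near an exemplar: associate directly, skip clustering
        else:
            deferred.append(p)  # far from all exemplars: hold for refinement
    if deferred:
        # Refinement: re-cluster current exemplars plus deferred samples.
        # Plain AP ignores the weights here; they are only carried through.
        pool = reps + deferred
        pool_w = weights + [1] * len(deferred)
        ex2, labels2 = affinity_propagation(pool, **ap_kwargs)
        reps = [pool[k] for k in ex2]
        weights = [sum(w for w, lab in zip(pool_w, labels2) if lab == k) for k in ex2]
    return list(zip(reps, weights))


if __name__ == "__main__":
    seed = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
    stream = seed + [(0.05, 0.05), (10.05, 10.05)]
    print(compress(stream, seed_size=6, radius=1.0))
```

Plain AP keeps two n-by-n message matrices, so its time and memory grow quadratically with the number of clustered samples. Attaching nearby samples directly keeps the clustered pools small, which is the cost reduction the abstract describes; the radius rule and the single refinement pass are assumptions made for this sketch.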
[Degree-granting institution]: Zhejiang University of Technology
[Degree level]: Master's
[Year awarded]: 2014
[Classification number]: TP393.08