面向網(wǎng)絡入侵檢測的串匹配算法優(yōu)化
本文選題:網(wǎng)絡安全 + 入侵檢測系統(tǒng); 參考:《哈爾濱工業(yè)大學》2014年博士論文
【摘要】:互聯(lián)網(wǎng)已經(jīng)成為人們?nèi)粘I钪胁豢苫蛉钡囊粋部分。伴隨而來的是日趨猖獗的網(wǎng)絡犯罪對網(wǎng)絡節(jié)點上主機的信息安全的威脅。出于不同的目的,網(wǎng)絡使用者可以利用一定的技術(shù)或者社會手段,非法進入個人主機、盜取個人賬戶密碼甚至是對網(wǎng)站進行協(xié)作攻擊。建立一個堅固的預防、阻斷網(wǎng)絡犯罪的安全體系至關重要。入侵檢測系統(tǒng)就是網(wǎng)絡安全體系中不可或缺的重要組組成部分,其主要處理對象就是來自網(wǎng)絡的數(shù)據(jù)包。入侵檢測系統(tǒng)會對網(wǎng)絡數(shù)據(jù)包和已知的數(shù)據(jù)庫進行比對,根據(jù)已掌握的數(shù)據(jù)將數(shù)據(jù)包中的信息歸類為入侵行為或者安全行為。針對入侵檢測系統(tǒng)監(jiān)測到的入侵行為,還可以將這些入侵告警進行分析進一步挖掘告警之間的邏輯聯(lián)系,揭示隱含的攻擊過程用于定位攻擊源或者為后續(xù)檢測提供知識儲備。為了確保能有效地提取網(wǎng)絡數(shù)據(jù)包中的信息需要高效的串匹配算法提供技術(shù)支持。然而隨著網(wǎng)絡技術(shù)的發(fā)展,各種新的攻擊類型也層出不窮,這也使得串匹配自動機需要維護的特征集的規(guī)模越來越大,直接降低了入侵檢測系統(tǒng)的性能。 基于這樣的背景,本文的研究目標是構(gòu)造支持大規(guī)模特征集合的串匹配算法,并且能夠找到對現(xiàn)有串匹配算法進行速度提升的方法。具體地,本文著重從以下幾個方面、不同層次對串匹配的優(yōu)化方法進行了深入的研究: (1)特征集規(guī)模過大導致匹配自動機無法正常構(gòu)建限制了入侵檢測系統(tǒng)能夠識別的有害信息的數(shù)量。本文特別針對大特征集文本匹配問題,提出一種混合匹配自動機模型。該模型旨在通過長度對特征進行分類,構(gòu)造更加緊湊的匹配自動機。通過將具有不同長度范圍的特征分布到不同的自動機中,可以采用不同的方式對文本進行匹配。本文同時依據(jù)混合匹配自動機的結(jié)構(gòu)特點設計了兩種獨立的文本過濾策略,即逆向狀態(tài)映射狀態(tài)轉(zhuǎn)移方法(SLSPM算法)和二級AVL過濾策略(BCHDFA算法)。兩種技術(shù)分別利用“將‘自動機狀態(tài)-讀入文本信息’的映射關系進行翻轉(zhuǎn)”和“借助AVL樹對特征段中特定位置上的雙字符文本進行兩次過濾”達到減少將文本塊與特征塊進行驗證的次數(shù)的目的,從而抑制混合自動機匹配速度降低的趨勢。 (2)針對URL等長特征,本文結(jié)合SSE指令提出了兩種專門面向16字節(jié)數(shù)據(jù)的匹配算法SSEHash和RTRIE,這兩種算法旨在提高長特征的匹配速度。SSEHash算法中,通過2個指令周期共16個加法運算和一個24比特模運算能夠?qū)?6字節(jié)文本比較均勻地分布到24比特的數(shù)據(jù)中。由于采用SSE指令進行計算,使得SSEHash算法和常規(guī)意義的散列算法相比需要更短的處理時間,能夠更快地過濾文本信息。為了解決大規(guī)模特征集環(huán)境下Wu-Manber類算法的移動距離較短的問題,本文結(jié)合SSE指令提出了一種反轉(zhuǎn)自動機RTRIE。該算法提取每條長特征(長度至少16)的16字節(jié)前綴中的所有后綴子串的逆向串的指紋,并為每個指紋計算其安全移動距離。該算法和通常的Wu-Manber類算法相比在計算文本指針移動距離時考慮的文本字符數(shù)目更多,這樣可以盡量避免由于特征規(guī)模增加而導致的指針移動距離的降低。 (3)為了提高表達式匹配性能消除無用表達式對內(nèi)存的占用,本文分析了表達式的各種包含關系并提出了BitCount算法的優(yōu)化算法MaskVeri以及表達式冗余消除算法KPGEM。為了不遺漏任何可能的表達式的包含關系,本文借助表達式中出現(xiàn)的各個短關鍵字的位置以及關鍵字的長度等信息限定了各關鍵字的前驅(qū)后繼關系。借助這種關系可以定義針對每條表達式的關鍵字路徑圖。通過對關鍵字路徑圖的遍歷KPGEM算法可以分析出當前表達式中隱含的其它表達式,并正確地刪除表達式中冗余的關鍵字或者包含其它表達式的表達式。 (4)基于串匹配算法的數(shù)據(jù)包檢測部件是入侵檢測系統(tǒng)中的重要功能部件。在此基礎上才能對通過串匹配模塊獲得的網(wǎng)絡告警日志進行告警關聯(lián)分析。本文在最后一部分實現(xiàn)了一個入侵告警事件關聯(lián)系統(tǒng)。該系統(tǒng)通過對告警中的重復結(jié)構(gòu)進行保持語義的序列最小化處理,使得串匹配算法捕獲的告警得到大幅精簡。系統(tǒng)中采用D-S證據(jù)理論對宏觀分析方法(序列挖掘)和微觀分析方法(權(quán)能提升推理)二者處理的結(jié)果進行融合,,并通過信任能量計算公式對經(jīng)過D-S融合后的告警序列進行再處理,從獲得的頻繁序列中得到更能體現(xiàn)前后關系的攻擊告警序列。該系統(tǒng)能夠獲得有效的攻擊場景,并達到自動提取關鍵攻擊序列的目的。
[Abstract]:The Internet has become an indispensable part of people's daily life. It is accompanied by the threat that the increasingly rampant network crime threatens the information security of the host on the network nodes. For different purposes, users can use certain technical or social means to enter the personal host by non law and steal personal account passwords. The intrusion detection system is an important component part of the network security system, and its main object is the data packet from the network. The intrusion detection system will carry on the network data packet and the known data. According to the data that has been mastered, the data is classified as intrusion or security behavior according to the data already mastered. In view of the intrusion behavior monitored by the intrusion detection system, these Intrusion Alerts can be analyzed to further excavate the logical connection between the alarm and reveal the hidden attack process used to locate the source of the attack or to be after the attack. Continuous detection provides knowledge reserves. In order to ensure effective extraction of information in network packets, efficient string matching algorithms are required. However, with the development of network technology, various new types of attack are emerging, which makes the scale of the feature set that the string matching automata need to maintain is becoming more and more large and is directly reduced. The performance of the intrusion detection system.
Based on this background, the aim of this paper is to construct a series matching algorithm that supports a large set of features, and to find a way to speed up the current string matching algorithm.
(1) the size of the feature set is too large to limit the number of harmful information that the intrusion detection system can recognize. In this paper, a hybrid matching automaton model is proposed for the text matching problem of the great special collection. This model is designed to classify the characteristics by the length and construct a more compact matching automatically. By distributing features of different length range to different automata, the text can be matched in different ways. In this paper, two independent text filtering strategies are designed based on the structure characteristics of mixed matching automata, that is, reverse state mapping state transfer (SLSPM algorithm) and two level AVL filtering strategy. (BCHDFA algorithm). The two techniques use "the mapping relationship of the automata state reading text information" and "the two filtering with the help of the double character text on the specific location in the feature segment" to reduce the number of verifying the text block and the feature block by the AVL tree, thus restraining the mixed automata. The trend of decreasing speed.
(2) in view of the URL isometric feature, this paper presents two matching algorithms, SSEHash and RTRIE, which are specialized for 16 byte data in combination with the SSE instruction. The two algorithms are designed to improve the long feature matching speed.SSEHash algorithm with a total of 16 addition operations and a 24 specific mode operation in the 2 instruction cycles to distribute the 16 byte text more evenly. To 24 bits of data, because of the use of SSE instruction, the SSEHash algorithm needs shorter processing time compared with the conventional hash algorithm and can filter text information faster. In order to solve the problem of the shorter moving distance of the Wu-Manber class algorithm in the large model collection environment, this paper presents a kind of SSE instruction. This algorithm extracts the fingerprint of the reverse string of all the suffixes in the 16 byte prefix of each long feature (at least 16) and calculates its safe moving distance for each fingerprint. This algorithm and the usual Wu-Manber class algorithm consider the number of text characters in the calculation of the moving distance of the text pointer, so that the number of text characters is more. It is possible to avoid the reduction of pointer movement distance due to the increase of feature size.
(3) in order to improve the performance of expression matching to eliminate the use of useless expressions for memory, this paper analyzes the various inclusion relations of the expression and proposes an optimization algorithm MaskVeri of BitCount algorithm and the expression redundancy elimination algorithm KPGEM. in order to avoid missing any possible expression. The position of each short keyword and the length of the keyword limit the forward and subsequent relations of each keyword. By this relationship, the keyword path graph for each expression can be defined. By traversing the keyword path graph, the KPGEM algorithm can be used to analyze the other expressions implying in the current expression and delete it correctly. A redundant keyword in an expression or an expression containing other expressions.
(4) the packet detection unit based on string matching algorithm is an important functional component in the intrusion detection system. On this basis, the network alarm log obtained through the string matching module can be analyzed. In the last part, an intrusion alarm event closing system is implemented. The system is repeated in the alarm. The structure carries out the sequential minimization of the semantics, making the alarm captured by the string matching algorithm greatly streamlined. In the system, the D-S evidence theory is used to fuse the results of the macro analysis method (sequence mining) and the microanalysis method (the weight lifting reasoning), and the D-S fusion formula is used to fuse the D-S through the trust energy calculation formula. The alarm sequence is reprocessed to get the attack alarm sequence which is more able to reflect the front and back relations from the frequent sequences obtained. The system can obtain the effective attack scene and automatically extract the key attack sequence.
【學位授予單位】:哈爾濱工業(yè)大學
【學位級別】:博士
【學位授予年份】:2014
【分類號】:TP393.08
【參考文獻】
相關期刊論文 前10條
1 徐從富,耿衛(wèi)東,潘云鶴;面向數(shù)據(jù)融合的DS方法綜述[J];電子學報;2001年03期
2 宋麒;羅志宇;叢鵬;;SSE指令集在~(60)Co集裝箱CT系統(tǒng)圖像重建算法中的應用[J];核電子學與探測技術(shù);2007年01期
3 李小紅;基于SSE技術(shù)的H.264運動估計的并行處理[J];合肥學院學報(自然科學版);2005年03期
4 李成軍;周衛(wèi)峰;朱重光;;基于Intel SIMD指令的二維FFT優(yōu)化算法[J];計算機工程與應用;2007年05期
5 張慶丹;戴正華;馮圣中;孫凝暉;;基于GPU的串匹配算法研究[J];計算機應用;2006年07期
6 唐定車;劉任任;譚建龍;;基于統(tǒng)一計算設備架構(gòu)的并行串匹配算法[J];計算機應用;2009年S1期
7 羅若愚,魯強,曾紹群;在醫(yī)學圖像處理中使用MMX及SSE指令[J];計算機應用研究;2005年01期
8 曹京;譚建龍;劉萍;郭莉;;布爾表達式匹配問題研究[J];計算機應用研究;2007年09期
9 何文華;;基于海量數(shù)據(jù)的多模式匹配算法研究[J];計算機應用與軟件;2012年04期
10 宋云;龍際珍;;規(guī)則數(shù)量無關的多布爾表達式匹配算法[J];軟件導刊;2012年03期
本文編號:1796080
本文鏈接:http://sikaile.net/guanlilunwen/ydhl/1796080.html