Research and Implementation of PE Virus File Clustering Technology

Published: 2018-12-14 08:28

[Abstract]: The Internet has become an indispensable part of daily life, but the convenience it brings comes with security problems that hang overhead like a sword and could seriously harm society at any time. Because Windows remains the mainstream operating system, PE viruses have the widest scope of harm. The number of new viruses appearing each year is also rising sharply, and security vendors are struggling to keep up. Automatically clustering PE virus files by family is therefore of real practical value. Existing methods for extracting static features from PE virus files do not consider the temporal (sequential) features of their n-grams. To address this, the thesis analyzes the principles of Word2vec and proposes a temporal feature extraction algorithm for PE virus files. It also studies the PE file structure and the principles of clustering algorithms, designs a PE virus file clustering system, and uses that system to validate the proposed algorithm. The main contributions are as follows:

(1) The thesis analyzes the problem that temporal features are ignored when features are extracted from PE virus files, and proposes a temporal feature extraction algorithm for such files. Current research on static features of PE files concentrates on selecting n-gram features by information gain and on extracting API function calls, string information, and similar attributes, and it overlooks temporal features. The proposed algorithm is therefore built on a detailed analysis of the PE file structure.

(2) The temporal feature extraction algorithm is designed and implemented. Word2vec converts the n-gram words of a PE file into word vectors, which then serve as the measure of similarity between words. K-means groups words with similar contextual semantics into one class, which reduces the dimensionality of the temporal feature vector.

(3) A PE virus file clustering system is designed and implemented. The system has two parts. The first verifies the effectiveness of the temporal features using an SGD multi-class classification algorithm. The second applies the temporal features to the clustering of PE virus files and compares the clustering results of K-means and the density peaks algorithm.

(4) The proposed PE virus file clustering system is evaluated comprehensively. It was tested on a batch of virus samples, and the results show that it achieves the expected clustering quality and that the temporal feature extraction algorithm has some practical value.
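The dimensionality-reduction step in (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The 2-byte window, the hand-written 2-D vectors standing in for trained Word2vec embeddings, the farthest-point initialization, the frequency-histogram feature, and the names `byte_ngrams`, `kmeans`, and `cluster_histogram` are all assumptions made for illustration. The sketch only shows that once each n-gram word is assigned to a cluster, a file can be described by one value per cluster instead of one per distinct n-gram.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2):
    """Slide an n-byte window over the data; each window becomes a hex 'word'."""
    return [data[i:i + n].hex() for i in range(len(data) - n + 1)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain Lloyd's K-means; returns one cluster id per input vector.

    Assumption: deterministic farthest-point initialization, chosen here
    only so that the example is reproducible.
    """
    centers = [list(vectors[0])]
    while len(centers) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest center for every vector.
        labels = [min(range(k), key=lambda c: sq_dist(v, centers[c])) for v in vectors]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_histogram(words, word_to_cluster, k):
    """Describe a file by the relative frequency of each word cluster."""
    counts = Counter(word_to_cluster[w] for w in words if w in word_to_cluster)
    total = sum(counts.values()) or 1
    return [counts.get(c, 0) / total for c in range(k)]


# Hypothetical 2-D vectors standing in for trained Word2vec embeddings.
toy_vectors = {
    "4d5a": [0.0, 0.0],
    "5a90": [0.0, 1.0],
    "9000": [10.0, 10.0],
    "0003": [10.0, 11.0],
}
vocab = list(toy_vectors)
labels = kmeans([toy_vectors[w] for w in vocab], k=2)
word_to_cluster = dict(zip(vocab, labels))

sample = bytes.fromhex("4d5a900003")  # made-up bytes, not a real PE file
features = cluster_histogram(byte_ngrams(sample, 2), word_to_cluster, k=2)
print(features)
```

In a fuller pipeline the toy vectors would be replaced by embeddings trained on n-gram sequences from many files, and `k` would be chosen much smaller than the vocabulary size so that the resulting feature vector stays compact.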
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP393.08; TP311.13



Article ID: 2378291


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2378291.html
