Research and Implementation of PE Virus File Clustering Technology
[Abstract]: The Internet has become an indispensable part of daily life, but the convenience it brings is accompanied by security threats that can cause serious harm to society at any time. Because Windows remains the mainstream operating system, PE viruses have the widest reach. The number of new viruses grows sharply every year, and security vendors struggle to keep up. Automatically clustering PE virus files by family is therefore of great practical significance. Existing static feature extraction for PE virus files does not take the sequential (temporal) order of n-grams into account. To address this, this thesis analyzes the principle of Word2vec and proposes an algorithm for extracting temporal features from PE virus files. It also studies the PE file structure and the principles of clustering algorithms, designs a PE virus file clustering system, and uses it to verify the proposed algorithm. The main contents and contributions are as follows:
(1) An algorithm for extracting temporal features of PE virus files is proposed, motivated by the observation that existing work ignores them. Current research on static features of PE files focuses on selecting n-gram features by information gain and on extracting API function calls, string information, and similar attributes. The proposed algorithm is based on a detailed analysis of the PE file structure.
(2) The temporal feature extraction algorithm is designed and implemented. Word2vec converts the n-gram "words" of a PE file into word vectors, which serve as the basis for measuring similarity between words. K-means then groups words with similar context and semantics into the same class, which reduces the dimensionality of the temporal feature vector.
(3) The PE virus file clustering system is designed and implemented. It consists of two parts. The first verifies the validity of the temporal features using an SGD multi-class classification algorithm. The second applies the temporal features to clustering PE virus files and compares the clustering results of K-means and the density peak algorithm.
(4) The PE virus file clustering system is tested on a set of virus samples. The results show that the system achieves the expected clustering effect and that the temporal feature extraction algorithm is practical.
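The dimensionality-reduction step in item (2) can be illustrated with a minimal sketch. It is not the thesis implementation: Word2vec training is not shown, the word vectors below are hypothetical toy values standing in for trained embeddings, and the function names (byte_ngrams, kmeans, file_feature) are introduced only for illustration. The sketch assumes that byte n-grams act as "words", that K-means groups words by their vectors, and that each file is then described by a normalized histogram over word clusters.

```python
import math
from collections import Counter
from typing import Dict, List


def byte_ngrams(data: bytes, n: int = 2) -> List[str]:
    """Slide an n-byte window over the data; return hex-encoded 'words' in order."""
    return [data[i:i + n].hex() for i in range(len(data) - n + 1)]


def kmeans(vectors: Dict[str, List[float]], k: int, iters: int = 20) -> Dict[str, int]:
    """Plain K-means over word vectors; returns a word -> cluster id mapping.

    Initialization is deterministic (first k words in sorted order) so the
    toy example is reproducible; a real system would use a better scheme.
    """
    words = sorted(vectors)
    centroids = [list(vectors[w]) for w in words[:k]]
    assign: Dict[str, int] = {}
    for _ in range(iters):
        # Assignment step: each word goes to its nearest centroid.
        new_assign = {
            w: min(range(k), key=lambda c: math.dist(vectors[w], centroids[c]))
            for w in words
        }
        if new_assign == assign:
            break  # converged
        assign = new_assign
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [vectors[w] for w in words if assign[w] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def file_feature(data: bytes, word_to_cluster: Dict[str, int], k: int, n: int = 2) -> List[float]:
    """Describe a file as a normalized histogram over word clusters (length k)."""
    counts = Counter(
        word_to_cluster[w] for w in byte_ngrams(data, n) if w in word_to_cluster
    )
    total = sum(counts.values()) or 1
    return [counts.get(c, 0) / total for c in range(k)]


# Hypothetical 2-dimensional vectors standing in for trained Word2vec output.
toy_vectors = {
    "4d5a": [0.9, 0.1],
    "5a90": [0.8, 0.2],
    "9000": [0.1, 0.9],
    "0003": [0.2, 0.8],
}
word_to_cluster = kmeans(toy_vectors, k=2)
feature = file_feature(bytes.fromhex("4d5a900003"), word_to_cluster, k=2)
print(word_to_cluster, feature)
```

Under these assumptions the feature length equals the number of word clusters k rather than the number of distinct n-grams, which is the dimensionality reduction the abstract describes. The resulting vectors could then be passed to a classifier or a clustering algorithm.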
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP393.08;TP311.13