
Research on Web Page Malicious Code Detection Based on Classifier Ensembles

Published: 2018-12-10 12:37
[Abstract]: In an era of rapid Internet growth, the network has enriched entertainment and improved many other aspects of daily life. The same convenience brings risk: attackers exploit malicious code to undermine network security for financial gain, and governments are paying increasing attention to malicious code detection. Detection methods generally fall into two categories. Static detection [1] extracts web page features mainly through rule and signature matching. Dynamic detection [2] runs the code in a virtual environment and extracts features from its behavior. This thesis targets malicious JavaScript code [3] and studies its detection using machine learning. The main work and results are as follows:

1. Obfuscated JavaScript code is compiled to machine code with the V8 engine [4]. Based on the characteristics of malicious code, the operands in the machine code are grouped into simplified classes and combined with the opcodes. Features are then extracted from the processed machine code as Bi-Grams and Tri-Grams selected by information gain. A method based on frequency, distance, and mutual information is proposed to locate breakpoints in each sample and to compute variable-length N-gram features per sample. Experiments confirm that features mixing the processed operands with opcodes describe machine code behavior in finer detail, and that variable-length N-gram statistics avoid splitting meaningful sequences, which improves classification results.

2. Building on a study of common classification algorithms and classifier ensemble algorithms, and to address the problem that all member classifiers share a single input, an input optimization for ensemble classifiers [5] is proposed: the input data set is processed in several different ways so that each internal classifier is trained on data suited to it before the resulting models are combined [6]. Adding a second-level classifier turns the original single-layer ensemble into a multi-level ensemble. Each classifier is also assigned its own weight, and training is used to find the best-performing weight assignment. Experiments show that the multi-level weighted classifier ensemble with these optimizations achieves better classification results.

3. Based on the algorithms above, an online malicious code detection system was designed and developed. Users can submit malicious script code or a website address online and receive a fast detection result. They can also submit detection reports and view reports submitted by others. Code that the system detects as malicious is automatically saved to a database.
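The Bi-Gram and Tri-Gram selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes each sample is already a list of tokens that fuse an opcode with a simplified operand class; the token names (`mov_reg`, `call_mem`, and so on), the toy samples, and the function names are all hypothetical. It treats "the n-gram occurs in the sample" as a binary feature and ranks candidates by information gain. The variable-length N-gram breakpoint method is not covered.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(samples, labels, gram):
    """Information gain of the binary feature 'gram occurs in the sample'."""
    present, absent = [], []
    for tokens, label in zip(samples, labels):
        bucket = present if gram in ngrams(tokens, len(gram)) else absent
        bucket.append(label)
    total = len(labels)
    conditional = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    return entropy(labels) - conditional


def top_ngrams(samples, labels, sizes=(2, 3), k=3):
    """Score every Bi-Gram and Tri-Gram seen in the data; keep the k best."""
    candidates = set()
    for tokens in samples:
        for n in sizes:
            candidates.update(ngrams(tokens, n))
    scored = [(information_gain(samples, labels, g), g) for g in candidates]
    # Highest gain first; ties broken by the n-gram itself for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


# Hypothetical toy data: label 1 = malicious, 0 = benign.
samples = [
    ["push_reg", "mov_reg", "call_mem", "xor_reg", "jmp_imm"],
    ["mov_reg", "call_mem", "xor_reg", "add_imm"],
    ["push_reg", "mov_reg", "add_imm", "ret"],
    ["mov_reg", "add_imm", "cmp_reg", "ret"],
]
labels = [1, 1, 0, 0]

if __name__ == "__main__":
    for gain, gram in top_ngrams(samples, labels):
        print(f"{gain:.3f}  {' '.join(gram)}")
```

In this toy data, an n-gram that appears only in the malicious samples separates the two classes completely and receives the maximum gain of 1 bit, while one that appears equally often in both classes receives a gain of 0. The selected n-grams would then serve as the feature columns passed to the classifiers.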
[Degree-granting institution]: Zhejiang University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP393.08

[References]

Related journal articles (top 10)

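1 修揚; 劉嘉勇. Malware detection based on opcode sequence frequency vectors and behavioral feature vectors [J]. Information Security and Communications Privacy, 2016(09)

2 賀鳴; 孫建軍; 成穎. A survey of text classification research based on naive Bayes [J]. Information Science, 2016(07)

3 張凱; 王東安; 李超; 賈冰. Malicious code detection based on co-sampling active learning [J]. Chinese High Technology Letters, 2016(05)

4 盧曉勇; 陳木生. Web spam detection based on random forest and under-sampling ensemble [J]. Journal of Computer Applications, 2016(03)

5 廖國輝; 劉嘉勇. Malicious code detection methods based on data mining and machine learning [J]. Journal of Information Security Research, 2016(01)

6 付壘朋; 張瀚; 霍路陽. A JavaScript malicious script detection algorithm based on multiple classes of features [J]. Pattern Recognition and Artificial Intelligence, 2015(12)

7 向濤; 李濤; 趙雪專; 李旭冬. An accurate object detection method based on random forest [J]. Application Research of Computers, 2016(09)

8 李盟; 賈曉啟; 王蕊; 林東岱. A feature selection and modeling method for malicious code [J]. Computer Applications and Software, 2015(08)

9 徐青; 朱焱; 唐壽洪. Detecting malicious JavaScript code by analyzing multiple classes of features and fraud techniques [J]. Computer Applications and Software, 2015(07)

10 宣以廣; 周華. An automatic detection method for JavaScript code obfuscation based on character entropy [J]. Computer Applications and Software, 2015(01)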

Related doctoral dissertations (top 3)

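1 解男男. Research on the application of machine learning methods in intrusion detection [D]. Jilin University, 2015

2 孫鑫. Research on feature selection problems in machine learning [D]. Jilin University, 2013

3 羅瑜. Research on the application of support vector machines in machine learning [D]. Southwest Jiaotong University, 2007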

Related master's theses (top 3)

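1 王宇恒. Optimization and application of the random forest algorithm in recommender systems [D]. Zhejiang University, 2016

2 李運. Application of machine learning algorithms in data mining [D]. Beijing University of Posts and Telecommunications, 2015

3 李洋. Research on web page malicious code detection techniques based on machine learning [D]. Xidian University, 2013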



Article ID: 2370581


Article link: http://sikaile.net/guanlilunwen/ydhl/2370581.html

