Research on Malicious Web Page Detection Technology Based on the Active SVM Algorithm
Published: 2018-11-11 18:03
[Abstract]: In the Internet era, new applications built on scripting languages and browser plug-in technology keep emerging. While users enjoy the convenience and speed these applications bring, man-made attacks such as information leakage, information theft, data tampering, data deletion and insertion, and computer viruses are also becoming increasingly rampant. Web-based attacks are the main kind of attack that Internet users face. Attackers carefully craft attack code that exploits vulnerabilities in browsers or third-party plug-ins. Malicious code authors produce large amounts of malicious code and obfuscate and transform their scripts in many ways to evade malicious code detection, which relies chiefly on signature matching; obfuscated JavaScript is the most prominent case. These obfuscation methods generate large numbers of malicious code variants, which exploit the timeliness and speed of the Internet to spread widely and threaten users' information security. This seriously hampers malicious code detection and makes obfuscated code the hardest part of web malicious code to defend against. Keeping such attacks off users' computers and protecting their information is an urgent problem for society, and one on which network security experts continue to seek breakthroughs. This thesis studies JavaScript obfuscation techniques and proposes feature extraction based on the TF-IDF algorithm, which applies the term-weighting analysis used in text classification to JavaScript scripts so that feature extraction is better founded. Experiments show that TF-IDF-based feature extraction performs considerably better than traditional feature extraction methods.
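As a rough illustration of the TF-IDF feature extraction mentioned above, the following sketch weights JavaScript tokens by term frequency and smoothed inverse document frequency. The abstract does not give the thesis's tokenizer or exact weighting formula, so the regular expression `TOKEN_RE`, the helper names `tokenize` and `tfidf_vectors`, the smoothing, and the three sample scripts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: identifiers, digit runs, and single punctuation characters.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|[^\sA-Za-z0-9_$]")


def tokenize(script):
    """Split a JavaScript source string into coarse lexical tokens."""
    return TOKEN_RE.findall(script)


def tfidf_vectors(scripts):
    """Return (vocabulary, vectors) with one TF-IDF vector per script."""
    docs = [Counter(tokenize(s)) for s in scripts]
    vocab = sorted(set().union(*docs))
    n_docs = len(docs)
    # Document frequency: number of scripts that contain each token.
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    # Smoothed IDF (an assumed variant) so every token gets a finite weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for d in docs:
        total = sum(d.values())
        vectors.append([(d[t] / total) * idf[t] if total else 0.0 for t in vocab])
    return vocab, vectors


# Toy corpus (invented): the second script mimics an obfuscated payload loader.
scripts = [
    "var x = 1; document.write(x);",
    "eval(unescape('%75%6e'));",
    "var y = 2; console.log(y);",
]
vocab, vectors = tfidf_vectors(scripts)
i = vocab.index("eval")
print([round(v[i], 3) for v in vectors])
```

Tokens such as `eval` that occur in few scripts receive a higher IDF than tokens such as `;` that occur in every script, which is the weighting effect the abstract attributes to text classification.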
The thesis also addresses the shortcomings of the traditional supervised SVM by introducing an active learning strategy from machine learning, which reduces manual work, improves efficiency, and makes the system more automated. Experiments show that the malicious web page detection system based on Active SVM achieves better performance with fewer labeled samples and less human effort.
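The active learning idea can be sketched as a pool-based loop: train an SVM on the few labeled samples, ask an annotator to label the unlabeled sample closest to the decision boundary, and retrain. The abstract does not state the thesis's kernel, query strategy, or parameters, so everything below is an assumption: a linear SVM trained by subgradient descent, a smallest-margin query rule, the hypothetical `oracle` function standing in for a human annotator, and invented two-dimensional toy features.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM via subgradient descent on the regularized hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # inside the margin: the hinge term is active
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:  # only the regularizer contributes to the subgradient
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def decision(w, b, x):
    """Signed score; its absolute value grows with distance from the hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def active_learn(pool, oracle, seed_idx, n_queries):
    """Pool-based active learning with a smallest-margin query rule."""
    labeled = {i: oracle(pool[i]) for i in seed_idx}
    for _ in range(n_queries):
        idx = sorted(labeled)
        w, b = train_linear_svm([pool[i] for i in idx], [labeled[i] for i in idx])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Query the sample the current model is least sure about.
        q = min(unlabeled, key=lambda i: abs(decision(w, b, pool[i])))
        labeled[q] = oracle(pool[q])
    idx = sorted(labeled)
    w, b = train_linear_svm([pool[i] for i in idx], [labeled[i] for i in idx])
    return w, b, labeled


# Toy pool (invented): two features per page, label +1 means malicious.
pool = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.9, 1.0],
        [1.0, 0.8], [0.8, 0.9], [0.5, 0.4], [0.6, 0.6]]


def oracle(x):
    """Hypothetical annotator: returns the true label of a sample."""
    return 1 if x[0] + x[1] > 1.0 else -1


w, b, labeled = active_learn(pool, oracle, seed_idx=[0, 3], n_queries=3)
print(sorted(labeled))
```

Only five of the eight pool samples are ever labeled here, which mirrors the abstract's claim that active learning reduces annotation effort; how much accuracy is retained depends on the data and is not shown by this toy example.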
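[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.08
Article ID: 2325691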
Article link: http://sikaile.net/guanlilunwen/ydhl/2325691.html