Design and Implementation of a Trojan Detection System Based on Data Mining and Machine Learning
Published: 2018-03-16 12:02
Topic: web Trojan  Focus: JavaScript  Source: University of Electronic Science and Technology of China, 2014 master's thesis  Type: degree thesis
[Abstract]: Computer networks are changing the way people live, but their openness and interconnectedness leave them exposed to attack, so network security is drawing more and more attention. Web Trojans have become the number one threat to network security: virus propagation, illegal intrusion, server paralysis and similar problems are all delivered with Trojans as the carrier. Detection based on pattern matching is the method most widely used in current security detection systems. It depends mainly on signatures extracted by manual analysis, cannot predict unknown malicious code, and is powerless against obfuscated or mutated malicious code. Data mining and machine learning are active research areas in computing, and combining the two to detect web Trojans is a likely direction for future research. Starting from these problems, and building on an in-depth analysis of the principles of data mining and machine learning, this thesis designs and implements a web Trojan detection system that targets malicious JavaScript scripts. The main work of the thesis is as follows:
1. The main principles and theory of data mining and machine learning are introduced. The mainstream web Trojan detection algorithms published in China and abroad are then surveyed, and the strengths and weaknesses of each are analyzed.
2. Following software engineering principles, the main functional requirements, overall architecture and workflow of the Trojan detection system are analyzed. A prototype web Trojan detection system is then designed and implemented using VC++ 6.0 MFC, MySQL and other tools. The system consists mainly of the following functional submodules: a URL blacklist, a web crawler, feature extraction, and a BP neural network ensemble classifier.
3. Most web Trojans embed malicious JavaScript code in the page, so the thesis concentrates on detecting web Trojans that rely on malicious JavaScript. To evade antivirus software, malicious JS code is usually obfuscated or mutated, which makes conventional signature matching largely ineffective against obfuscated web Trojans. The thesis uses the Google V8 JavaScript engine to compile malicious JS scripts into machine code, extracts the opcodes from the machine instructions, and computes word-level N-gram occurrence frequencies over them. The 200 most frequent grams are used as the features that distinguish normal scripts from malicious ones.
4. Using a web crawler and other tools, 100 normal JS scripts and 100 malicious JS scripts are collected from the Internet as the sample set. These 200 samples are used to train the BP neural network ensemble classifier, and 4-fold cross-validation is used to analyze the accuracy and correct-classification rate of the detection method. Once the classifier reaches a sufficient accuracy, the trained model is deployed in the web Trojan detection system. Finally, the functionality and robustness of the system are tested.
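The opcode N-gram feature step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation (which was written with VC++ 6.0 MFC): the function names, the opcode mnemonics and the choice n=2 are assumptions made for illustration, and the abstract does not state the value of N or whether features are raw counts or relative frequencies. In the thesis setting the vocabulary size would be 200.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Gram = Tuple[str, ...]


def ngrams(opcodes: Sequence[str], n: int) -> List[Gram]:
    """Return every contiguous run of n opcodes as a tuple."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def top_grams(corpus: Iterable[Sequence[str]], n: int, k: int) -> List[Gram]:
    """Count n-grams over all samples and keep the k most frequent."""
    counts: Counter = Counter()
    for opcodes in corpus:
        counts.update(ngrams(opcodes, n))
    return [gram for gram, _ in counts.most_common(k)]


def feature_vector(opcodes: Sequence[str],
                   vocabulary: Sequence[Gram],
                   n: int) -> List[float]:
    """Relative frequency of each vocabulary gram within one sample."""
    grams = ngrams(opcodes, n)
    counts = Counter(grams)
    total = len(grams) or 1  # avoid division by zero for very short samples
    return [counts[g] / total for g in vocabulary]


# Hypothetical opcode sequences; real ones would come from disassembling
# the machine code that the JavaScript engine emits for each script.
samples = [
    ["mov", "push", "call", "mov", "push", "call", "ret"],
    ["mov", "push", "call", "add", "ret"],
]
vocab = top_grams(samples, n=2, k=3)      # the thesis keeps k=200
vector = feature_vector(samples[0], vocab, n=2)
```

Each script is thereby reduced to a fixed-length numeric vector, which is the form of input a BP neural network classifier needs.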
[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.08
Article ID: 1619826
Article link: http://sikaile.net/guanlilunwen/ydhl/1619826.html