Research on Machine Learning-Based Detection Methods for Malicious Android Applications
Published: 2018-10-18 18:42
[Abstract]: With the emergence of smartphones and the rapid development of the mobile Internet, the way users connect to the network is gradually shifting from the PC to mobile devices. Compared with the traditional PC, today's smartphone is no longer just a communication tool; many PC functions are now available on mobile. Android currently has the most users of any mobile operating system, so large numbers of users and developers pay attention to the Android app market. Malicious code developers have turned to this market as well, which poses a serious threat to the security of users' phones. Given the large number of malicious applications in the Android app market, detecting them efficiently is an urgent problem. To address it, this thesis studies machine learning-based detection methods for malicious Android applications. The main work is as follows: (1) The state of research on Android malicious application detection and the Android system architecture are studied in depth, and the security mechanisms Android inherits from the Linux kernel are analyzed together with Android-specific mechanisms such as the sandbox mechanism and the permission mechanism. (2) The attack methods of malicious applications and the ways malicious code is implanted are analyzed. On this basis, the decompiled files of Android applications are examined in detail, and the principles of the machine learning classification algorithms used in the thesis are analyzed. (3) A machine learning-based scheme for detecting malicious Android applications is designed. Based on the characteristics of malicious applications, the scheme uses N-gram Opcode features. Experimental results show that the 3-gram Opcode features obtained by grouping Dalvik instructions into 24 categories and generating 3-grams perform best.
The 3-gram Opcode features were then combined with API features and Permission features, and repeated experiments examined how the feature set and the classification algorithm affect classifier performance. These experiments show that a classifier trained with the random forest algorithm on the combined API, Permission, and 3-gram Opcode features performs well, reaching a detection accuracy of 94% at a false positive rate of 5.3%, with an average prediction time of 10.06 s. A classifier trained with the random forest algorithm on the combined API and Permission features alone reaches a detection accuracy of 94.1% at a false positive rate of 6.5%, with an average prediction time of 7.5 s.
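The N-gram Opcode feature described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the `OPCODE_CATEGORY` table is a hypothetical, partial mapping with a handful of categories (the thesis's actual 24-category rule is not given on this page), and the function names `categorize`, `ngram_counts`, and `opcode_ngram_features` are made up for this example. The idea shown is only the general one: map each Dalvik opcode to a category symbol, then count sliding windows of three consecutive symbols.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical, partial opcode-to-category table (an assumption for this sketch).
# The thesis groups Dalvik instructions into 24 categories; that rule is not reproduced here.
OPCODE_CATEGORY: Dict[str, str] = {
    "move": "M",
    "move-result": "M",
    "const": "C",
    "const-string": "C",
    "invoke-virtual": "I",
    "invoke-static": "I",
    "invoke-direct": "I",
    "return": "R",
    "return-void": "R",
    "if-eq": "B",
    "if-nez": "B",
    "goto": "G",
}


def categorize(opcodes: Iterable[str], unknown: str = "X") -> List[str]:
    """Map each opcode to its category symbol; unmapped opcodes become `unknown`."""
    return [OPCODE_CATEGORY.get(op, unknown) for op in opcodes]


def ngram_counts(symbols: List[str], n: int = 3) -> Counter:
    """Count every window of n consecutive symbols (empty if the sequence is shorter than n)."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))


def opcode_ngram_features(opcodes: Iterable[str], n: int = 3) -> Counter:
    """Category-level n-gram counts for one opcode sequence (for example, one method body)."""
    return ngram_counts(categorize(opcodes), n)


if __name__ == "__main__":
    # Illustrative opcode sequence, not taken from a real application.
    method = [
        "const-string", "invoke-virtual", "move-result",
        "if-nez", "const", "invoke-static", "return-void",
    ]
    for gram, count in sorted(opcode_ngram_features(method).items()):
        print("".join(gram), count)
```

Counts like these, collected over all methods of an application, could form one part of a feature vector alongside API and Permission indicators before training a classifier such as a random forest. How the thesis actually builds and combines its vectors is not stated on this page.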
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP181; TP309
Article ID: 2280035
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2280035.html