Research and Implementation of Heuristic-Based Phishing Website Detection Technology

Published: 2019-01-13 20:22

[Abstract]: A phishing website is a form of network attack in which a web page carries deceptive content that lures Internet users into submitting personal information, allowing the attacker to steal private data and even personal property. To improve the accuracy of phishing website detection and reduce dependence on third-party tools and resources, this thesis studies heuristic phishing website detection and phishing page topic recognition. First, it examines the key techniques of web content preprocessing. For web data collection and storage, it proposes an update-based storage strategy that periodically collects information on phishing websites published by third-party platforms. For text feature acquisition, it uses m-TextRank, a keyword extraction algorithm tailored to web page text, to extract and store textual features. Second, to improve the precision and stability of phishing detection, the thesis optimizes the detection scheme by identifying new features promptly and selecting the best feature subset accurately, and proposes a multi-layer heuristic phishing website detection model consisting of a feature extraction layer, a feature selection layer, and a heuristic classification layer. The model preprocesses the feature set with five feature selection algorithms and compares the performance of three decision-tree-based classification algorithms. Experimental results show that feature selection by information gain combined with the random tree classifier achieves 96% accuracy and 95% recall at low time cost. Third, to study the correlation between a page's topic and its legitimacy, and the topic distribution of phishing websites, the thesis proposes an LDA-SVM-based phishing page topic recognition algorithm. The algorithm builds an LDA-SVM topic classification model through text preprocessing, Gibbs sampling, LDA modeling, SVM classification, and evaluation, and uses it to recognize page topics. Experiments show that topic recognition accuracy for phishing websites reaches 93%. The topic classification model is then applied to websites that have passed through heuristic detection, providing corroborating evidence for the heuristic detection results. Finally, building on this work, the thesis designs and implements a heuristic phishing website detection system, which provides web page information collection, legitimacy detection, and page topic recognition. System tests show that it meets the requirements for checking the legitimacy of unknown websites and satisfies the expected goals overall.
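The feature selection layer mentioned in the abstract ranks candidate features by information gain. The following is a minimal sketch of that standard measure, not the thesis's implementation: the feature names (`has_ip_in_url`, `has_https`, `has_login_form`), the toy rows, and the helper functions are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting `labels` on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the label subsets, one per feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = [(name, information_gain([r[i] for r in rows], labels))
             for i, name in enumerate(names)]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Made-up data (an assumption for illustration): each row is
# (has_ip_in_url, has_https, has_login_form) for one page.
FEATURES = ["has_ip_in_url", "has_https", "has_login_form"]
ROWS = [
    (1, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1),
    (0, 1, 0), (0, 0, 1), (0, 1, 1), (1, 0, 0),
]
LABELS = ["phish", "phish", "phish", "legit",
          "legit", "legit", "legit", "phish"]

if __name__ == "__main__":
    for name, gain in rank_features(ROWS, LABELS, FEATURES):
        print(f"{name}: {gain:.3f}")
```

In a pipeline like the one the abstract describes, the top-ranked features would then be passed on to a decision-tree-based classifier. The cutoff for how many features to keep is a tuning choice that this sketch does not address.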
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP393.08


Article ID: 2408362


Article link: http://sikaile.net/guanlilunwen/ydhl/2408362.html
