
Active Intelligent Detection of Fake Web Pages Based on a Web Crawler

Published: 2018-02-10 12:50

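  Keywords: fake web page detection; active detection; web page feature extraction; deep learning algorithms; machine learning algorithms. Source: North China Electric Power University, master's thesis, 2015. Document type: degree thesis.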


[Abstract]: Phishing is an attack that lures users into disclosing private personal information by sending them deceptive spam that purports to come from businesses or financial institutions. The most common approach is to lure users to a fake web page that closely resembles the legitimate target page and then steal the private information that victims enter there. As the harm caused by fake web pages has grown in recent years, fake web page detection has drawn wide attention as an anti-phishing technique. This thesis proposes an active intelligent detection system for fake web pages based on a web crawler: after obtaining pages similar to the target website, it extracts their features, reduces the dimensionality of the feature vectors with an Autoencoder as a preprocessing step, and finally uses a BVM classifier to identify fake pages. First, because of the lag inherent in passive detection, the thesis adopts an active detection mode, that is, it uses edit distance to find, starting from seed sites, web pages whose URLs are similar to that of the target site. Second, features are extracted from each of these similar pages. Because detection results depend heavily on which site features are extracted, the thesis extracts the document features and topological features of web pages fairly comprehensively and broadens the kinds of feature elements. Building on an analysis of page text features and source code, it proposes a more accurate and comprehensive feature vector for fake web pages, and then applies an Autoencoder to reduce its dimensionality, so that the processed vector better suits the classifier and detection accuracy improves.
Third, the thesis builds an active intelligent detection classifier for fake web pages with the machine learning algorithm BVM, presents the steps and experimental results of BVM-based detection, and analyzes the strengths and weaknesses of the algorithm. Extensive experiments show that the proposed BVM-based active detection method achieves relatively high accuracy with a relatively short running time. Finally, the thesis implements the web-crawler-based active intelligent detection system using Java Web technology. The system uses a B/S (browser/server) architecture, and the thesis presents its architecture design and the interface of each function.
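The URL-similarity step of the active detection mode can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract only states that edit distance is used to find pages whose URLs resemble the target site's. Comparing hostnames rather than full URLs, the threshold `max_distance=2`, the function names, and the `.example` hostnames are all assumptions made here for illustration.

```python
from urllib.parse import urlparse


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # prev[j] is the distance between the already-processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def host_of(url: str) -> str:
    """Return the lowercase hostname of a URL (empty string if absent)."""
    return (urlparse(url).hostname or "").lower()


def similar_urls(target_url, candidate_urls, max_distance=2):
    """Return (url, distance) pairs for candidates whose hostname is close to,
    but not identical to, the target hostname, nearest first.

    max_distance is an illustrative threshold, not a value from the thesis.
    """
    target = host_of(target_url)
    hits = []
    for url in candidate_urls:
        d = edit_distance(host_of(url), target)
        if 0 < d <= max_distance:  # d == 0 is the target site itself
            hits.append((url, d))
    return sorted(hits, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical URLs, as a crawler might collect them from seed sites.
    crawled = [
        "http://www.mybamk.example/login",   # one character off the target
        "http://www.mybank.example/help",    # the target site itself
        "http://www.weather.example/",       # unrelated site
    ]
    print(similar_urls("http://www.mybank.example/", crawled))
```

Only the near-miss hostname is kept as a candidate for the later feature-extraction and classification stages. The target's own pages (distance 0) and clearly unrelated hosts are filtered out.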
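[Degree-granting institution]: North China Electric Power University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP393.092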



Article ID: 1500554


Article link: http://sikaile.net/guanlilunwen/ydhl/1500554.html

