Research on a Feature-Extraction-Based Method for Constructing Network Measurement Datasets
Source: Xinjiang University, master's thesis, 2017. Document type: degree thesis.
Keywords: rogue certificate; dataset construction; feature extraction; Isomap
[Abstract]: The emergence and spread of the Internet have brought people great convenience, but they also expose us to online threats and the risk of fraud. In recent years, incidents in which rogue certificates are maliciously issued have occurred frequently. If a rogue certificate is obtained by criminals and deployed on a phishing or fraud website, the risk that users' personal information is stolen rises sharply, leading to personal financial loss and damage to the reputation of the enterprises involved. At present, rogue certificates are identified mainly by hand, so automatic identification is badly needed. Given that rogue certificates are hard to identify and that no effective rogue certificate dataset is currently available, this thesis takes rogue certificates as its research object and completes the following three pieces of work:
(1) Collaborative study and construction of a raw rogue certificate dataset. With the goal of building a rogue certificate dataset, the work combines real digital certificate data obtained by network measurement with simulated rogue certificate data generated by the Frankencert tool. Through group discussion, the fields of digital certificates and the characteristics of rogue certificates were surveyed and analyzed, and the feature fields of rogue certificates were determined on that basis. After preprocessing such as removing abnormal certificates, and in combination with basic indicators, a 37-dimensional raw rogue certificate dataset (730,000 samples) was constructed.
(2) Improvement of the feature extraction algorithm and construction of a new indicator model. To address shortcomings of the traditional Isomap algorithm, an improved algorithm, MM-Isomap, is proposed. The algorithm emphasizes taking the class of each sample point into account: it improves classification by shrinking intra-class distances and enlarging inter-class distances. Using accuracy together with the precision, recall, and F-measure of rogue certificate identification as evaluation metrics, the thesis selects the algorithm's optimal parameters and evaluates its effectiveness. Applying the algorithm to the raw rogue certificate dataset yields an 18-dimensional rogue certificate indicator attribute model after feature extraction.
(3) Validation of the indicator attribute model and collaborative construction of an open dataset. Validation consisted of two experiments. The first evaluated the validity of the raw rogue certificate dataset using three classification algorithms: support vector machine (SVM), J4.8 decision tree, and BP neural network. The second evaluated the validity of the new indicator model obtained after feature extraction. In addition, combined with another group member's work on feature selection, an open rogue certificate dataset of "feature selection (22 dimensions) + feature extraction (18 dimensions)" was built jointly, providing basic dataset support for further research on rogue certificates.
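The abstract says only that the improved algorithm shrinks intra-class distances and enlarges inter-class distances; it does not give the formula. The sketch below illustrates that general idea with a generic class-aware rescaling followed by the k-nearest-neighbour geodesic step of standard Isomap. The function names and the `shrink` and `stretch` factors are hypothetical choices made for illustration, not the thesis's MM-Isomap definition, and the final MDS embedding step is omitted.

```python
import math


def class_aware_distances(points, labels, shrink=0.5, stretch=1.5):
    """Pairwise Euclidean distances, rescaled by class membership.

    shrink and stretch are hypothetical factors (assumptions, not taken
    from the thesis): same-class pairs are pulled together, and
    different-class pairs are pushed apart.
    """
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            factor = shrink if labels[i] == labels[j] else stretch
            dist[i][j] = dist[j][i] = d * factor
    return dist


def geodesic_distances(dist, k):
    """Shortest-path distances over a symmetric k-nearest-neighbour graph."""
    n = len(dist)
    geo = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        geo[i][i] = 0.0
        # Connect each point to its k nearest neighbours under `dist`.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for j in order[:k]:
            geo[i][j] = geo[j][i] = dist[i][j]
    # Floyd-Warshall: relax every pair through every intermediate point.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = geo[i][m] + geo[m][j]
                if via < geo[i][j]:
                    geo[i][j] = via
    return geo


# Toy data: two well-separated classes in the plane.
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = [0, 0, 0, 1, 1, 1]
dist = class_aware_distances(points, labels)
geo = geodesic_distances(dist, k=3)
```

In a full pipeline, the `geo` matrix would then be passed to classical multidimensional scaling to obtain the low-dimensional coordinates; how MM-Isomap chooses its rescaling and its parameters is described in the thesis itself, not here.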
[Degree-granting institution]: Xinjiang University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP309