Research on P2P Traffic Identification Based on Transfer Learning
Published: 2018-07-31 16:31
[Abstract]: With the large-scale growth of P2P-based Internet applications and the sharp rise in their user numbers, the network resources consumed by P2P technology are putting increasing pressure on the construction and maintenance of data transmission networks. How to manage P2P applications so that they can develop healthily within existing network resources is a topic of active interest among researchers in China and abroad.

P2P traffic identification is the foundation for managing P2P applications and has been studied continuously. The main current approaches are port-based detection, content-based scanning, and identification based on traffic features. Each solves the P2P traffic identification problem to some extent, but each has its own shortcomings.

Machine learning is an active research direction in computer science. It covers algorithms that automatically derive regularities from data and use them to make predictions on unseen data. Many machine learning algorithms can already identify P2P traffic effectively, but they all depend on large numbers of manually labeled training samples, and those samples are hard to reuse once network conditions change rapidly.

This thesis works within transfer learning, a new machine learning framework, and combines it with traditional machine learning algorithms to propose new schemes for P2P traffic identification. The new algorithms achieve good identification accuracy with only a small number of manually labeled samples. The main contributions and innovations of the thesis are the following three points.

First, the thesis studies the adaptive-boosting-based transfer learning method from the text classification field, introduces it to P2P traffic identification, and proposes an improved algorithm that places more emphasis on real-time performance. The thesis combines the method with the characteristics of P2P traffic identification: by adjusting the weights of the auxiliary data, the auxiliary data is transferred to the source data in a more targeted way, and the two together form a combined training set used to train the classifier, yielding a reliable P2P identifier. On this basis, the thesis also applies dynamic pruning of the auxiliary data based on the per-iteration error rate, which removes auxiliary data that differs too much from the source data, speeds up the iterations, and reduces time consumption. Simulation experiments show that the improved algorithm is better suited to real-time use and more practical.

Second, the thesis combines the traditional K-nearest-neighbor method with the transfer learning framework, proposes a K-nearest-neighbor-based transfer learning method, applies it to P2P traffic identification, and improves the algorithm in terms of complexity. The algorithm uses K-nearest neighbors to screen the auxiliary data: auxiliary data that differs substantially from the source data is removed, and the auxiliary data most similar to the source data is merged with it into a combined training set for training a reliable P2P traffic classifier. On this basis, the thesis also uses singular value decomposition for pre-grouping, which reduces the computation in the K-nearest-neighbor step. Simulation experiments confirm the effectiveness of the algorithm and show that the improvement enhances the real-time performance of the whole algorithm.

Third, the thesis builds a simple Java- and Web-based P2P traffic identification system to make it easier to test and exchange algorithms and datasets. The system implements the two algorithms above with the Web as its interface and Java as its core language, and makes them publicly available. Users can upload their own datasets for identification or download datasets shared by others, which provides an effective platform for exchange on P2P traffic identification algorithms.
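The abstract does not give algorithmic details, so the following Python sketch is only an illustration of the two transfer learning ideas described above, not the thesis's implementation. It assumes that the adaptive-boosting-based method follows the widely used TrAdaBoost weight update, and that the K-nearest-neighbor screening keeps an auxiliary sample only when its label agrees with the majority label of its k nearest source samples. The function names, the error clamp, and the choice of Euclidean distance are all hypothetical.

```python
import math
from collections import Counter


def tradaboost_reweight(aux_w, src_w, aux_wrong, src_wrong, n_rounds):
    """One TrAdaBoost-style weight update (assumed variant, see lead-in).

    aux_w, src_w: current weights of the auxiliary and source samples.
    aux_wrong, src_wrong: booleans, True where the current weak
    classifier misclassified the corresponding sample.
    n_rounds: total number of boosting rounds planned.
    """
    # The weighted error is measured on the source data only.
    eps = sum(w for w, bad in zip(src_w, src_wrong) if bad) / sum(src_w)
    # Assumption: clamp the error so beta_t stays finite and below 1.
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(aux_w)) / n_rounds))
    # Misclassified auxiliary samples lose weight (they look unlike the
    # source data); misclassified source samples gain weight.
    new_aux = [w * beta if bad else w for w, bad in zip(aux_w, aux_wrong)]
    new_src = [w / beta_t if bad else w for w, bad in zip(src_w, src_wrong)]
    return new_aux, new_src, eps


def knn_filter(aux, src, k=3):
    """Keep auxiliary samples whose label matches the majority label of
    their k nearest source samples (Euclidean distance).

    aux, src: lists of (feature_tuple, label) pairs.
    """
    kept = []
    for x, y in aux:
        nearest = sorted(src, key=lambda s: math.dist(x, s[0]))[:k]
        majority = Counter(label for _, label in nearest).most_common(1)[0][0]
        if majority == y:
            kept.append((x, y))
    return kept
```

In a full pipeline under these assumptions, `knn_filter` would run once before training to discard dissimilar auxiliary flows, while `tradaboost_reweight` would run after each boosting round. Auxiliary samples whose weight keeps shrinking would be natural candidates for the dynamic pruning the abstract mentions, although the thesis's actual pruning rule is not stated here.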
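[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP18; TP393.02
Article ID: 2156157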
Article link: http://sikaile.net/guanlilunwen/ydhl/2156157.html