Research on P2P Traffic Identification Technology Based on Unsupervised Learning
Published: 2019-06-06 06:25
[Abstract]: As P2P network technology has developed, P2P applications have become increasingly widespread, and identifying P2P traffic has been a long-standing goal of P2P researchers. The growing number of applications also makes P2P traffic increasingly difficult to identify. This paper begins with an introduction to P2P technology, analyzes several typical P2P traffic identification techniques, and, based on their strengths and weaknesses, proposes an improved algorithm: a clustering algorithm based on unsupervised learning. First, the statistical characteristics of P2P traffic are analyzed at the packet level and the flow level, and five feature attributes suited to the experiments are selected: the mean variance of packet size within a P2P flow, the duration of the flow, the rate of change of packet size within the flow, the average number of bytes per packet in the flow, and the ratio of download speed to upload speed. These features are used for the experimental validation of the DBK algorithm later in the paper. Second, the paper briefly reviews the strengths and weaknesses of the K-means and DBSCAN algorithms and improves on them to obtain a DBSCAN-improved K-means algorithm (the DBK algorithm). The Bayesian information criterion (BIC) is incorporated into the search for the algorithm's initial points, yielding BIC core points that serve as the initial nodes, after which clustering is performed with K-means. Finally, the DBK algorithm is evaluated experimentally and compared with K-means and DBSCAN in terms of accuracy and misclassification rate. The results show that DBK has a relatively long running time, but that it compares favorably with the other two algorithms in the number of external-memory accesses and in average accuracy, and that its average misclassification rate is relatively low. This indicates that the proposed algorithm achieves good accuracy with a low misclassification rate, and that the improved algorithm is effective and feasible.
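The abstract does not spell out how DBK works internally, so the following is only a minimal Python sketch of the general idea it describes: a DBSCAN-style density pass finds core points, the centroids of the density-connected core groups become the initial centers, and K-means then refines the clustering. The BIC-based selection of core points is omitted because the abstract gives no details of it. All function names and the `eps` and `min_pts` parameters are hypothetical choices for illustration, not the author's implementation.

```python
import math

def mean(pts):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(pts[0])
    return tuple(sum(p[d] for p in pts) / len(pts) for d in range(dim))

def core_point_seeds(points, eps, min_pts):
    """DBSCAN-style pass: return one seed centroid per group of
    density-connected core points (assumed seeding rule, no BIC step)."""
    n = len(points)
    # Neighborhoods include the point itself, as in standard DBSCAN.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    group = [None] * n
    seeds = []
    for i in range(n):
        if not is_core[i] or group[i] is not None:
            continue
        gid = len(seeds)
        group[i] = gid
        stack, members = [i], []
        # Flood-fill across core points reachable through eps-neighborhoods.
        while stack:
            p = stack.pop()
            members.append(p)
            for q in neighbors[p]:
                if is_core[q] and group[q] is None:
                    group[q] = gid
                    stack.append(q)
        seeds.append(mean([points[m] for m in members]))
    return seeds

def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iteration starting from the given centroids."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = mean(members)
    return labels, centroids

def dbk_sketch(points, eps, min_pts):
    """Seed K-means with density-derived centers instead of random ones."""
    seeds = core_point_seeds(points, eps, min_pts)
    if not seeds:
        raise ValueError("no core points found; adjust eps or min_pts")
    return kmeans(points, seeds)

# Two dense groups plus one sparse point that DBSCAN alone would call noise.
flows = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 10), (10, 11), (11, 10), (11, 11),
         (5.4, 5.4)]
labels, centers = dbk_sketch(flows, eps=1.5, min_pts=3)
```

In this sketch the number of clusters is not supplied by the caller; it falls out of the density pass, and the sparse point still receives a label from the K-means step. In practice each tuple would hold the five flow features listed in the abstract, scaled to comparable ranges before distances are computed.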
Article ID: 2494137
Article link: http://sikaile.net/guanlilunwen/ydhl/2494137.html