Research and Implementation of Semi-Supervised Network Traffic Classification for Imbalanced Data
[Abstract]: Network traffic classification underpins network management and control, network security, network construction and upgrading, and operations management, so research on it has significant practical value. As network technology develops, the number of users and the scale of networks grow rapidly and new services keep emerging. The network environment is therefore becoming more complex, and accurate traffic classification is becoming more difficult. In particular, with the widespread use of dynamic port numbers and encrypted services, traditional identification methods based on port numbers and payload signature matching have become less effective and less reliable. Researchers have therefore turned to traffic classification based on machine learning. These methods classify traffic by the statistical features of flows and do not depend on port numbers or packet payloads, so they have broader prospects. This thesis studies two key problems in machine-learning-based traffic classification: the sample labeling bottleneck and class imbalance. The main work is as follows: 1. A semi-supervised traffic identification method based on K-means and k-nearest neighbor (KMkNN) is proposed to address the labeling bottleneck and class imbalance in traffic classification. Each flow is represented by a high-dimensional vector of flow statistical features, and a two-level classifier is built from the K-means and k-nearest neighbor algorithms. First, K-means clusters a data set containing a small number of labeled samples and a large number of unlabeled samples into several clusters.
Then a k-nearest neighbor classifier is trained within each cluster to classify the unknown samples in it, and the number of neighbors k is adjusted adaptively according to the distribution of the labeled samples. This overcomes a weakness of traditional semi-supervised traffic classification methods, whose recognition rate for small classes is low and which may fail to find small classes at all. Theoretical analysis and experimental results show that the proposed method maintains a high recognition rate for large-class flows, improves the recognition rate for small-class flows, and can discover new applications. 2. An ensemble classifier based on random subspace (RSEC) is proposed, motivated by the observation that flow statistical features are redundant and can be divided into several independent feature subsets. The algorithm first builds a feature set using wrapper-style forward feature selection, then generates feature subsets by hierarchical random selection, and trains a different base classifier on each feature subset. The final result is obtained by combining the outputs of the base classifiers using absolute-majority and relative-majority voting. Experimental results show that, compared with the single classifier KMkNN, the proposed algorithm further improves recognition accuracy and recall for both large-class and small-class traffic. 3. An offline traffic classification system based on machine learning is designed and implemented in C#. The system uses Wireshark to capture traffic online and save it locally for offline analysis. The flow feature set generation module reassembles flows according to their five-tuple information and derives flow features from statistics of packet header information.
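As an illustration of the flow feature generation step just described, the following Python sketch groups packets into flows by five-tuple and computes a few header-level statistics. It is not the thesis's C# implementation: the packet dictionary layout, the function names `flow_key` and `flow_features`, the choice to merge both directions of a connection into one flow, and the particular statistics are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def flow_key(pkt):
    """Five-tuple key; endpoints are ordered so both directions share one flow (assumption)."""
    a = (pkt["src_ip"], pkt["src_port"])
    b = (pkt["dst_ip"], pkt["dst_port"])
    return (min(a, b), max(a, b), pkt["proto"])


def flow_features(packets):
    """Group packets by five-tuple and compute simple per-flow statistics."""
    flows = defaultdict(list)
    for pkt in sorted(packets, key=lambda p: p["ts"]):
        flows[flow_key(pkt)].append(pkt)

    features = {}
    for key, pkts in flows.items():
        sizes = [p["size"] for p in pkts]
        times = [p["ts"] for p in pkts]
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        features[key] = {
            "packets": len(pkts),
            "bytes": sum(sizes),
            "mean_size": mean(sizes),
            "duration": times[-1] - times[0],
            "mean_gap": mean(gaps) if gaps else 0.0,  # mean inter-arrival time
        }
    return features


if __name__ == "__main__":
    # Hypothetical packets: only header fields, no payload.
    demo = [
        {"src_ip": "10.0.0.1", "src_port": 5000, "dst_ip": "10.0.0.2",
         "dst_port": 80, "proto": "TCP", "size": 100, "ts": 0.0},
        {"src_ip": "10.0.0.2", "src_port": 80, "dst_ip": "10.0.0.1",
         "dst_port": 5000, "proto": "TCP", "size": 300, "ts": 0.5},
    ]
    for key, feats in flow_features(demo).items():
        print(key, feats)
```

Each resulting feature dictionary can be flattened into the kind of flow statistical feature vector that the classifiers operate on; a real feature set would likely be much larger than the five values shown here.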
The sample labeling module labels training samples by port-number matching, payload signature matching, and manual labeling. The classification module offers five selectable classification algorithms: C4.5, NBK, semi-supervised K-means, and the KMkNN and RSEC algorithms proposed in this thesis. Finally, the system is validated on real traffic collected in the laboratory.
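The two-level KMkNN idea from the abstract (K-means clustering over labeled and unlabeled flows, followed by a per-cluster k-nearest neighbor vote with an adaptive k) can be sketched roughly as below. The abstract does not give the adaptation rule, so the rule used here (k is capped by the size of the smallest labeled class in the cluster) is an assumption, as are the function names `kmeans` and `kmknn`, the farthest-point initialization, and the "unknown" marker for clusters without any labeled sample.

```python
from collections import Counter
from math import dist


def kmeans(points, n_clusters, n_iter=20):
    """Plain K-means with deterministic farthest-point initialization (assumption)."""
    centroids = [points[0]]
    while len(centroids) < n_clusters:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    assign = []
    for _ in range(n_iter):
        assign = [
            min(range(n_clusters), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        for j in range(n_clusters):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign


def kmknn(points, labels, n_clusters, k_max=5):
    """labels[i] is a class name, or None for an unlabeled flow."""
    assign = kmeans(points, n_clusters)
    predicted = list(labels)
    for j in range(n_clusters):
        idx = [i for i, a in enumerate(assign) if a == j]
        labeled = [i for i in idx if labels[i] is not None]
        if not labeled:
            # No labeled sample in this cluster: flag it as a possible new application.
            for i in idx:
                predicted[i] = "unknown"
            continue
        # Assumed adaptive rule: k never exceeds the smallest labeled class
        # in the cluster, so a small class is not outvoted by a large one.
        class_sizes = Counter(labels[i] for i in labeled)
        k = max(1, min(k_max, min(class_sizes.values())))
        for i in idx:
            if labels[i] is None:
                nearest = sorted(labeled, key=lambda t: dist(points[i], points[t]))[:k]
                votes = Counter(labels[t] for t in nearest)
                predicted[i] = votes.most_common(1)[0][0]
    return predicted


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors; real flow features are high-dimensional.
    pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (1.0, 1.0),
           (1.1, 0.9), (10.0, 10.0), (10.2, 10.1), (20.0, 0.0), (20.1, 0.2)]
    lbl = ["web", "web", "web", None, "dns", None, "mail", None, None, None]
    print(kmknn(pts, lbl, n_clusters=3))
```

In the toy data, the single labeled "dns" flow shares a cluster with three labeled "web" flows. Capping k at the smallest class size is one way to keep such a small class from being outvoted, which is the effect the abstract attributes to adapting k; the actual rule in the thesis may differ.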
【Degree-granting institution】: PLA Information Engineering University
【Degree level】: Master's
【Year degree granted】: 2014
【Classification number】: TP393.06