Machine-Learning-Based Network Traffic Identification: Methods and Implementation

Published: 2018-04-21 22:32

Topic of this article: network traffic identification + machine learning; source: Shandong University, master's thesis, 2014

[Abstract]: With the rapid development of computer network technology and the arrival of the information age, growing network use has caused explosive growth in Internet data traffic; the steady appearance of new network applications has made the use of network communication protocols more flexible and more mixed; and the rise in network viruses, eavesdropping, and malicious attacks has made network security a focus of public and government concern. Network traffic identification can address these problems well, and it is therefore receiving more and more attention.
Many traffic identification methods already exist, but from both the research and the application perspective, attention is increasingly on the feasibility and effectiveness of traffic identification: how to process massive amounts of data quickly and how to correctly identify the various applications on the network. Facing a constantly changing network environment, this thesis studies network traffic identification methods based on machine learning (ML), focusing on two supervised learning algorithms: the back-propagation (BP) neural network and the support vector machine (SVM).
The BP neural network trains on a distributed, parallel network structure, which gives it higher fault tolerance and faster processing. It has good nonlinear mapping ability and can model the nonlinear relationship between input and output. At the same time, the BP neural network is trained through global optimization, so it also has strong generalization ability. SVM is a machine learning method designed for small samples; it maps the low-dimensional sample space nonlinearly into a high-dimensional space through an inner-product kernel function, and it rests on a relatively complete theoretical foundation. Using the "transductive inference" method, SVM can readily handle nonlinear multi-class problems. The optimal separating hyperplane of an SVM is determined only by the finite set of support vectors on the boundary, which makes the method simple, effective, and robust. Both algorithms can adapt to the large data volumes and diversity of the network environment, and both can identify the application type of network traffic quickly and effectively.
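For readers unfamiliar with the kernel formulation mentioned above, the standard textbook form is sketched here. The symbols ($\phi$, $\alpha_i$, $y_i$, $b$, $SV$) are the usual SVM notation and are not taken from the thesis itself. The kernel replaces an explicit inner product in the high-dimensional space, and the decision function sums only over the support vectors:

```latex
K(x, z) = \langle \phi(x), \phi(z) \rangle,
\qquad
f(x) = \operatorname{sign}\Bigl( \sum_{i \in SV} \alpha_i \, y_i \, K(x_i, x) + b \Bigr)
```

Because every training sample outside $SV$ has $\alpha_i = 0$, removing such a sample leaves $f$ unchanged, which is the sense in which the hyperplane is determined only by the support vectors.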
The traffic identification system in this thesis targets network flows in a home network and is functionally divided into two parts: a home gateway and a back-end server. The home gateway captures packets in real time, extracts features, identifies the traffic with machine learning, and sends the identification results to the back-end server. The back-end server stores the results in a database and displays the application types of the traffic currently on the network, which makes supervision easier for the administrator. The main contributions of the thesis are as follows:
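The gateway-to-server hand-off described above might look roughly like the following sketch. Everything concrete in it is an assumption made for illustration: the JSON message layout, the function names, the table schema, and the use of an in-memory SQLite database as a stand-in for the back-end database. The abstract does not describe the actual message format or schema.

```python
import json
import sqlite3

def encode_result(gateway_id, flow_key, app_type, timestamp):
    """Gateway side: serialize one identification result (hypothetical format)."""
    return json.dumps({"gateway": gateway_id, "flow": list(flow_key),
                       "app": app_type, "ts": timestamp})

def store_result(conn, message):
    """Server side: parse a message and insert it into the results table."""
    rec = json.loads(message)
    conn.execute(
        "INSERT INTO results (gateway, flow, app, ts) VALUES (?, ?, ?, ?)",
        (rec["gateway"], json.dumps(rec["flow"]), rec["app"], rec["ts"]),
    )
    conn.commit()

def current_app_types(conn):
    """Server side: count stored flows per application type for display."""
    rows = conn.execute(
        "SELECT app, COUNT(*) FROM results GROUP BY app ORDER BY app"
    ).fetchall()
    return dict(rows)

# SQLite in memory stands in for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (gateway TEXT, flow TEXT, app TEXT, ts REAL)")

flow_a = ("192.168.1.10", 50000, "203.0.113.5", 443, "TCP")
flow_b = ("192.168.1.11", 50001, "203.0.113.9", 8080, "TCP")
store_result(conn, encode_result("gw-1", flow_a, "web", 1.0))
store_result(conn, encode_result("gw-1", flow_b, "video", 2.0))
print(current_app_types(conn))
```

Keeping the gateway side to a single serialization call reflects the division of labor in the text: the gateway classifies, and the server stores and presents.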
1. Based on a study and analysis of network traffic identification and machine learning, the BP neural network is found able to adapt to the large data volumes and diversity of the Internet, and a traffic identification method based on the BP neural network is chosen on that basis. Specifically, a three-layer BP neural network is selected as the implementation scheme: its classification ability meets the requirements of traffic identification, and its structure is simple and easy to implement. A sigmoid (S-shaped) function is chosen as the transfer function of the hidden layer to realize a nonlinear mapping of input information such as network flow features. Although a BP neural network easily falls into a local minimum of the error surface, particle swarm optimization (PSO) is used to search for initial weights with globally optimal characteristics, ensuring that training can reach the global minimum of the error surface. Experimental results show that the PSO-optimized BP neural network quickly finds the global minimum of the error surface and accurately identifies the network application type of the traffic.
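The combination described in this contribution can be sketched in a few dozen lines. This is an illustrative sketch, not the thesis's implementation. The network size, the PSO coefficients (0.7, 1.5, 1.5), the learning rate, the single output unit, and the toy data are all assumptions made here. The code runs a basic global-best PSO over the flattened weight vector of a three-layer network with a sigmoid hidden layer, then refines the best particle with ordinary back-propagation.

```python
import math
import random

def sigmoid(z):
    """S-shaped transfer function; z is clamped to avoid overflow in exp."""
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))

def forward(w, x, n_in, n_hid):
    """Input layer -> sigmoid hidden layer -> one sigmoid output unit."""
    k = (n_in + 1) * n_hid                      # count of hidden-layer weights
    h = []
    for j in range(n_hid):
        row = w[j * (n_in + 1):(j + 1) * (n_in + 1)]   # last entry is the bias
        h.append(sigmoid(row[-1] + sum(wi * xi for wi, xi in zip(row, x))))
    y = sigmoid(w[k + n_hid] + sum(w[k + j] * h[j] for j in range(n_hid)))
    return h, y

def mse(w, data, n_in, n_hid):
    return sum((forward(w, x, n_in, n_hid)[1] - t) ** 2 for x, t in data) / len(data)

def pso_init(data, n_in, n_hid, particles=20, iters=50, seed=0):
    """Global-best PSO over the weight vector; returns best weights and loss history."""
    rng = random.Random(seed)
    dim = (n_in + 1) * n_hid + n_hid + 1
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_loss = [mse(p, data, n_in, n_hid) for p in pos]
    g = min(range(particles), key=lambda i: pbest_loss[i])
    gbest, gbest_loss = pbest[g][:], pbest_loss[g]
    history = [gbest_loss]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            loss = mse(pos[i], data, n_in, n_hid)
            if loss < pbest_loss[i]:
                pbest[i], pbest_loss[i] = pos[i][:], loss
                if loss < gbest_loss:
                    gbest, gbest_loss = pos[i][:], loss
        history.append(gbest_loss)
    return gbest, history

def train_bp(w, data, n_in, n_hid, lr=0.5, epochs=200):
    """Per-sample back-propagation on squared error, starting from weights w."""
    w = w[:]
    k = (n_in + 1) * n_hid
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(w, x, n_in, n_hid)
            delta_out = (y - t) * y * (1 - y)
            out_w = w[k:k + n_hid]              # output weights before this update
            for j in range(n_hid):
                delta_h = delta_out * out_w[j] * h[j] * (1 - h[j])
                for i in range(n_in):
                    w[j * (n_in + 1) + i] -= lr * delta_h * x[i]
                w[j * (n_in + 1) + n_in] -= lr * delta_h
                w[k + j] -= lr * delta_out * h[j]
            w[k + n_hid] -= lr * delta_out
    return w

# Made-up toy data: two normalized flow features and a 0/1 class label.
DATA = [((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.15, 0.3), 0),
        ((0.8, 0.9), 1), ((0.9, 0.7), 1), ((0.7, 0.8), 1)]
N_IN, N_HID = 2, 3
w0, history = pso_init(DATA, N_IN, N_HID)
w1 = train_bp(w0, DATA, N_IN, N_HID)
print("loss after PSO:", history[-1], "loss after BP:", mse(w1, DATA, N_IN, N_HID))
```

In this sketch the swarm's best loss can never increase from one iteration to the next, which is what makes PSO usable as an initializer. Whether the subsequent gradient steps reach a global minimum depends on the data and is not guaranteed by the code.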
2. The principles by which SVM solves linear and nonlinear classification problems are studied in detail, and on that basis a traffic identification method based on SVM is proposed, applying SVM to network traffic identification. The radial basis function is chosen as the SVM kernel to realize a nonlinear mapping from the low-dimensional space of network flow features to a higher-dimensional space. A multi-class SVM classifier is built with the one-against-one method so that SVM can identify multiple network application types. SVM generates the optimal hyperplane in the high-dimensional space to partition that space and classify the various network applications; because this is a global optimization, the SVM identification method generalizes well. Experimental results show that SVM is very well suited to the nonlinear multi-class problem of network traffic identification, needs few training samples, has low computational complexity, and can perform identification in real time.
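The one-against-one construction with a radial basis function kernel can be illustrated as follows. To stay short and dependency-free, the binary learner here is a kernel perceptron, which is a stand-in and not an SVM solver: it finds some separating function in the kernel space, not the maximum-margin one. The `gamma` value, the class names, and the toy feature vectors are likewise assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def rbf(x, z, gamma=0.5):
    """Radial basis function kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def score(alpha, points, signs, x):
    """Kernel expansion sum_i alpha_i * y_i * K(x_i, x)."""
    return sum(a * s * rbf(p, x) for a, s, p in zip(alpha, signs, points))

def train_binary(points, signs, epochs=20):
    """Kernel perceptron (stand-in for a real SVM solver, NOT an SVM)."""
    alpha = [0] * len(points)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(points, signs)):
            if y * score(alpha, points, signs, x) <= 0:
                alpha[i] += 1          # misclassified: give this sample more weight
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def train_one_against_one(samples):
    """Train one binary model per unordered pair of class labels."""
    labels = sorted({label for _, label in samples})
    models = {}
    for pos, neg in combinations(labels, 2):
        pair = [(x, label) for x, label in samples if label in (pos, neg)]
        points = [x for x, _ in pair]
        signs = [1 if label == pos else -1 for _, label in pair]
        models[(pos, neg)] = (train_binary(points, signs), points, signs)
    return models

def predict(models, x):
    """Each pairwise model casts one vote; the label with the most votes wins."""
    votes = Counter()
    for (pos, neg), (alpha, points, signs) in models.items():
        votes[pos if score(alpha, points, signs, x) >= 0 else neg] += 1
    return votes.most_common(1)[0][0]

# Made-up two-feature flows for three hypothetical application types.
SAMPLES = [((0.0, 0.0), "web"), ((0.0, 1.0), "web"),
           ((5.0, 5.0), "video"), ((5.0, 6.0), "video"),
           ((10.0, 0.0), "p2p"), ((10.0, 1.0), "p2p")]
models = train_one_against_one(SAMPLES)
print(predict(models, (0.2, 0.5)), predict(models, (5.1, 5.4)), predict(models, (9.8, 0.3)))
```

With k classes this scheme trains k(k-1)/2 binary models, each on only the samples of its two classes, so the individual training problems stay small even as application types are added.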
3. A traffic identification system is designed and implemented in a home LAN. Following the system model of machine learning and the implementation approach of supervised learning, the overall architecture of network traffic identification is designed and divided into two parts: real-time online traffic identification and offline training. The process consists of capturing the packets of network flows, generating flow features, selecting the training and test sets, training the machine learning algorithms, and testing the classification performance of the two traffic identification algorithms. For the implementation, the BP neural network and SVM traffic identification algorithms are written as programs and ported to the home gateway (which is built on a router). On the back-end server, a Web server and a MySQL database are set up on Linux to handle communication between the home gateway and the back-end server as well as information processing and storage. The administrator can log in to the back-end server through a Web browser to view the current traffic identification results for the home network.
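The feature-generation and data-selection steps of the pipeline above can be sketched as below. The abstract does not list which flow features the thesis uses, so the ones computed here (packet count, byte count, mean packet size, duration, mean inter-arrival gap) are common choices picked for illustration. The packet tuple layout, the unidirectional 5-tuple flow key, and the 75/25 split are also assumptions.

```python
import random
from collections import defaultdict

def flow_features(packets):
    """Group packets by 5-tuple and compute simple per-flow statistics.

    Each packet is (timestamp, src_ip, src_port, dst_ip, dst_port, proto, length).
    """
    flows = defaultdict(list)
    for ts, src, sport, dst, dport, proto, length in packets:
        flows[(src, sport, dst, dport, proto)].append((ts, length))
    features = {}
    for key, pkts in flows.items():
        pkts.sort()                                   # order by timestamp
        times = [t for t, _ in pkts]
        sizes = [n for _, n in pkts]
        gaps = [b - a for a, b in zip(times, times[1:])]
        features[key] = {
            "packets": len(pkts),
            "bytes": sum(sizes),
            "mean_size": sum(sizes) / len(sizes),
            "duration": times[-1] - times[0],
            "mean_gap": sum(gaps) / len(gaps) if gaps else 0.0,
        }
    return features

def split(rows, test_ratio=0.25, seed=0):
    """Shuffle deterministically, then cut into training and test sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]

# Made-up packets: one three-packet TCP flow and one single-packet UDP flow.
PACKETS = [
    (0.0, "192.168.1.10", 50000, "203.0.113.5", 443, "TCP", 100),
    (0.2, "192.168.1.11", 50001, "203.0.113.9", 53, "UDP", 60),
    (0.5, "192.168.1.10", 50000, "203.0.113.5", 443, "TCP", 200),
    (1.0, "192.168.1.10", 50000, "203.0.113.5", 443, "TCP", 300),
]
FEATS = flow_features(PACKETS)
for key, feat in FEATS.items():
    print(key, feat)
```

The feature dictionaries produced this way would be turned into numeric vectors and labeled before being passed to the offline training step; the online path would compute the same features on live traffic.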

[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification numbers]: TP393.08; TP181
