Hadoop-Based Network Node Behavior Analysis
Published: 2018-06-21 20:42
Topic: Hadoop + big data; Reference: Beijing University of Posts and Telecommunications, 2015 master's thesis
[Abstract]: In recent years, with the rapid spread and adoption of Internet technology, the number of Internet users has continued to rise. In China, 93.1% of enterprises now use computers in their offices, and most have moved onto the information highway. As the Internet becomes more pervasive, networks grow sharply in scale and the number of network nodes increases with them. While the Internet promotes information exchange and brings many conveniences, it also introduces new problems: security vulnerabilities abound and networks face many kinds of attack. Studying and analyzing the behavior of network nodes is therefore of great significance. As the number of network users grows, network traffic increases dramatically, and the demands of storing and transmitting network data far exceed the processing capacity of traditional databases. Hadoop, an open-source Apache project, is a distributed software framework for processing massive data sets efficiently, and it readily supports distributed storage and computation of big data.
This thesis first introduces the background and significance of network node behavior analysis, then describes Hadoop and the network behavior monitoring and analysis system in detail. Next, based on the communication and traffic characteristics of network sessions, it proposes a new way of reassembling network sessions, the composite session, which captures the session and packet characteristics of a network session in greater detail. Composite sessions are collected and preprocessed to provide the data for the experiments and analysis. Using composite sessions as experimental data, the thesis analyzes the traffic and the number of visiting users of network nodes and reveals how both are distributed.
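The per-node traffic and visitor statistics described above fit the map/reduce pattern that Hadoop provides. The following is only a minimal sketch in plain Python: the record layout `(node, user, bytes)`, the sample data, and the in-memory shuffle are illustrative assumptions, not the thesis's composite-session format or its actual Hadoop job.

```python
from collections import defaultdict

# Hypothetical session records: (node_ip, user_ip, bytes_transferred).
SESSIONS = [
    ("10.0.0.1", "192.168.1.5", 1200),
    ("10.0.0.1", "192.168.1.6", 800),
    ("10.0.0.2", "192.168.1.5", 500),
    ("10.0.0.1", "192.168.1.5", 300),
]


def map_session(record):
    """Map step: key each session by the node it belongs to."""
    node, user, nbytes = record
    yield node, (user, nbytes)


def reduce_node(node, values):
    """Reduce step: total traffic and distinct-user count for one node."""
    users = set()
    total_bytes = 0
    for user, nbytes in values:
        users.add(user)
        total_bytes += nbytes
    return node, total_bytes, len(users)


def run_job(records):
    """Simulate the shuffle phase in memory, then reduce each key."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_session(record):
            grouped[key].append(value)
    return [reduce_node(node, values) for node, values in sorted(grouped.items())]


if __name__ == "__main__":
    for node, total_bytes, user_count in run_job(SESSIONS):
        print(node, total_bytes, user_count)
```

On a real cluster the grouping done by `run_job` would be handled by the framework's shuffle, and only the map and reduce steps would be user code.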
The original K-means algorithm is sensitive to the choice of initial cluster centers, and its evaluation function considers only within-cluster differences. To address both defects, the thesis proposes an optimized method for selecting initial cluster centers and a balanced evaluation function. Experiments show that the improved algorithm effectively eliminates the instability of the clustering results and improves clustering accuracy. K-means is then implemented in distributed form on the Hadoop platform to cluster the network nodes. Finally, an ARIMA model is used to forecast parameters such as node traffic and the number of visiting users, with good predictive results. To detect abnormal network nodes, the thesis addresses the shortcomings of earlier anomaly detection algorithms and proposes a new algorithm based on distance and threshold judgment. The algorithm is fast, efficient, and updated in real time, detects abnormal network nodes effectively, and is practical for engineering use.
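The abstract does not spell out the improved seeding rule or the anomaly criterion, so the following is only a sketch under stated assumptions. Farthest-first seeding stands in for the optimized initial-center selection, and a fixed distance to the nearest centroid stands in for the distance-and-threshold test. The balanced evaluation function and the Hadoop distribution are not modeled, and every name, data point, and threshold value here is hypothetical.

```python
import math  # math.dist requires Python 3.8+


def farthest_first_centers(points, k):
    """Deterministic seeding (an assumed stand-in for the thesis's method):
    start from the point nearest the overall mean, then repeatedly add the
    point that is farthest from all centers chosen so far."""
    dim = len(points[0])
    mean = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
    centers = [min(points, key=lambda p: math.dist(p, mean))]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, iterations=20):
    """Standard Lloyd iterations started from the deterministic seeds."""
    centers = farthest_first_centers(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centers[i]  # keep an empty cluster's old center
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def find_anomalies(points, centers, threshold):
    """Flag points whose distance to the nearest center exceeds the threshold."""
    return [p for p in points
            if min(math.dist(p, c) for c in centers) > threshold]


if __name__ == "__main__":
    # Hypothetical two-feature node profiles used to fit the clusters.
    baseline = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
                (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
    centers = kmeans(baseline, k=2)
    # New observations are checked against the fitted centers.
    print(find_anomalies([(1.1, 0.9), (9.0, 1.0)], centers, threshold=1.0))
```

Because the seeds are chosen deterministically rather than at random, repeated runs on the same data give the same clusters, which is the kind of stability the abstract attributes to the improved algorithm.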
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP393.06
Article ID: 2049942
Article link: http://sikaile.net/guanlilunwen/ydhl/2049942.html