Network Node Behavior Analysis Based on Hadoop
Published: 2018-06-21 20:42
Topics: Hadoop + big data. Source: Beijing University of Posts and Telecommunications, 2015 master's thesis
[Abstract]: In recent years, as Internet technology has spread rapidly, the number of Internet users has kept rising. Nationwide, 93.1% of enterprises use computers for office work, and most have moved onto the information superhighway. As Internet adoption grows, networks are expanding sharply and the number of network nodes is increasing quickly. While the Internet promotes information exchange and offers many conveniences, it also brings new problems: for example, many security vulnerabilities exist, and networks face the threat of various attacks. Studying and analyzing the behavior of network nodes is therefore of great significance.

As the number of network users grows, the traffic they generate increases sharply, and the storage and transmission requirements for network data far exceed the processing capacity of traditional databases. Hadoop, an open-source Apache project, is a distributed software framework for processing massive data sets that makes distributed storage and computation of big data straightforward.

This thesis first introduces the background and significance of network node behavior analysis, then describes Hadoop and the network behavior monitoring and analysis system in detail. Next, based on the communication and traffic characteristics of network sessions, it proposes a new way of reassembling network sessions, the compound session, which captures the session and packet characteristics of a network session in more detail. Compound sessions are collected and preprocessed to provide the data for the experiments and analysis. Using compound sessions as experimental data, the thesis analyzes the traffic and visiting-user counts of network nodes and reveals how both are distributed. To address two weaknesses of the original K-means algorithm, namely its sensitivity to the initial cluster centers and an evaluation function that considers only intra-cluster differences, the thesis proposes an optimized method for selecting initial cluster centers and a balanced evaluation function. Experiments show that the improved algorithm effectively eliminates the instability of clustering results and improves clustering accuracy. K-means is then implemented in distributed form on the Hadoop platform to cluster the network nodes. Finally, an ARIMA model is used to forecast parameters such as node traffic and the number of visiting users, with good predictive results. To detect abnormal network nodes, the thesis addresses the shortcomings of earlier anomaly detection algorithms and proposes a new algorithm based on distance and threshold judgment. The algorithm is fast, efficient, and updated in real time, detects abnormal nodes well, and is practical for engineering use.
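The abstract does not give the details of the distributed K-means implementation or of the distance-and-threshold detector, so the following is only a minimal sketch under stated assumptions. It shows one common way to phrase a standard K-means iteration as a map step (assign each point to its nearest center) and a reduce step (average the points of each center), simulated in plain Python rather than on a Hadoop cluster. It does not include the thesis's improved initial-center selection or its balanced evaluation function. The function names (`map_assign`, `reduce_mean`, `kmeans`, `flag_anomalies`) and the simple "farther than a threshold from every center" rule are hypothetical illustrations, not the author's algorithm.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(point, centers):
    """Map step: emit (index of the nearest center, point)."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, point


def reduce_mean(points):
    """Reduce step: average all points assigned to one center."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans_iteration(points, centers):
    """Run one map/shuffle/reduce round and return the updated centers."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        key, value = map_assign(p, centers)
        groups[key].append(value)
    # A center with no assigned points keeps its old position.
    return [reduce_mean(groups[i]) if groups[i] else centers[i]
            for i in range(len(centers))]


def kmeans(points, centers, max_iter=100):
    """Repeat iterations until the centers stop changing (standard K-means)."""
    for _ in range(max_iter):
        new_centers = kmeans_iteration(points, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def flag_anomalies(points, centers, threshold):
    """Hypothetical rule: flag points farther than `threshold` from every center."""
    return [p for p in points
            if min(dist(p, c) for c in centers) > threshold]
```

In an actual Hadoop job the map and reduce functions would run on separate workers and the current centers would be distributed to every mapper before each iteration; here the `defaultdict` grouping plays the role of the shuffle phase.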
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP393.06