Research and Implementation of Massive Network Traffic Log Processing Technology Based on Hadoop
Published: 2018-09-19 20:28
[Abstract]: With the rapid development of networks and the arrival of the big data era, the need to process massive volumes of network traffic data has emerged. To meet this need, analyze network traffic effectively and in depth, and supervise it reliably, network traffic logs must be collected efficiently from the backbone network and then analyzed and processed efficiently. Multi-dimensional statistical analysis of these logs gives a detailed view of how the network is running and being used, so that policies can be adjusted to improve network quality. In-depth mining of the logs reveals users' online habits and preferences and clarifies their needs, so that more efficient service can raise user satisfaction. This thesis therefore studies network traffic log processing techniques and implements HAMANT, a Hadoop-based system for analyzing massive network traffic logs (the name is an acronym formed from the initial letters of key English words).
The thesis first introduces the background and significance of the topic and the current state of log processing technology, and then outlines the key related technologies, including big data, DPI, Hadoop, HBase, and data mining. Based on the requirements of the project and its application scenarios, it then analyzes the requirements and functions of massive network traffic log processing, presents the overall framework of the HAMANT log analysis system, and gives the detailed design of its modules: log collection, log preprocessing, log storage, statistical log analysis, log mining analysis, and report presentation. Finally, the system's performance is tested, and its results are demonstrated by processing massive network traffic from the backbone network of a key university, showing that the system handles massive network traffic logs well and has a degree of scalability.
The thesis explores network traffic log technology in some depth and designs the Hadoop-based HAMANT log analysis system. The system adds a DPI protocol identification engine to log collection, making collection both rich and efficient. Log storage and processing are distributed and support automatic backup and fault tolerance, overcoming the slow computation, insufficient storage space, and heavy server load of traditional single-machine log processing. Clustering algorithms from data mining are implemented in distributed form and integrated into the system, enabling deep analysis of massive network traffic logs and uncovering the online behavior preferences hidden behind large numbers of network users. System performance tests and an experimental analysis of a practical application are given at the end.
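The abstract states that a clustering algorithm was implemented in distributed form, but it does not name the algorithm or the features used. As a rough illustration only, the sketch below assumes k-means and shows how one iteration splits into a map step (assign each point to its nearest centroid) and a reduce step (average the points in each cluster). It runs in a single Python process rather than on Hadoop. The function names and the sample feature vectors are hypothetical and are not taken from HAMANT.

```python
from collections import defaultdict
from math import dist


def map_phase(points, centroids):
    """Map step: emit (index of nearest centroid, point) for every point."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, p


def reduce_phase(pairs, centroids):
    """Reduce step: average the points assigned to each centroid.

    A centroid that received no points is kept unchanged.
    """
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    updated = []
    for i, old in enumerate(centroids):
        members = groups.get(i)
        if not members:
            updated.append(old)
        else:
            # Column-wise mean of the member points.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return updated


def kmeans(points, centroids, max_iter=20):
    """Repeat map + reduce until the centroids stop changing (assumed stop rule)."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    # Hypothetical per-user feature vectors, e.g. traffic volume per protocol class.
    users = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(users, [(0, 0), (10, 10)]))
```

In a real MapReduce job the map step would run in parallel over log splits and the current centroids would be shared with every mapper; how the thesis handles this is not described in the abstract.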
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.06; TP311.13
Article ID: 2251252
Article link: http://sikaile.net/guanlilunwen/ydhl/2251252.html