Implementation and Application of a Hadoop-Based Network Traffic Data Processing System
Published: 2018-07-31 15:32
[Abstract]: After years of development, China's Internet has become an important part of the global Internet. By the end of June 2013, China had 591 million Internet users, and Internet penetration was about 44.1%. Alongside this rapid growth, the Internet's problems have become increasingly prominent. On the one hand, the growing number of users and a steady stream of emerging services have caused Internet traffic to surge and network congestion to become more frequent. This places higher demands on network quality of service. On the other hand, the Internet architecture has grown more complex. As a result, Internet traffic characteristics, user behavior, and the traffic characteristics of emerging services still lack in-depth understanding and accurate description. This seriously hinders the further development of the Internet and the efficient use of network resources. At the same time, because network traffic has grown sharply, traditional traffic analysis methods can no longer meet the storage and processing requirements of massive data, so a more efficient and reliable approach is needed. Hadoop is a scalable open-source software framework for reliable distributed processing of massive data, and it has been applied in a growing number of research fields.
This thesis first introduces the basic concepts of Hadoop, including the working principles of Hadoop and HBase. Building on Hadoop, it then proposes a three-layer architecture for a network traffic processing system. The architecture integrates the otherwise independent functions of traffic collection, storage, processing, and analysis into a single, fully functional system. The thesis then focuses on the data layer of this system.
Within the data layer, two components are described in detail:

- the non-real-time component, a Hadoop-based network traffic data control component;
- the real-time component, an HBase-based flow record control component.

Through the study of these two components, the thesis solves several important problems in the field of large-scale network traffic analysis. Finally, the application layer of the system is illustrated with an analysis of the traffic characteristics of smart terminals.
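The abstract does not describe the internals of the non-real-time component. The sketch below illustrates only the general batch pattern a Hadoop job applies to stored flow records, simulated in a single Python process:

- the mapper emits a (flow key, byte count) pair per record;
- the shuffle groups the pairs by key;
- the reducer sums each group.

It does not use Hadoop's API. The comma-separated record layout (`src_ip,dst_ip,src_port,dst_port,proto,bytes`) and the names `map_flow`, `shuffle`, `reduce_bytes` and `run_job` are hypothetical assumptions, not the thesis's actual design.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout (assumption, not from the thesis):
# src_ip,dst_ip,src_port,dst_port,proto,bytes
FlowKey = Tuple[str, str, int, int, str]


def map_flow(line: str) -> Iterator[Tuple[FlowKey, int]]:
    """Mapper: parse one CSV flow record and emit (5-tuple, byte count)."""
    fields = line.strip().split(",")
    if len(fields) != 6:
        return  # skip lines with the wrong number of fields
    src_ip, dst_ip, src_port, dst_port, proto, nbytes = fields
    yield (src_ip, dst_ip, int(src_port), int(dst_port), proto), int(nbytes)


def shuffle(pairs: Iterable[Tuple[FlowKey, int]]) -> Dict[FlowKey, List[int]]:
    """Shuffle: group intermediate values by key (Hadoop does this between phases)."""
    groups: Dict[FlowKey, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_bytes(key: FlowKey, values: List[int]) -> Tuple[FlowKey, int]:
    """Reducer: total bytes observed for one flow."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[FlowKey, int]:
    """Run map -> shuffle -> reduce over all input lines in one process."""
    intermediate = (pair for line in lines for pair in map_flow(line))
    grouped = shuffle(intermediate)
    return dict(reduce_bytes(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    sample = [
        "10.0.0.1,10.0.0.2,5000,80,TCP,1200",
        "10.0.0.1,10.0.0.2,5000,80,TCP,800",
        "10.0.0.3,10.0.0.2,6000,53,UDP,90",
        "bad line",
    ]
    for flow, total in run_job(sample).items():
        print(flow, total)
```

In a real Hadoop job, mappers and reducers run as separate tasks over HDFS input splits, and the framework performs the sort-and-shuffle step. Here the in-memory dictionary in `shuffle` merely stands in for that step.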
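The abstract does not give the schema of the HBase-based real-time component. One common HBase row-key pattern, assumed here purely for illustration, combines two parts:

- the source IPv4 address;
- a reversed timestamp (a large constant minus the time).

HBase keeps rows sorted by the byte-wise lexicographic order of their keys. Under this layout, all flow records for one host are therefore stored contiguously and newest first, so a prefix scan returns the latest flows cheaply. The sketch emulates that ordering with a sorted Python list and `bisect`. The key layout and the names `make_row_key`, `FlowStore` and `latest_flows` are hypothetical.

```python
import bisect
import ipaddress
import struct
from typing import Dict, List, Tuple

# Largest signed 64-bit value (Java's Long.MAX_VALUE), used to reverse timestamps
LONG_MAX = 2**63 - 1


def make_row_key(src_ip: str, ts_millis: int) -> bytes:
    """Hypothetical row key: 4-byte IPv4 address + 8-byte big-endian reversed timestamp.

    Storing LONG_MAX - ts makes newer records sort before older ones
    under byte-wise lexicographic ordering of row keys.
    """
    ip_bytes = ipaddress.IPv4Address(src_ip).packed
    return ip_bytes + struct.pack(">Q", LONG_MAX - ts_millis)


class FlowStore:
    """In-memory stand-in for an HBase table: row keys kept in sorted order."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._rows: Dict[bytes, Dict[str, int]] = {}

    def put(self, src_ip: str, ts_millis: int, columns: Dict[str, int]) -> None:
        """Insert or overwrite one flow record."""
        key = make_row_key(src_ip, ts_millis)
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = columns

    def latest_flows(self, src_ip: str, limit: int) -> List[Tuple[int, Dict[str, int]]]:
        """Prefix scan: return up to `limit` newest records for one source IP."""
        prefix = ipaddress.IPv4Address(src_ip).packed
        start = bisect.bisect_left(self._keys, prefix)
        results: List[Tuple[int, Dict[str, int]]] = []
        for key in self._keys[start:]:
            if not key.startswith(prefix) or len(results) >= limit:
                break
            ts = LONG_MAX - struct.unpack(">Q", key[4:])[0]
            results.append((ts, self._rows[key]))
        return results
```

A production design might also salt or hash the key prefix to spread writes across regions and avoid hotspotting. That trade-off is not covered by this sketch.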
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.06
Article ID: 2156017
Article link: http://sikaile.net/guanlilunwen/ydhl/2156017.html