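Performance Study of Distributed Big Data Processing Models in High-Speed Network Traffic Environments
Published: 2018-05-04 12:38
Topics: big data + distributed computing. Source: master's thesis, Beijing University of Posts and Telecommunications, 2016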
[Abstract]: With the rapid development of Internet and communication technology, networks have become closely tied to everyday life. The Internet applications that enrich and simplify people's lives also continuously generate network traffic carrying large volumes of user data, and this traffic contains highly valuable behavioral information. How to analyze and process such data efficiently in a high-speed network traffic environment has become a focus of both academia and industry. Because existing analysis and research on the performance of distributed big data processing models remains scarce and superficial, further in-depth study of their performance through simulation modeling and data analysis is needed. This thesis first introduces the characteristics of high-speed network traffic environments and the technical challenges of processing massive data in them, and then briefly describes the technical approaches to big data problems. Next, it analyzes in depth the technical implementation of Hadoop, the big data processing model most widely adopted in industry, with a detailed discussion of the main factors that affect its performance. It then proposes a Petri-net-based method for simulation modeling of Hadoop and implements a simulation tool that predicts Hadoop's performance. Comparing the simulation results with measured data from Hadoop running in a real environment demonstrates the accuracy, efficiency, and extensibility of the simulation tool. Finally, the thesis analyzes the motivation and design ideas behind Spark, an emerging big data processing model, and compares the performance of Spark and Hadoop using measured data from a high-speed network traffic environment.
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP311.13