Cloud Computing Construction and Log Analysis Based on the Hadoop Platform
Topics: cloud computing + Hadoop; Source: Harbin University of Science and Technology, master's thesis, 2012
[Abstract]: Cloud computing is a new computing model that distributes computing tasks across a resource pool made up of a large number of computers, so that users can obtain computing power, storage space, and information services on demand. Compared with traditional data processing, cloud computing can effectively relieve the performance bottlenecks of large-scale data processing, improve its reliability and scalability, and raise processing capacity while lowering the hardware requirements of computation. This thesis studies the concept, categories, and key technologies of cloud computing.

Hadoop is an open-source distributed computing platform designed for large-scale data processing and distributed computing, and it is one of the main options for implementing cloud computing. The platform is efficient, reliable, and highly scalable. Its two main components are the Hadoop Distributed File System (HDFS) and the parallel processing model MapReduce. The thesis analyzes HDFS in terms of its design assumptions and goals, architecture, reliability measures, and performance measures, and analyzes MapReduce in terms of its logical model, programming model, implementation mechanism, and execution flow.

After analyzing the existing massive-data processing system, and drawing on the strengths of cloud computing and Hadoop, the thesis establishes a new data processing model, builds a system platform based on that model, and evaluates the platform's performance using Web logs as the source data. The comparative experiments show that cloud computing greatly shortens the time taken by log analysis, and that as the data volume grows, the processing and storage capacity of the Hadoop platform adapts to it. This reflects the on-demand growth of computing power and storage space that cloud computing offers for large-scale data. For large-scale data, the Hadoop-based cloud computing environment removes the computing and storage bottlenecks of traditional data processing methods, and its good scalability allows this capacity to be used flexibly.
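To make the MapReduce execution flow mentioned in the abstract concrete, the following is a minimal single-process sketch of a map, shuffle, and reduce pipeline that counts requests per URL in Web log lines. It is not the thesis's implementation: the abstract does not say which log format or which statistics were used, so the Common Log Format pattern, the sample lines, and the names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are all assumptions made for illustration. On a real Hadoop cluster the same logic would be split across mapper and reducer tasks running on many nodes.

```python
import re
from collections import defaultdict

# Assumption: logs follow the Common Log Format, e.g.
# 10.0.0.1 - - [10/Oct/2011:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (\S+) [^"]*" \d{3} \S+')


def map_phase(line):
    """Map step: emit a (url, 1) pair for each parseable log line."""
    match = LOG_PATTERN.match(line)
    if match:
        yield match.group(1), 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one URL."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in one process over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample data, including one malformed line that the mapper skips.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2011:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2011:13:55:40 +0800] "GET /about.html HTTP/1.1" 200 1024',
    '10.0.0.1 - - [10/Oct/2011:13:56:01 +0800] "GET /index.html HTTP/1.1" 304 0',
    'malformed line',
]

if __name__ == "__main__":
    print(run_job(SAMPLE_LOG))
```

Because each log line is mapped independently and each URL is reduced independently, both steps can be spread over more machines as the log volume grows, which is the scalability property the abstract attributes to the Hadoop platform.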
[Degree-granting institution]: Harbin University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP3
Article ID: 1939780
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1939780.html