
Compatible Study of Hadoop for Efficient Analyzing and Proce

Published: 2021-01-02 04:02

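As computers are used, data is continuously generated and accumulated, which raises the question of where to store it. In the past, the storage cost of solving this problem was too high; recent advances in technology, however, have reduced storage costs. Big data is a collection of data sets so large and wide-ranging that they are difficult to process with traditional database management tools. Processing large data sets with traditional methods is also very time-consuming, so the Hadoop framework, which is faster and more efficient than traditional methods, has been widely adopted. The main goal is to process continuously generated data more efficiently and in less time, without having to store the data. Data falls into three main categories: structured, unstructured, and semi-structured. Hadoop provides different types of frameworks for processing these huge data sets. We focus on three of them, Pig, Hive, and Impala, and systematically study how to analyze structured data sets effectively and reduce the time this takes. We compare the three Hadoop frameworks experimentally by applying them to two different data sets and examining their data-processing efficiency. Specifically, we perform similar tasks on Hive, Pig, and Impala and evaluate the experimental results. The results show that Impala is more efficient than Hive and Pig, because it takes less time to execute the tasks.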

[Source]: Southwest University of Science and Technology, Sichuan Province

[Pages]: 59

[Degree level]: Master's

[Table of contents]:
Abstract (Chinese)
Abstract
Chapter 1 Introduction
    1.1 Introduction
    1.2 Big Data Definitions
    1.3 Research Background
        1.3.1 Big Data Applications
        1.3.2 Challenges of Big Data
        1.3.3 Apache Hadoop
        1.3.4 Hadoop Environment
        1.3.5 Hadoop Architecture and Design
        1.3.6 Hadoop Distributed File System (HDFS)
        1.3.7 MapReduce
        1.3.8 Hadoop Ecosystem
    1.4 Objective of Research
    1.5 Contributions and Significance of Research
Chapter 2 Related Work/Review of Literature
    2.1 Introduction
    2.2 Review of Literature
Chapter 3 Methodology
    3.1 Completely Unstructured Data
    3.2 Semi-Structured Data
    3.3 Structured Data
    3.4 Estimation Technique
    3.5 Apache PIG-based Calculating
    3.6 Apache HIVE-based Data Storage
    3.7 Apache IMPALA-based Data Management
Chapter 4 Experiment and Results
    4.1 Dataset
    4.2 System Requirements
    4.3 Apache Pig
        4.3.1 Contents of our Input File
        4.3.2 Copying the Input File
        4.3.3 Executing the Pig commands on File
        4.3.4 Mapper and Reducer Running Job
        4.3.5 Output
    4.4 Apache Hive
        4.4.1 Create Table and Loading the Data
        4.4.2 Query Execution
        4.4.3 Mapper and Reducer Running Job
    4.5 Apache Impala
        4.5.1 Contents of Input File
        4.5.2 Create Table and Loading the Data
        4.5.3 Query Execution
        4.5.4 Output
    4.6 Comparison of Results (Pig, Hive, Impala)
Chapter 5 Conclusion and Future Work
    5.1 Conclusion
    5.2 Future Work
Reference
ACKNOWLEDGEMENTS
Academic Achievements
DEDICATION

Article ID: 2952612


Article link: http://sikaile.net/kejilunwen/shengwushengchang/2952612.html
