Research on a Hadoop-Based Agricultural Big Data Processing System
Published: 2018-04-09 10:38
Topic: big data. Focus: agricultural big data. Source: Henan Normal University, master's thesis, 2017
[Abstract]: China covers a vast territory with complex and varied ecological zones and a rich diversity of crops, so its agricultural data is correspondingly diverse in type and enormous in volume. Because of the limitations of traditional agriculture, this data long went unnoticed and underused. With the advance of agricultural informatization and modernization, agricultural data has begun to receive attention and plays an increasingly important role in guiding production. As Internet of Things and related technologies are deployed widely in agriculture, the volume of agricultural data is growing geometrically, and traditional data processing methods can no longer meet the demand. Agricultural data now exhibits the basic characteristics of big data and has become agricultural big data; owing to the nature of agriculture itself, it is large in volume, multidimensional, and dynamic. How to handle its growth soundly and efficiently is an important problem, and the rapid development of big data technology can address many of the difficulties involved. The most prominent big data processing platform is Apache Hadoop, an open-source distributed computing platform that runs on large clusters. It implements the MapReduce computing model, has been widely adopted, and has gradually become synonymous with big data. MapReduce, first proposed by Google, is a parallel programming model for parallel computation over large-scale data sets and is Google's core computing model [1]. The Map and Reduce functions are the core of the model: both operate on key/value pairs so that a complex data processing task can be distributed across compute nodes and massive, complex data can be processed on a distributed parallel architecture. This thesis analyzes the characteristics of big data and, in light of the characteristics of agricultural big data, examines the strengths and weaknesses of existing agricultural big data processing systems, improves on them, and designs an agricultural big data processing system based on the Hadoop platform. It briefly introduces classical data mining algorithms and analyzes how they can be parallelized on the MapReduce architecture. The CART algorithm is parallelized for MapReduce and then optimized. Finally, data is run through the system to verify the system's feasibility and to show that the improved algorithm achieves higher performance.
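The key/value flow described in the abstract can be illustrated with a small, single-process sketch. It is not the thesis's implementation, whose parallel CART design is not given here. It only assumes that one step of CART, scoring a candidate split by weighted Gini impurity, can be phrased as a map step that emits `((side, label), 1)` pairs and a reduce step that sums them. The function names, the `soil_moisture` feature, and the toy records are all hypothetical.

```python
from collections import defaultdict


def map_record(record, feature, threshold):
    """Map: emit one ((side, label), 1) pair for a training record."""
    side = "left" if record[feature] <= threshold else "right"
    yield (side, record["label"]), 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce: sum the counts emitted for one key."""
    return key, sum(values)


def gini(class_counts):
    """Gini impurity of a node given its per-class record counts."""
    total = sum(class_counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in class_counts.values())


def weighted_gini(records, feature, threshold):
    """Score a candidate split from the reduced (side, label) counts."""
    pairs = [kv for r in records for kv in map_record(r, feature, threshold)]
    counts = dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    sides = {"left": {}, "right": {}}
    for (side, label), n in counts.items():
        sides[side][label] = n
    return sum(sum(c.values()) / total * gini(c) for c in sides.values())


# Hypothetical toy data: one numeric feature and a class label per record.
records = [
    {"soil_moisture": 0.2, "label": "low_yield"},
    {"soil_moisture": 0.3, "label": "low_yield"},
    {"soil_moisture": 0.6, "label": "high_yield"},
    {"soil_moisture": 0.8, "label": "high_yield"},
]

# Pick the candidate threshold with the lowest weighted impurity.
best_threshold = min([0.25, 0.5, 0.7],
                     key=lambda t: weighted_gini(records, "soil_moisture", t))
```

In a real Hadoop job the map and reduce functions would run on separate nodes over partitions of the data, and only the small per-key counts would cross the network. That is the property that makes split evaluation a natural fit for the model.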
[Degree-granting institution]: Henan Normal University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: S126;TP311.13
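[References]
Related journal articles (first 10)
1 Wang Wei. Exploring online teaching reform of the management information systems course in a big data environment [J]. Fujian Computer, 2017(01)
2 Lin Kequan. An automatic inspection system based on simulated user behavior [J]. Digital Technology and Application, 2017(01)
3 Li Lingping; Mao Kebiao; Fu Xiuli; Ma Ying; Wang Fang; Liu R,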
Article ID: 1726089
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1726089.html