Research on Hadoop-Based Processing Technology for Massive Engineering Data

Published: 2018-07-03 11:56

Topics: massive engineering data processing; Hadoop. Source: master's thesis, Beijing Jiaotong University, 2013

[Abstract]: As engineering projects become more information-driven, they produce massive volumes of engineering data, yet traditional engineering data storage technology cannot keep up with ever-higher requirements on storage quality. In recent years cloud computing has advanced rapidly with the joint support of industry and academia, and many cloud computing systems have been put into use; among them, the Hadoop platform is widely used to develop cloud computing programs. Hadoop's greatest advantage is that it makes parallelization transparent to application developers: they write cloud computing applications much as they would write ordinary programs, while Hadoop's underlying layer parallelizes the work across the cluster automatically. Building on Hadoop, this thesis studies the processing of massive data in the engineering domain, mainly using Hadoop's HDFS distributed file system and its MapReduce distributed processing model to support the storage and processing of massive engineering data. The processing of massive engineering data falls into two major parts: storing the data, and computing on and analyzing it.

For the storage problem, the thesis analyzes and designs an engineering data storage system based on HDFS. The system uses a file identification algorithm built on the Java 7 file monitor, which quickly and accurately monitors and identifies changes to local directories on the client. Combined with storage-job scheduling by a Quartz-based scheduler and calls to the HDFS file operation API, this achieves cross-platform file synchronization. Application tests in the storage system of a cloud simulation platform show that the system offers good generality, efficiency, and economy.
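The abstract does not spell out the file identification algorithm. As a minimal sketch, assuming the "Java 7 file monitor" is `java.nio.file.WatchService` (introduced in Java 7), the class below registers a local directory for create, modify, and delete events and folds the raw events for each file into one net action for the next sync run. A burst such as create, modify, modify becomes a single upload, and a temporary file created and deleted between two runs produces nothing. `ChangeRecognizer`, `SyncAction`, and the event-folding rule are hypothetical illustrations, not the thesis's algorithm.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.LinkedHashMap;
import java.util.Map;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;
import static java.nio.file.StandardWatchEventKinds.OVERFLOW;

// Hypothetical sketch of client-side change recognition; not the thesis's algorithm.
class ChangeRecognizer {
    // Net operation to replay against HDFS at the next scheduled sync.
    enum SyncAction { UPLOAD, DELETE, NONE }

    // Per-file state accumulated between two sync runs.
    private static final class FileState {
        final boolean existedAtStart; // did the file exist when this window opened?
        boolean existsNow;

        FileState(boolean existedAtStart, boolean existsNow) {
            this.existedAtStart = existedAtStart;
            this.existsNow = existsNow;
        }

        SyncAction action() {
            if (existsNow) return SyncAction.UPLOAD;                     // new or changed file
            return existedAtStart ? SyncAction.DELETE : SyncAction.NONE; // removed, or transient
        }
    }

    private final Map<String, FileState> pending = new LinkedHashMap<>();

    // Registers one directory (not its subdirectories) with a Java 7 WatchService.
    static WatchService watch(Path dir) throws IOException {
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);
        return watcher;
    }

    // Folds one raw event into the pending state for a file name.
    void record(String name, WatchEvent.Kind<?> kind) {
        boolean exists = kind != ENTRY_DELETE;
        FileState state = pending.get(name);
        if (state == null) {
            // A first CREATE means the file was absent when the window opened.
            pending.put(name, new FileState(kind != ENTRY_CREATE, exists));
        } else {
            state.existsNow = exists;
        }
    }

    // Drains a signalled key; a real client would rescan the directory on OVERFLOW.
    void drain(WatchKey key) {
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == OVERFLOW) continue;
            Path name = (Path) event.context();
            record(name.toString(), event.kind());
        }
        key.reset();
    }

    // Returns the net action per file and opens a new window.
    Map<String, SyncAction> takeActions() {
        Map<String, SyncAction> actions = new LinkedHashMap<>();
        for (Map.Entry<String, FileState> e : pending.entrySet()) {
            actions.put(e.getKey(), e.getValue().action());
        }
        pending.clear();
        return actions;
    }

    public static void main(String[] args) {
        ChangeRecognizer r = new ChangeRecognizer();
        r.record("report.dat", ENTRY_CREATE);
        r.record("report.dat", ENTRY_MODIFY);
        r.record("tmp.swp", ENTRY_CREATE);
        r.record("tmp.swp", ENTRY_DELETE);
        r.record("old.log", ENTRY_DELETE);
        System.out.println(r.takeActions());
    }
}
```

Running `main` prints `{report.dat=UPLOAD, tmp.swp=NONE, old.log=DELETE}`. In a system like the one described, a Quartz-scheduled job would presumably call `takeActions()` periodically and replay each action through Hadoop's `FileSystem` API (for example `copyFromLocalFile` for uploads and `delete(path, false)` for deletions). That wiring is an assumption and is omitted to keep the example dependency-free. Note also that `WatchService` is not recursive, so each subdirectory has to be registered separately.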
The file synchronization method designed in the thesis performs its task well: it solves the core file synchronization problem of the cloud simulation platform's storage system and provides fast, correct synchronization.

For the computation and analysis of massive engineering data, the thesis builds on Hadoop's other core technology, the MapReduce distributed processing model. Taking massive urban noise data as the application object, it proposes an analysis and processing model for massive data that serves an urban community noise monitoring system. Based on the characteristics of urban noise data, it defines a four-stage processing workflow: data cleaning, data preprocessing, data processing, and data visualization. Using this model, the massive noise data collected by the urban noise monitoring and acquisition system are stored, and the stored data are then tested, analyzed, and processed, combining mobile computing with the storage and analysis of massive engineering data. Test results show that the distributed processing model completes the noise data processing tasks quickly, accurately, and effectively. Finally, the thesis summarizes its results on applying Hadoop to the storage, computation, and analysis of massive engineering data and outlines future work.
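The abstract names the four stages but not the record format, the grouping key, or the statistic computed. As a minimal sketch, assuming CSV readings of the hypothetical form `stationId,epochSeconds,dBA` and assuming the processing stage computes an hourly equivalent continuous sound level Leq = 10·log10((1/n)·Σ 10^(L_i/10)), a standard energy average in acoustics, the plain-Java program below imitates one MapReduce pass in memory. The map step performs cleaning and preprocessing, a `TreeMap` stands in for the shuffle, the reduce step computes Leq per key, and a text bar chart stands in for visualization. It does not use the Hadoop API; on a cluster the same logic would presumably live in Hadoop `Mapper` and `Reducer` subclasses.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical in-memory imitation of one MapReduce pass; not Hadoop code.
class NoisePipeline {

    // Map step (cleaning + preprocessing): parses an assumed "stationId,epochSeconds,dBA"
    // line and emits (stationId#hour, level); malformed or implausible rows are dropped.
    static Optional<Map.Entry<String, Double>> map(String line) {
        String[] f = line.split(",");
        if (f.length != 3) return Optional.empty();
        try {
            long hour = Long.parseLong(f[1].trim()) / 3600;
            double level = Double.parseDouble(f[2].trim());
            if (level < 20 || level > 140) return Optional.empty(); // assumed plausible dBA range
            Map.Entry<String, Double> kv = new SimpleImmutableEntry<>(f[0].trim() + "#" + hour, level);
            return Optional.of(kv);
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // Reduce step (processing): energy average, Leq = 10*log10(mean(10^(L/10))).
    static double reduce(List<Double> levels) {
        double energy = 0;
        for (double l : levels) energy += Math.pow(10, l / 10);
        return 10 * Math.log10(energy / levels.size());
    }

    // Maps every line, groups values by key (the "shuffle"), then reduces each group.
    static Map<String, Double> run(List<String> lines) {
        Map<String, List<Double>> groups = new TreeMap<>();
        for (String line : lines) {
            map(line).ifPresent(kv ->
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue()));
        }
        Map<String, Double> result = new TreeMap<>();
        groups.forEach((key, levels) -> result.put(key, reduce(levels)));
        return result;
    }

    public static void main(String[] args) {
        List<String> raw = List.of(
            "S01,1000,60.0", "S01,2000,70.0", // same station, same hour
            "S02,1500,55.5",
            "S02,1600,abc",                   // unparsable level: cleaned out
            "S03,1700,300");                  // out of range: cleaned out
        // Visualization stand-in: one text bar per key, one '#' per 10 dB.
        run(raw).forEach((key, leq) ->
            System.out.printf("%-6s %6.2f dB %s%n", key, leq, "#".repeat((int) (leq / 10))));
    }
}
```

Because levels are averaged as energies, the two S01 readings of 60 and 70 dBA reduce to about 67.4 dBA rather than to their arithmetic mean of 65: the louder reading dominates, which is why acoustic aggregates are usually energy-based.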
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year conferred]: 2013
[Classification number]: TP333

Article No.: 2093554

Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2093554.html
