Research and Implementation of a Hadoop-Based Evidence Preservation System
Published: 2018-06-08 15:57
Topic: cloud services + Hadoop; Source: Master's thesis, University of Electronic Science and Technology of China, 2014
【Abstract】: With the rapid development of the Internet and the mobile Internet, data volumes have grown exponentially. Facing the challenges posed by massive data, major Internet companies at home and abroad have applied the concept of cloud computing to commercial services and launched their own cloud offerings. A cloud service delivers computing resources and business applications to users over the Internet, shifting data processing from personal computers and servers to Internet data centers and thereby reducing users' investment in hardware, software, and specialized skills. Cloud services are now widely used across business scenarios and have matured into an established commercial service model. Based on Hadoop, this thesis completes the following work.

1. Design and implementation of an evidence preservation system for cloud services. The system works as follows: a gateway server is deployed between the cloud service provider and its users; according to filtering conditions specified by the provider, it captures all user HTTP requests to the designated cloud service APIs and extracts user feature information. The feature information mainly includes the user name, the time the request was issued, the user's region, the requested cloud service API, and the API's parameters. The gateway then imports this information into a data analysis system, which analyzes it according to conditions specified by the provider and presents the results as reports; finally, the feature information is archived to a storage system for permanent preservation according to archiving conditions specified by the provider. Because cloud service providers have very large user bases, the system is expected to handle data at the PB scale, so the Hadoop platform is adopted as the underlying implementation of both the data analysis system and the storage system.

2. The system periodically archives user feature information into HDFS (the Hadoop Distributed File System) according to multiple archiving conditions. Archiving partitions the feature information into many files, including both GB-scale large files and large numbers of KB-scale small files. Since HDFS is designed for large-file storage, storing many small files degrades overall cluster performance. This thesis therefore studies the Hadoop source code to identify why storing many small files degrades HDFS performance and, on that basis, proposes a client-side aggregation-and-index strategy: small files are aggregated and indexed on the client, optimizing small-file storage in HDFS.
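The gateway's feature-extraction step described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the request format, field names, and the example API path are all assumptions.

```python
# Sketch of extracting user feature information from a captured HTTP
# request line. Field names and the example API are illustrative only.
from urllib.parse import urlparse, parse_qs
from datetime import datetime, timezone

def extract_features(user, region, request_line):
    """Parse an HTTP request line for a cloud-service API call into a
    feature record: user name, request time, region, API path, parameters."""
    method, url, _version = request_line.split()
    parsed = urlparse(url)
    return {
        "user": user,
        "time": datetime.now(timezone.utc).isoformat(),  # time of request
        "region": region,
        "api": parsed.path,
        "params": {k: v[0] for k, v in parse_qs(parsed.query).items()},
    }

record = extract_features("alice", "cn-sichuan",
                          "GET /v1/storage/list?bucket=demo&limit=10 HTTP/1.1")
print(record["api"], record["params"])
```

In the system described, records like this would be batched by the gateway and imported into the Hadoop-based analysis system rather than printed.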
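The client-side aggregation-and-index idea can be illustrated with a small sketch: many small files are packed into one large container, and an (offset, length) index keeps each small file individually retrievable. A bytes buffer stands in for an HDFS file here; all names are assumptions, not the thesis's actual data structures.

```python
# Illustrative sketch of client-side small-file aggregation with an index.
# A bytes blob stands in for the aggregated file written to HDFS.

def aggregate(small_files):
    """Concatenate small files into one blob; return (blob, index) where
    index maps each file name to its (offset, length) in the blob."""
    blob = bytearray()
    index = {}
    for name, data in small_files.items():
        index[name] = (len(blob), len(data))
        blob.extend(data)
    return bytes(blob), index

def read_one(blob, index, name):
    """Retrieve one original small file from the aggregated blob."""
    offset, length = index[name]
    return blob[offset:offset + length]

files = {"a.log": b"alpha", "b.log": b"bravo-bravo"}
blob, idx = aggregate(files)
print(read_one(blob, idx, "b.log"))
```

Aggregating on the client reduces the number of files (and hence NameNode metadata entries) that HDFS must track, which is the source of the performance degradation the thesis analyzes.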
【Degree-granting institution】: University of Electronic Science and Technology of China
【Degree level】: Master's
【Year conferred】: 2014
【CLC numbers】: TP393.09; TP311.13