Research and Implementation of a Hadoop-Based Evidence Preservation System
Published: 2018-06-08 15:57
Topic: cloud services + Hadoop; Source: Master's thesis, University of Electronic Science and Technology of China, 2014
【Abstract】: With the rapid development of the Internet and the mobile Internet, data volumes have grown exponentially. Facing the challenges posed by massive data, major Internet companies at home and abroad have applied the concept of cloud computing to commercial services and launched their own cloud offerings. A cloud service delivers computing resources and business applications to users over the Internet, shifting data processing from personal computers and servers to Internet data centers and thereby reducing users' investment in hardware, software, and specialized skills. Cloud services are now widely used across business scenarios and have matured into an established commercial service model. Based on Hadoop, this thesis completes the following work.

1. Design and implementation of an evidence preservation system for cloud services. The system works as follows: a gateway server is deployed between the cloud service provider and its users; according to filtering conditions specified by the provider, it captures all user HTTP requests to the designated cloud service APIs and extracts user feature information. The feature information mainly includes the user name, the time the request was issued, the user's region, the requested cloud service API, and the API's parameters. The gateway then imports this information into a data analysis system, which analyzes it according to conditions specified by the provider and presents the results as reports; finally, the feature information is archived to a storage system for permanent preservation according to archiving conditions specified by the provider. Because cloud service providers have very large user bases, the system is expected to handle data at the PB scale, so the Hadoop platform is adopted as the underlying implementation of both the data analysis system and the storage system.

2. The system periodically archives user feature information into HDFS (the Hadoop Distributed File System) according to multiple archiving conditions. Archiving partitions the feature information into many files, including both GB-scale large files and large numbers of KB-scale small files. Since HDFS is designed for large-file storage, storing many small files degrades overall cluster performance. This thesis therefore studies the Hadoop source code to identify why storing many small files degrades HDFS performance and, on that basis, proposes a client-side aggregation-and-index strategy: small files are aggregated and indexed on the client, optimizing small-file storage in HDFS.
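The gateway's feature-extraction step described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the request format, field names, and the example API path are all assumptions.

```python
# Sketch of extracting user feature information from a captured HTTP
# request line. Field names and the example API are illustrative only.
from urllib.parse import urlparse, parse_qs
from datetime import datetime, timezone

def extract_features(user, region, request_line):
    """Parse an HTTP request line for a cloud-service API call into a
    feature record: user name, request time, region, API path, parameters."""
    method, url, _version = request_line.split()
    parsed = urlparse(url)
    return {
        "user": user,
        "time": datetime.now(timezone.utc).isoformat(),  # time of request
        "region": region,
        "api": parsed.path,
        "params": {k: v[0] for k, v in parse_qs(parsed.query).items()},
    }

record = extract_features("alice", "cn-sichuan",
                          "GET /v1/storage/list?bucket=demo&limit=10 HTTP/1.1")
print(record["api"], record["params"])
```

In the system described, records like this would be batched by the gateway and imported into the Hadoop-based analysis system rather than printed.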
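The client-side aggregation-and-index idea can be illustrated with a small sketch: many small files are packed into one large container, and an (offset, length) index keeps each small file individually retrievable. A bytes buffer stands in for an HDFS file here; all names are assumptions, not the thesis's actual data structures.

```python
# Illustrative sketch of client-side small-file aggregation with an index.
# A bytes blob stands in for the aggregated file written to HDFS.

def aggregate(small_files):
    """Concatenate small files into one blob; return (blob, index) where
    index maps each file name to its (offset, length) in the blob."""
    blob = bytearray()
    index = {}
    for name, data in small_files.items():
        index[name] = (len(blob), len(data))
        blob.extend(data)
    return bytes(blob), index

def read_one(blob, index, name):
    """Retrieve one original small file from the aggregated blob."""
    offset, length = index[name]
    return blob[offset:offset + length]

files = {"a.log": b"alpha", "b.log": b"bravo-bravo"}
blob, idx = aggregate(files)
print(read_one(blob, idx, "b.log"))
```

Aggregating on the client reduces the number of files (and hence NameNode metadata entries) that HDFS must track, which is the source of the performance degradation the thesis analyzes.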
【Degree-granting institution】: University of Electronic Science and Technology of China
【Degree level】: Master's
【Year conferred】: 2014
【CLC numbers】: TP393.09; TP311.13