Research on Data Stream Processing Technology Based on Cloud Storage
Source: Wuhan University of Technology, master's thesis, 2013. Document type: degree thesis
Keywords: cloud storage; data stream processing; HDFS; Map/Reduce
[Abstract]: Since Google put forward the concept of cloud computing in 2006, cloud computing has moved from a concept that much of the industry dismissed as hype to an increasingly mature form of technology service. Among the many service types cloud computing offers, storage is the one users consume most directly, and it has grown into an independent field of research in which many IT giants are now positioning themselves. Cloud storage was born for the big data era, and how to store, manage, and process massive data more efficiently, quickly, and securely remains an open research topic. On the back end of cloud storage, Hadoop, as the open-source technology most suitable for big data processing, is being widely studied and used. Because Hadoop is still relatively young, however, it has some design defects, and many cloud storage providers need to modify it according to their service category and the actual conditions of their data centers in order to provide better service.
This thesis studies Hadoop, the key technology for data stream processing in cloud storage; Hadoop processes data as streams. It examines the architecture and execution flow of HDFS, the distributed file system at the core of the Hadoop platform, and addresses the design defect of its single master node, the NameNode, by proposing a method for decomposing the load on the master node. Within an acceptable performance loss, the method relieves the access pressure on the single master node in the HDFS architecture. It modifies the system architecture so that the system as a whole can serve more access requests, and it reduces the risk of instability or even crashes when a single node is overloaded, which further improves robustness. In addition, the thesis designs a secondary backup of HDFS metadata, further improving the reliability of the system.
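The abstract does not say how the load on the NameNode is decomposed. As a minimal sketch of one possible approach, assuming (purely for illustration) that namespace requests are spread over several master nodes by hashing the parent directory of each path, the routing step could look like the following. The function `pick_master` and the node names are hypothetical and are not taken from the thesis or from the Hadoop API.

```python
import hashlib
from collections import Counter


def pick_master(path, masters):
    """Map a file path to one master node by hashing its parent directory.

    Hypothetical helper: files in the same directory always land on the
    same master, so a directory listing needs only one node.
    """
    parent = path.rsplit("/", 1)[0] or "/"
    digest = hashlib.md5(parent.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(masters)
    return masters[index]


# Assumed node names, for illustration only.
masters = ["master-0", "master-1", "master-2"]

# 30 directories with 10 files each.
paths = [f"/logs/day{d:02d}/part-{p:05d}" for d in range(30) for p in range(10)]

# Count how many metadata requests each master would receive.
load = Counter(pick_master(p, masters) for p in paths)
print(dict(load))
```

Hashing the parent directory rather than the full path is a design choice of this sketch: it keeps each directory on a single node at the cost of a less even spread when a few directories are very large.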
The thesis also studies the data stream processing mechanism of Map/Reduce, the other core component of Hadoop. To address its heavy resource consumption, it proposes an optimization that effectively reduces the resource consumption of Map/Reduce in specific situations. For these situations, the metadata data structure is modified so that, before processing a data stream, Map/Reduce can first obtain metadata from HDFS, locate the relevant data blocks precisely, and filter out unnecessary data processing. This improves the support HDFS gives to Map/Reduce, effectively lowers resource consumption during data processing, and avoids wasted resources.
Finally, multiple experiments compare data processing in the optimized system with data processing in the original architecture. The experimental data show that the improved system achieves the expected results in balancing resource consumption and load pressure. This work was supported by the National Natural Science Foundation of China (Grant No. 60970064).
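The block-filtering idea described above can be illustrated with a small sketch. The abstract does not specify the modified metadata structure, so the example assumes that each block's metadata records the smallest and largest key stored in the block; `BlockMeta` and `select_blocks` are hypothetical names, not part of HDFS or of the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockMeta:
    """Assumed per-block metadata: block id plus the key range it holds."""
    block_id: str
    min_key: int
    max_key: int


def select_blocks(blocks, lo, hi):
    """Return only the blocks whose key range overlaps the query range [lo, hi]."""
    return [b for b in blocks if b.max_key >= lo and b.min_key <= hi]


blocks = [
    BlockMeta("blk_1", 0, 99),
    BlockMeta("blk_2", 100, 199),
    BlockMeta("blk_3", 200, 299),
]

# Only blocks that can contain keys in [150, 220] would be handed to map tasks.
selected = select_blocks(blocks, 150, 220)
print([b.block_id for b in selected])  # ['blk_2', 'blk_3']
```

In this sketch the job skips `blk_1` entirely, so no map task is started for it; that is the kind of saving the abstract attributes to consulting metadata before processing.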
[Degree-granting institution]: Wuhan University of Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP333