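Research on a Hierarchical Data Deduplication Technique
Topic: deduplication system. Focus: hierarchical architecture. Source: University of Electronic Science and Technology of China, master's thesis, 2013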
[Abstract]: As enterprise and personal user data grows rapidly, the demands on data center storage capacity keep rising. Statistics show that a considerable share of this data is redundant. Detecting and deleting that redundant data to improve data center storage performance has therefore become increasingly urgent and has clear practical value.
The thesis first introduces background on deduplication, surveys the deduplication products of the major vendors, and reviews the related technologies. On that basis it completes the following work.
First, it designs a hierarchical deduplication architecture that separates the control server from the information server, using them for transaction processing and for file metadata storage respectively. Within the information server, data is stored in tiers: file fingerprint information stays resident in memory, chunk metadata is placed on solid-state drives or disks, and the actual file data is stored on inexpensive storage devices. This makes reasonable use of memory and disk space and improves efficiency.
Second, in the preprocessing module, data is classified before processing, and a byte-based maximum increasing sequence chunking algorithm (the BFMIS algorithm) is proposed, which effectively addresses the hard-chunking problem in variable-length chunking. To address data collisions, a key difficulty in deduplication systems, the classical SHA-1 algorithm is optimized: its step function is modified, the degree to which message modifications are expanded is increased, and the message digest is lengthened. These changes improve the collision resistance of SHA-1 and lower the false-deletion rate of the deduplication system. A multidimensional Bloom Filter algorithm is also proposed. It extends the bit array of the ordinary Bloom Filter to lower its false positive rate, handles redundancy detection over massive data, and strengthens the dynamic scalability of the Bloom Filter in distributed environments, which improves the scalability of the whole deduplication system.
The thesis then describes the tag data redundancy problem in RFID networks and the CLIF and INPFM deduplication mechanisms. It applies the hierarchical deduplication framework to RFID networks, treating RFID tag data as already-preprocessed metadata that is organized in tiers and deduplicated.
Finally, experiments are carried out. The results show that the optimized SHA-1 algorithm effectively improves overall collision resistance, and that the multidimensional Bloom Filter algorithm effectively lowers the false positive rate and improves dynamic scalability. The multi-level RFID deduplication algorithm outperforms existing algorithms in both time efficiency and deduplication rate, although it produces a certain number of misjudgments. The overall throughput and deduplication rate of the system both reach the expected targets.
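The tiered lookup described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not from the thesis: fixed-size chunking stands in for BFMIS, the standard SHA-1 from Python's hashlib stands in for the modified SHA-1, and an ordinary single-array Bloom filter stands in for the multidimensional variant, since the abstract gives no details of any of the three. The names BloomFilter, DedupStore, write, and read are hypothetical, and plain dictionaries play the role of the SSD/disk and cheap-storage tiers.

```python
import hashlib


class BloomFilter:
    """Ordinary single-array Bloom filter (NOT the thesis's multidimensional variant)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key: bytes):
        # Double hashing: derive k bit positions from two slices of one SHA-256 digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the stride is never 0
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: bytes):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class DedupStore:
    """Toy tiered store: in-memory filter, chunk index, chunk data (all hypothetical)."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.filter = BloomFilter()  # memory tier: fast membership pre-check
        self.index = {}              # stand-in for chunk metadata on SSD/disk: fingerprint -> refcount
        self.chunks = {}             # stand-in for cheap bulk storage: fingerprint -> bytes

    def write(self, data: bytes):
        """Store data and return its recipe (the ordered list of chunk fingerprints)."""
        recipe = []
        for off in range(0, len(data), self.chunk_size):  # fixed-size chunking (assumption)
            chunk = data[off:off + self.chunk_size]
            fp = hashlib.sha1(chunk).digest()             # standard SHA-1 (assumption)
            # Consult the slower index only when the filter reports a possible hit.
            if self.filter.might_contain(fp) and fp in self.index:
                self.index[fp] += 1                       # duplicate: bump the reference count
            else:
                self.filter.add(fp)
                self.index[fp] = 1
                self.chunks[fp] = chunk                   # new chunk: store the data once
            recipe.append(fp)
        return recipe

    def read(self, recipe):
        """Rebuild the original bytes from a recipe."""
        return b"".join(self.chunks[fp] for fp in recipe)


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    recipe = store.write(b"aaaabbbbaaaacc")
    print(len(recipe), "chunks referenced,", len(store.chunks), "chunks stored")
```

Because a Bloom filter never reports a false negative, a negative answer lets the store skip the index lookup entirely. A false positive only costs one extra index lookup, since the index remains the authority on whether a chunk is already stored.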
[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP333
Article ID: 1662156
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1662156.html