內存索引的壓縮存儲及優(yōu)化研究
發(fā)布時間:2018-07-14 15:22
【摘要】:隨著計算機和數據庫技術的迅猛發(fā)展,人類已進入信息時代,需要存儲的數據大大增長,已遠遠超出單臺服務器的承受范圍。為了滿足數據的檢索需要,,大型索引系統往往建立在分布式系統之上,但在某些需要高響應低延遲與處理靈活性的場景下,分布式系統具有其本質性的困難。因此提升單機的存儲及處理能力,特別是對于高配置服務器來說有不可替代的意義。 針對現代服務器的硬件架構與內存資源的稀缺性,本文提出了一種內存索引數據結構LC-Tree。針對CPU高速緩存、分支預測、多核下內存?zhèn)喂蚕淼扔布卣髡{整優(yōu)化LC-Tree數據結構實現與內存布局。通過構造一個邏輯上的256叉樹作為上層結構,分支節(jié)點結構利用位圖索引、直接索引等方式迅速定位到底層節(jié)點。底層葉子節(jié)點在內存中連續(xù)排放有利于使用數據壓縮算法節(jié)省有限的內存資源。 LC-Tree數據結構在實現上,結合計算機硬件特性與壓縮算法,有效地在壓縮率、解壓時間和動態(tài)性能之間達到平衡,通過實現索引動態(tài)更新來支持數據實時更新。針對無法放入單機的大規(guī)模數據的存儲和檢索,根據業(yè)務場景與分布式系統的設計原則,提出索引存儲的分布式解決方案,以滿足大數據下的數據檢索需求。
[Abstract]:With the rapid development of computer and database technology, mankind has entered the information age, and the data that needs to be stored has greatly increased, far beyond the bearing range of a single server. In order to meet the needs of data retrieval, large index systems are often built on distributed systems, but in some scenarios that require high response and low latency and processing flexibility, distributed systems are inherently difficult. Therefore, improving the storage and processing capability of single machine, especially for high configuration server, has irreplaceable significance. Aiming at the scarcity of memory resources and hardware architecture of modern server, this paper proposes a memory indexed data structure LC-Tree. The implementation of LC-Tree data structure and memory layout are optimized for CPU cache, branch prediction and memory pseudo-sharing under multi-core. By constructing a logical 256 fork tree as the upper structure, the branch node structure uses bitmap index and direct index to locate the underlying node quickly. Continuous discharge of leaf nodes in memory can save limited memory resources by using data compression algorithm. LC-Tree data structure is implemented in combination with computer hardware characteristics and compression algorithm. There is a balance between decompression time and dynamic performance, and real-time updating of data is supported by dynamic update of index. According to the design principle of business scene and distributed system, the distributed solution of index storage is proposed to meet the data retrieval requirements in big data.
【學位授予單位】:武漢理工大學
【學位級別】:碩士
【學位授予年份】:2014
【分類號】:TP311.13;TP333
本文編號:2122076
[Abstract]:With the rapid development of computer and database technology, mankind has entered the information age, and the data that needs to be stored has greatly increased, far beyond the bearing range of a single server. In order to meet the needs of data retrieval, large index systems are often built on distributed systems, but in some scenarios that require high response and low latency and processing flexibility, distributed systems are inherently difficult. Therefore, improving the storage and processing capability of single machine, especially for high configuration server, has irreplaceable significance. Aiming at the scarcity of memory resources and hardware architecture of modern server, this paper proposes a memory indexed data structure LC-Tree. The implementation of LC-Tree data structure and memory layout are optimized for CPU cache, branch prediction and memory pseudo-sharing under multi-core. By constructing a logical 256 fork tree as the upper structure, the branch node structure uses bitmap index and direct index to locate the underlying node quickly. Continuous discharge of leaf nodes in memory can save limited memory resources by using data compression algorithm. LC-Tree data structure is implemented in combination with computer hardware characteristics and compression algorithm. There is a balance between decompression time and dynamic performance, and real-time updating of data is supported by dynamic update of index. According to the design principle of business scene and distributed system, the distributed solution of index storage is proposed to meet the data retrieval requirements in big data.
【學位授予單位】:武漢理工大學
【學位級別】:碩士
【學位授予年份】:2014
【分類號】:TP311.13;TP333
【參考文獻】
相關期刊論文 前10條
1 趙園春;李成名;趙春宇;;基于R樹的分布式并行空間索引機制研究[J];地理與地理信息科學;2007年06期
2 闞君滿;;基于改進哈夫曼編碼的全文索引結構壓縮算法[J];吉林大學學報(信息科學版);2011年05期
3 趙鵬;一種基于壓縮的全文本數據庫倒排索引方法[J];黑龍江大學自然科學學報;2005年03期
4 駱吉洲;李建中;;一種索引結構的壓縮存儲及其查詢處理技術[J];計算機工程與應用;2007年08期
5 何小苑;閔華清;;基于聚類的Hilbert R-樹空間索引算法[J];計算機工程;2009年09期
6 張明波,陸鋒,申排偉,程昌秀;R樹家族的演變和發(fā)展[J];計算機學報;2005年03期
7 王梅;楊思簫;樂嘉錦;;列存儲數據庫中壓縮位圖索引技術[J];計算機工程;2012年18期
8 管建和;甘劍峰;;基于Lucene全文檢索引擎的應用研究與實現[J];計算機工程與設計;2007年02期
9 陳占龍;吳信才;謝忠;吳亮;;分布式空間數據索引機制研究[J];微電子學與計算機;2007年10期
10 鄭麗英;數據結構Trie及其應用[J];現代計算機(專業(yè)版);2004年08期
相關博士學位論文 前1條
1 潘鵬;時空數據庫的索引機制及查詢策略研究[D];華中科技大學;2007年
本文編號:2122076
本文鏈接:http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2122076.html