An Efficient Spatio-Temporal Classification Index on HBase
Published: 2018-05-21 10:58
Topics: stream data + HBase; Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2017, Issue 06
[Abstract]: Massive stream data is characterized by large volume, rapid updates, and multiple dimensions and attributes, and its storage and querying have been an active research topic in both academia and industry in recent years. HBase provides a highly scalable set of techniques and a system platform for storing and managing massive stream data. However, HBase supports only a primary-key index, so queries on non-primary-key data are inefficient, especially for multidimensional data. For the traffic flow data scenario, this paper proposes TA-index, an index structure with high insertion and query efficiency. TA-index takes the temporal and spatial locality of data access into account in order to capture the characteristics of the data more accurately; by indexing time and space under separate classifications, it reduces the amount of indexed data and supports real-time data analysis. Experiments show that the approach is more efficient than existing algorithms, is highly scalable, and supports both high throughput and efficient multidimensional queries.
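The abstract does not describe the internal structure of TA-index, so the following is not the authors' method. It is only a minimal sketch of the general idea behind the problem statement: because HBase sorts and indexes rows by key alone, a spatio-temporal query is cheap only when location and time are encoded into the key so that matching rows are contiguous. The grid cell size, the key layout, and the names `cell_id`, `row_key`, `SortedStore`, and `query` are all hypothetical, and the store is an in-memory stand-in rather than an HBase client.

```python
import bisect
import math

CELL_SIZE_DEG = 0.01  # hypothetical grid resolution, not taken from the paper


def cell_id(lon, lat):
    """Map a coordinate to a fixed-width grid cell identifier (assumed scheme)."""
    col = math.floor((lon + 180.0) / CELL_SIZE_DEG)
    row = math.floor((lat + 90.0) / CELL_SIZE_DEG)
    return f"{row:05d}{col:05d}"


def row_key(lon, lat, ts):
    """Build a key of the form <cell>|<zero-padded timestamp>.

    Fixed-width fields make lexicographic order match (cell, time) order,
    so records from one cell and a time range sit next to each other.
    """
    return f"{cell_id(lon, lat)}|{ts:010d}"


class SortedStore:
    """Tiny in-memory stand-in for a table kept sorted by row key."""

    def __init__(self):
        self._keys = []
        self._values = []

    def put(self, key, value):
        # Insert while keeping keys sorted, as a key-ordered store would.
        i = bisect.bisect_left(self._keys, key)
        self._keys.insert(i, key)
        self._values.insert(i, value)

    def scan(self, start, stop):
        """Return values whose key lies in the half-open range [start, stop)."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return self._values[lo:hi]


def query(store, lon, lat, t_start, t_end):
    """Fetch records in the cell containing (lon, lat) with t_start <= ts < t_end."""
    cell = cell_id(lon, lat)
    return store.scan(f"{cell}|{t_start:010d}", f"{cell}|{t_end:010d}")
```

With this layout a query for one cell and one time window becomes a single contiguous range scan instead of a full-table filter. A real design would also have to handle queries spanning many cells and write hot-spotting on recent timestamps, which this sketch does not address.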
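[Affiliation]: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
[Funding]: Supported by the National Natural Science Foundation of China (61373015)
[CLC number]: TP311.13; U495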
Article ID: 1918920
Article link: http://sikaile.net/kejilunwen/daoluqiaoliang/1918920.html