定向多探頭隨機(jī)超平面局部敏感哈希
發(fā)布時間:2018-02-13 10:45
本文關(guān)鍵詞: 局部敏感哈希 多探頭 余弦相似搜索 出處:《華中師范大學(xué)》2017年碩士論文 論文類型:學(xué)位論文
【摘要】:隨著人類步入大數(shù)據(jù)時代,人們的衣食住行都離不開信息與數(shù)據(jù)。相似性搜索是大數(shù)據(jù)研究的一個重要方向。數(shù)據(jù)的分析與處理往往離不開對高維數(shù)據(jù)的匹配與查找。針對于高維數(shù)據(jù)的大規(guī)模相似性搜索,局部敏感哈希被認(rèn)為是一個行之有效的方法且同時被廣泛的運用到各個領(lǐng)域中。在最近的幾年,許多有關(guān)局部敏感哈希的改進(jìn)算法相繼被提出。在使用局部敏感哈希時,為了提高搜索的質(zhì)量,往往需要消耗大量的空間來存儲哈希表。為了克服這個缺陷,相關(guān)的研究者提出了基于多探頭的局部敏感哈希。此方法在很大程度上提高了哈希表的利用率,從而節(jié)省了存儲空間。對于多探頭局部敏感哈希,現(xiàn)在最常用的兩類探尋方式分別為步進(jìn)式多探頭局部敏感哈希與定向多探頭局部敏感哈希。相比于步進(jìn)式多探頭局部敏感哈希,定向多探頭局部敏感哈希搜索速度更快,所需探頭數(shù)量更少。然而如今定向多探頭探尋方式是基于E2LSH。這意味著它只能適用于歐式距離。對于余弦相似系數(shù),步進(jìn)式多探頭局部敏感哈希是唯一可執(zhí)行的多探頭探尋方式。提出一種基于定向多探頭探尋方式的局部敏感哈希用以彌補余弦相似系數(shù)下定向探尋方式的空白。同時給出一套完整的數(shù)學(xué)理論證明作為算法的理論依據(jù)。為了說明該算法的優(yōu)越性,采用了多個公開數(shù)據(jù)集分別對定向多探頭隨機(jī)超平面局部敏感哈希與步進(jìn)式多探頭隨機(jī)超平面局部敏感哈希進(jìn)行了實驗。根據(jù)實驗結(jié)果,前者相比于后者在達(dá)到相同的召回率時需消耗更少的探頭與搜索時間。算法相比于原始局部敏感哈希有更高的空間利用率。另外定向探尋方式僅僅改變了探尋順序并沒有改變存儲空間中哈希表的數(shù)據(jù)結(jié)構(gòu)。這意味著該算法與現(xiàn)有數(shù)據(jù)結(jié)構(gòu)并未發(fā)生沖突,可以直接被使用于現(xiàn)有的局部敏感哈希算法中。從而在減少了空間消耗的同時也保證了搜索的速度。
[Abstract]:As the human race entered the big data era, Similarity search is an important research direction of big data. Data analysis and processing are often inseparable from the matching and searching of high-dimensional data. Scale similarity search, Locally sensitive hashing is considered to be an effective method and is widely used in various fields. In recent years, many improved local sensitive hashing algorithms have been proposed one after another. In order to improve the quality of the search, it is often necessary to consume a large amount of space to store the hash table. In this paper, the local sensitive hashing based on multi-probe is proposed. This method can greatly improve the utilization rate of hash table and save storage space. The two most commonly used search methods are stepwise multi-probe local sensitive hash and directional multi-probe local sensitive hash respectively. Compared with step multi-probe local sensitive hash, directional multi-probe local sensitive hash search is faster than step multi-probe local sensitive hash. The number of probes required is even smaller. However, today's directional multiprobe search is based on E2LSH. this means that it can only be applied to Euclidean distance. For cosine similarity coefficients, Stepwise multi-probe local sensitive hashing is the only executable multi-probe searching method. A local sensitive hash based on directional multi-probe search is proposed to make up for the blank of orientation search under cosine similarity coefficient. At the same time, a complete set of mathematical theory proof is given as the theoretical basis of the algorithm. Several open datasets are used to test the random hyperplane local sensitive hashes with directional multiprobe and the stepwise multiprobe random hyperplane local sensitive hashes respectively. According to the experimental results, Compared with the latter, the former requires less probe and search time to reach the same recall rate. The algorithm has higher space utilization than the original locally sensitive hash. In addition, the directional search only changes the searching sequence. The order does not change the data structure of the hash table in the storage space. This means that the algorithm does not conflict with the existing data structure. It can be directly used in the existing local sensitive hash algorithms, thus reducing the space consumption and ensuring the speed of search.
【學(xué)位授予單位】:華中師范大學(xué)
【學(xué)位級別】:碩士
【學(xué)位授予年份】:2017
【分類號】:TP391.3;TP311.13
【參考文獻(xiàn)】
相關(guān)期刊論文 前7條
1 鐘川;陳軍;;基于精確歐氏局部敏感哈希的改進(jìn)協(xié)同過濾推薦算法[J];計算機(jī)工程;2017年02期
2 原默晗;唐晉韜;王挺;;一種高效的分布式相似短文本聚類算法[J];計算機(jī)與數(shù)字工程;2016年05期
3 王忠偉;陳葉芳;錢江波;陳華輝;;基于LSH的高維大數(shù)據(jù)k近鄰搜索算法[J];電子學(xué)報;2016年04期
4 鄭奇斌;刁興春;曹建軍;周星;許永平;;結(jié)合局部敏感哈希的k近鄰數(shù)據(jù)填補算法[J];計算機(jī)應(yīng)用;2016年02期
5 韓軍輝;盧暾;丁向華;顧寧;;基于局部敏感哈希技術(shù)的能耗社區(qū)實時推薦系統(tǒng)[J];小型微型計算機(jī)系統(tǒng);2015年11期
6 戴上平;馮鵬;劉盛英杰;舒紅;;基于余弦距離的局部敏感哈希的KNN算法在中文文本上的快速分類[J];計算機(jī)工程與科學(xué);2015年10期
7 曹玉東;劉艷洋;賈旭;王冬霞;;基于改進(jìn)的局部敏感哈希算法實現(xiàn)圖像型垃圾郵件過濾[J];計算機(jī)應(yīng)用研究;2016年06期
,本文編號:1507979
本文鏈接:http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1507979.html
最近更新
教材專著