M2LSH:基于LSH的高維數(shù)據(jù)近似最近鄰查找算法
發(fā)布時間:2018-12-07 19:32
【摘要】:在許多應(yīng)用中,LSH(Locality Sensitive Hashing)以及各種變體,是解決近似最近鄰問題的有效算法之一.雖然這些算法能夠很好地處理分布比較均勻的高維數(shù)據(jù),但從設(shè)計方案來看,都沒有針對數(shù)據(jù)分布不均勻的情況做相應(yīng)的優(yōu)化.針對這一問題,本文提出了一種新的基于LSH的解決方案(M2LSH,2 Layers Merging LSH),對于數(shù)據(jù)分布不均勻的情況依然能得到一個比較好的查詢效果.首先,將數(shù)據(jù)存放到具有計數(shù)功能的組合哈希向量表示的哈希桶中,然后通過二次哈希將這些桶號投影到一維空間,在此空間根據(jù)各個桶中存放的數(shù)據(jù)個數(shù)合并相鄰哈希桶,使得新哈希桶中的數(shù)據(jù)量能夠大致均衡.查詢時僅訪問有限個哈希桶,就能找到較優(yōu)結(jié)果.本文給出了詳細的理論分析,并通過實驗驗證了M2LSH的性能,不僅能減少訪問時間,也可提高結(jié)果的正確率.
[Abstract]:In many applications, LSH (Locality Sensitive Hashing) and various variants are one of the effective algorithms to solve the approximate nearest neighbor problem. Although these algorithms can deal with the high dimensional data with uniform distribution well, from the point of view of the design scheme, there is no corresponding optimization for the non-uniform distribution of the data. In order to solve this problem, a new solution based on LSH (M2LSH2 Layers Merging LSH),) is proposed in this paper. First, the data is stored in the hash bucket represented by the combined hash vector with counting function, and then these bucket numbers are projected to one dimensional space by the quadratic hash, where the adjacent hash buckets are merged according to the number of data stored in each bucket. So that the new hash bucket in the amount of data can be roughly balanced. The query can only access a limited number of hash buckets to find a better result. In this paper, a detailed theoretical analysis is given, and the performance of M2LSH is verified by experiments, which can not only reduce the access time, but also improve the accuracy of the results.
【作者單位】: 寧波大學(xué)信息科學(xué)與工程學(xué)院;
【基金】:國家自然科學(xué)基金(No.61472194,No.61572266) 浙江省自然科學(xué)基金(No.LY13F020040,No.LY16F020003) 寧波市自然科學(xué)基金(No.2014A610023,No.2015A610119)
【分類號】:TP301.6
,
本文編號:2367724
[Abstract]:In many applications, LSH (Locality Sensitive Hashing) and various variants are one of the effective algorithms to solve the approximate nearest neighbor problem. Although these algorithms can deal with the high dimensional data with uniform distribution well, from the point of view of the design scheme, there is no corresponding optimization for the non-uniform distribution of the data. In order to solve this problem, a new solution based on LSH (M2LSH2 Layers Merging LSH),) is proposed in this paper. First, the data is stored in the hash bucket represented by the combined hash vector with counting function, and then these bucket numbers are projected to one dimensional space by the quadratic hash, where the adjacent hash buckets are merged according to the number of data stored in each bucket. So that the new hash bucket in the amount of data can be roughly balanced. The query can only access a limited number of hash buckets to find a better result. In this paper, a detailed theoretical analysis is given, and the performance of M2LSH is verified by experiments, which can not only reduce the access time, but also improve the accuracy of the results.
【作者單位】: 寧波大學(xué)信息科學(xué)與工程學(xué)院;
【基金】:國家自然科學(xué)基金(No.61472194,No.61572266) 浙江省自然科學(xué)基金(No.LY13F020040,No.LY16F020003) 寧波市自然科學(xué)基金(No.2014A610023,No.2015A610119)
【分類號】:TP301.6
,
本文編號:2367724
本文鏈接:http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2367724.html
最近更新
教材專著