基于堆疊乘積量化的最近鄰檢索

發(fā)布時(shí)間：2018-07-24 22:03

【摘要】：近年來,大數(shù)據(jù)、人工智能和物聯(lián)網(wǎng)技術(shù)得到飛速的發(fā)展,圖像、視頻等高維數(shù)據(jù)正呈現(xiàn)爆炸性增長(zhǎng)。在這些海量的高維數(shù)據(jù)中查找目標(biāo)數(shù)據(jù)也隨之變得耗時(shí)和低效。為了解決上述問題,近似最近鄰的概念及各種算法被陸續(xù)提出,并成為圖像檢索、機(jī)器學(xué)習(xí)、數(shù)據(jù)挖掘等多種應(yīng)用中的一類基本算法。其中的乘積量化算法(PQ)具備內(nèi)存消耗低,查詢效率高等優(yōu)點(diǎn),被證明是解決高維空間近似最近鄰查找的最有效算法之一�；诮�(jīng)典乘積量化算法的不足,近年來不少學(xué)者對(duì)乘積量化算法進(jìn)行了優(yōu)化改進(jìn),由于乘積量化算法中存在向量子空間中特征點(diǎn)分布不均勻的問題,優(yōu)化的乘積量化算法(OPQ)被提出來,用于優(yōu)化空間特征點(diǎn)的重新分配。由于對(duì)特征向量進(jìn)行簡(jiǎn)單劃分,會(huì)導(dǎo)致子向量間的相互獨(dú)立,有學(xué)者提出了加法量化算法(AQ)以解決這個(gè)問題。為解決特征向量在進(jìn)行量化時(shí),存在量化誤差較大的問題,堆疊量化算法(SQ)通過迭代地對(duì)誤差進(jìn)行量化,以進(jìn)一步降低量化誤差。本文中,我們提出了一種新的量化算法來做近似最近鄰查找:堆疊乘積量化算法。這種量化算法融合了堆疊量化算法的量化誤差低和乘積量化算法內(nèi)存消耗低的優(yōu)點(diǎn)。該算法的核心思想為:第一步將高維的特征向量劃分成維度相同的低維子特征向量,在每個(gè)子向量空間中進(jìn)行k-means聚類量化;第二步將特征向量與量化后對(duì)應(yīng)的編碼單詞做差得到對(duì)應(yīng)的誤差向量;第三步把誤差向量看成特征向量,以此進(jìn)行劃分子向量、子向量分別量化、求誤差向量的操作;迭代第三步直到達(dá)到終止條件,從而產(chǎn)生一組從粗糙到精細(xì)的分層子碼本。對(duì)分層子碼本進(jìn)行笛卡兒乘積得到整體向量的碼本,進(jìn)而可以得到原始向量的編碼表示。本文在經(jīng)典的SIFT1M、GIST1M和ConvNet1M-128數(shù)據(jù)集進(jìn)行實(shí)驗(yàn)設(shè)計(jì)及驗(yàn)證。大量的實(shí)驗(yàn)表明,在不少算法性能指標(biāo)上,比如量化誤差、召回率、可擴(kuò)展性等,本文算法與經(jīng)典的量化方法相比都有較大優(yōu)勢(shì)。而且,對(duì)比乘積量化方法,我們的方法能夠產(chǎn)生精確度更高的子碼本;對(duì)比堆疊量化算法,我們的方法消耗內(nèi)存少且具備較好的擴(kuò)展性。
[Abstract]:In recent years, big data, artificial intelligence and Internet of things technology have been rapid development, image, video and other high-dimensional data are explosive growth. Finding target data in these massive high-dimensional data becomes time consuming and inefficient. In order to solve the above problems, the concept of approximate nearest neighbor and various algorithms have been proposed one after another, and become a kind of basic algorithms in image retrieval, machine learning, data mining and other applications. The product quantization algorithm (PQ) has the advantages of low memory consumption and high query efficiency. It has been proved to be one of the most effective algorithms for approximate nearest neighbor search in high dimensional space. Due to the shortcomings of the classical product quantization algorithm, many scholars have improved the product quantization algorithm in recent years, because of the problem of uneven distribution of the feature points in the product quantization algorithm. (OPQ), an optimized product quantization algorithm, is proposed to optimize the redistribution of spatial feature points. Because the simple partition of the feature vectors leads to the independence of the subvectors, some scholars have proposed the additive quantization algorithm (AQ) to solve this problem. In order to solve the problem of large quantization error in the quantization of eigenvector, the stacked quantization algorithm (SQ) quantizes the error iteratively to further reduce the quantization error. In this paper, we propose a new quantization algorithm to do approximate nearest neighbor lookup: stack product quantization algorithm. This quantization algorithm combines the advantages of low quantization error of stack quantization algorithm and low memory consumption of product quantization algorithm. The key idea of the algorithm is as follows: firstly, the high-dimensional feature vectors are divided into low-dimensional subvectors with the same dimension, and the k-means clustering quantization is carried out in each subvector space. In the second step, the feature vector and the corresponding coded word after quantization are devided to the corresponding error vector, the third step regards the error vector as the feature vector, and then divides the subvector, quantizes the subvector, and calculates the operation of error vector. Iterate the third step until the termination condition is reached to produce a set of hierarchical subcodebooks from rough to fine. The codebook of the global vector is obtained by Cartesian product of the layered subcodebook, and the coding representation of the original vector can be obtained. In this paper, the classical SIFT1M GIST1M and ConvNet1M-128 data sets are designed and verified. A large number of experiments show that this algorithm has more advantages than classical quantization methods in many performance indexes such as quantization error recall rate and scalability. In addition, our method can produce more accurate subcodebook by contrast product quantization. Compared with stack quantization algorithm, our method consumes less memory and has better expansibility.
【學(xué)位授予單位】：大連海事大學(xué)
【學(xué)位級(jí)別】：碩士
【學(xué)位授予年份】：2017
【分類號(hào)】：TP391.41

【相似文獻(xiàn)】

相關(guān)期刊論文前10條

1 張桂榕;;反向最近鄰查詢研究綜述[J];電腦知識(shí)與技術(shù);2011年28期

2 周屹;;不確定對(duì)象的反向最近鄰查詢研究[J];黑龍江工程學(xué)院學(xué)報(bào)(自然科學(xué)版);2012年04期

3 劉永山,薄樹奎,張強(qiáng),郝忠孝;多對(duì)象的最近鄰查詢[J];計(jì)算機(jī)工程;2004年11期

4 郝忠孝;劉永山;;空間對(duì)象的反最近鄰查詢[J];計(jì)算機(jī)科學(xué);2005年11期

5 王淼;郝忠孝;;不確定性對(duì)象的反向最近鄰查詢[J];計(jì)算機(jī)工程;2010年10期

6 張旭;何向南;金澈清;周傲英;;面向不確定圖的k最近鄰查詢[J];計(jì)算機(jī)研究與發(fā)展;2011年10期

7 楊澤雪;郝忠孝;;空間數(shù)據(jù)庫(kù)中的障礙反向最近鄰查詢[J];計(jì)算機(jī)工程與應(yīng)用;2011年34期

8 王丹丹;郝忠孝;;道路網(wǎng)絡(luò)中的多類型K最近鄰查詢[J];計(jì)算機(jī)工程與應(yīng)用;2012年03期

9 鄧瑾;周梅;;基于R樹及其變種的最近鄰查詢研究[J];現(xiàn)代計(jì)算機(jī);2013年09期

10 朱婧;;平面中點(diǎn)對(duì)一般多邊形的最近鄰查詢研究[J];科技通報(bào);2014年01期

相關(guān)會(huì)議論文前10條

1 張曉峰;王麗珍;肖清;趙麗紅;;基于概念劃分的連續(xù)最近鄰查詢研究[A];NDBC2010第27屆中國(guó)數(shù)據(jù)庫(kù)學(xué)術(shù)會(huì)議論文集(B輯)[C];2010年

2 管猛;張剡;柏文陽;;基于地表的連續(xù)可見最近鄰查詢方法[A];NDBC2010第27屆中國(guó)數(shù)據(jù)庫(kù)學(xué)術(shù)會(huì)議論文集(B輯)[C];2010年

3 陳璐;高云君;柳晴;陳剛;;受限相互最近鄰查詢處理[A];第29屆中國(guó)數(shù)據(jù)庫(kù)學(xué)術(shù)會(huì)議論文集（B輯）（NDBC2012）[C];2012年

4 盛梅紅;沙朝鋒;宮學(xué)慶;嵇曉;周傲英;;道路網(wǎng)絡(luò)環(huán)境中的多對(duì)象最近鄰查詢[A];第二十三屆中國(guó)數(shù)據(jù)庫(kù)學(xué)術(shù)會(huì)議論文集（研究報(bào)告篇）[C];2006年

5 劉月清;章勇;;一種改進(jìn)的動(dòng)態(tài)最近鄰聚類算法[A];全國(guó)自動(dòng)化新技術(shù)學(xué)術(shù)交流會(huì)會(huì)議論文集（一）[C];2005年

6 李傳文;谷峪;李芳芳;于戈;;一種障礙空間中不確定對(duì)象的連續(xù)最近鄰查詢方法[A];NDBC2010第27屆中國(guó)數(shù)據(jù)庫(kù)學(xué)術(shù)會(huì)議論文集A輯一[C];2010年

7 劉星毅;;基于歐式距離的最近鄰改進(jìn)算法[A];廣西計(jì)算機(jī)學(xué)會(huì)2010年學(xué)術(shù)年會(huì)論文集[C];2010年

8 劉先康;梁菁;任杰;蔣光慶;;修正最近鄰模糊分類算法在艦船目標(biāo)識(shí)別中的應(yīng)用[A];全國(guó)第4屆信號(hào)和智能信息處理與應(yīng)用學(xué)術(shù)會(huì)議論文集[C];2010年

9 劉俊嶺;孫煥良;;多維度量空間中發(fā)現(xiàn)相互kNN(英文)[A];NDBC2010第27屆中國(guó)數(shù)據(jù)庫(kù)學(xué)術(shù)會(huì)議論文集A輯二[C];2010年

10 余小高;;P2P環(huán)境中k最近鄰搜索算法研究[A];2009年全國(guó)開放式分布與并行計(jì)算機(jī)學(xué)術(shù)會(huì)議論文集(下冊(cè))[C];2009年

相關(guān)博士學(xué)位論文前8條

1 魏本昌;基于內(nèi)容的大規(guī)模圖像檢索技術(shù)研究[D];華中科技大學(xué);2015年

2 楊澤雪;空間連接及最近鄰變體查詢研究[D];哈爾濱理工大學(xué);2014年

3 孫冬璞;時(shí)空數(shù)據(jù)庫(kù)多類型最近鄰查詢的研究[D];哈爾濱理工大學(xué);2010年

4 王建峰;基于哈希的最近鄰查找[D];中國(guó)科學(xué)技術(shù)大學(xué);2015年

5 張得天;時(shí)間依賴路網(wǎng)高效k最近鄰查詢混搭機(jī)制的研究[D];中國(guó)科學(xué)技術(shù)大學(xué);2014年

6 杜欽生;高維空間的K最近鄰查詢及連接問題研究[D];吉林大學(xué);2015年

7 張軍旗;支持最近鄰查找的高維空間索引[D];復(fù)旦大學(xué);2007年

8 李艷紅;路網(wǎng)中移動(dòng)對(duì)象最近鄰及反向最近鄰查詢處理研究[D];華中科技大學(xué);2011年

相關(guān)碩士學(xué)位論文前10條

1 楊根茂;基于哈希加速的近似最近鄰檢索算法研究[D];浙江大學(xué);2015年

2 原s，

本文編號(hào)：2142792

資料下載

論文發(fā)表

支付寶下載

Download by Alipay
微信下載

Download by Wechat
會(huì)員下載

Download by Member

本文鏈接：http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2142792.html

上一篇：就業(yè)導(dǎo)向的高校高層次軟件服務(wù)外包人才培養(yǎng)機(jī)制建設(shè)
下一篇：基于聚類和決策樹的鏈路預(yù)測(cè)方法

論文發(fā)表

·知網(wǎng)|萬方|維普|龍?jiān)磡省級(jí)|國(guó)家級(jí)|科技核心|北大核心|南大核心CSSCI|EI|SCI|SSCI|

天堂国产午夜亚洲专区-少妇人妻综合久久蜜臀-国产成人户外露出视频在线-国产91传媒一区二区三区

基于堆疊乘積量化的最近鄰檢索