Retrieval-Oriented Deep Image Representation and Coding
Topics: content-based image retrieval + deep neural networks; Source: University of Science and Technology of China, 2017 master's thesis
[Abstract]: With the arrival of the mobile Internet era and the rapid development of multimedia technology, the Internet now holds massive amounts of multimedia data, typified by images, and the volume grows quickly every day. Image data on the Internet is usually converted into a binary bitstream for storage in order to save space. For content-based image retrieval (CBIR), an important use case for image data on the Internet, the common practice is likewise to extract image feature vectors, convert them into binary bitstreams, and store them. The codewords produced by these two processes are stored separately, which consumes large amounts of computing and storage resources now that multimedia data volumes are exploding. Can image compression and image features share the same set of codewords? If the same codewords could be reused for both, the savings in computing and storage resources would be substantial at this scale.

For image compression there are many classic standards, such as standard JPEG. These methods keep the main information in an image and discard the relatively unimportant information, which is how compression is achieved. Image retrieval is similar: the extracted feature vector also preserves the main information in the image, which is why it can be used for retrieval. Since both tasks extract the main information of the image and store it as binary bits, some redundancy must exist between the two stored representations. The goal of this research is to reduce or even eliminate that redundancy, and with it the system resources needed to store the binary codewords. There are two ways to judge whether redundancy has been removed. In the first, the coded bitstream is smaller than the sum of the separate compression bitstream and feature bitstream, with no loss in task performance. In the second, the coded bitstream is the same size as the sum of the separate compression and feature bitstreams, but its task performance is higher. This study adopts the second criterion as both its research goal and its experimental method.

To address the problem, we propose a unified deep coding scheme for images. In image search engines, the typical content-based image retrieval scenario on the Internet, the objects being compressed and restored are small thumbnails. A deep neural network encodes the input image so that the resulting codeword can both reconstruct the original thumbnail and be used directly for image retrieval. During retrieval, the similarity between images is defined by the Hamming distance between their binary codewords. The resulting coding system reuses codewords and so reduces the redundancy between image compression and retrieval. First, we train a convolutional neural network encoder for thumbnail compression, which compresses a thumbnail into a binary bitstream that a decoder can decompress to recover the original thumbnail. Next, we train a deep neural network to extract image features and quantize those features. The quantized binary features can be stored as binary bits and can also be used for image retrieval. We then combine the two networks and fine-tune their parameters on triplet image data from content-based image retrieval, so that the whole codeword produced by both parts is used for retrieval.

In the experiments on image compression, the trained unified coding system compresses 32 × 32 × 3 thumbnails at a compression ratio of 5.3, and at comparable reconstruction quality its compression efficiency is higher than that of standard JPEG. In the content-based image retrieval experiments, the codewords produced by the unified coding system retrieve better than the binary feature vectors obtained from the image feature extractor alone. Retrieval performance therefore improves without using any additional codeword space, which in relative terms reduces the redundancy between image compression and retrieval. Our work points to a very promising direction for joint image compression and retrieval.
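The retrieval step described in the abstract, ranking images by the Hamming distance between binary codewords, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. Codewords are held as Python integers; the helper names (`code_budget_bits`, `concat_codes`, `hamming_distance`, `rank_by_hamming`) and the toy 4-bit codes are hypothetical; and the bit budget simply applies the reported compression ratio of 5.3 to a 32 × 32 × 3 thumbnail, assuming 8 bits per channel sample.

```python
def code_budget_bits(height, width, channels, compression_ratio, bits_per_sample=8):
    """Approximate codeword length in bits for a raw image at a given compression ratio.

    Assumes the raw image uses `bits_per_sample` bits per channel sample
    (8 is an assumption here, not a figure taken from the abstract).
    """
    raw_bits = height * width * channels * bits_per_sample
    return raw_bits / compression_ratio


def concat_codes(compression_code, feature_code, feature_bits):
    """Join a compression code and a feature code into one unified codeword.

    `feature_code` is assumed to fit in `feature_bits` bits.
    """
    return (compression_code << feature_bits) | feature_code


def hamming_distance(a, b):
    """Number of bit positions in which two equal-length codewords differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query, database):
    """Return (image_id, distance) pairs sorted from most to least similar."""
    scored = [(image_id, hamming_distance(query, code))
              for image_id, code in database.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical toy data: (compression code, feature code), 4 bits each.
parts = {
    "img_a": (0b1011, 0b0010),
    "img_b": (0b1011, 0b0011),
    "img_c": (0b0100, 0b1101),
}
database = {name: concat_codes(c, f, 4) for name, (c, f) in parts.items()}
query = concat_codes(0b1011, 0b0110, 4)

ranking = rank_by_hamming(query, database)
# ranking == [("img_a", 1), ("img_b", 2), ("img_c", 7)]

budget = code_budget_bits(32, 32, 3, 5.3)
# 24576 raw bits / 5.3 is roughly 4637 bits per thumbnail
```

XOR followed by a bit count keeps each comparison cheap, which is one reason binary codes are attractive for large-scale retrieval. In this sketch the whole unified codeword takes part in the distance, mirroring the abstract's statement that the codewords from both parts are used for retrieval.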
[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.41
[Citation]: Zhang Qingyu. Retrieval-Oriented Deep Image Representation and Coding [D]. University of Science and Technology of China, 2017.
Article ID: 1800786
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1800786.html