Research on an Image Multi-Label Ranking Algorithm Based on Sparse Coding Theory

Published: 2018-09-05 18:42

[Abstract]: In today's high-speed Internet era, the spread of digital imaging devices and advances in Internet technology have given Internet images an increasingly important role in daily life, and web image search has become an active and challenging research topic in computer vision. Unlike ten years ago, digital images can now be created, uploaded, shared, and distributed on the Internet with little effort. Social media such as Facebook, YouTube, and Flickr let the uploader attach a set of keywords that describe an image (Social Tags), and the system then uses these keywords to index the image. Because the semantic annotation is produced collaboratively by users over the network, such collections are also called Collaboratively-Tagged Images. The annotations can serve directly as a Web image index and can also be used as training samples in research on automatic image annotation. Image-sharing sites such as Flickr hold very large collections of tagged images, and sharing based on Social Tagging can greatly improve the organization and retrieval of massive Internet image collections, so making more effective use of these tagged collections is one of the key problems in improving automatic image annotation.

Users usually submit the tags of an image in Random Order; that is, the submitted tag set is generally not sorted by the semantic relevance between each tag and the image content (Tag Relevance). User-supplied tag sets also contain many Noisy Tags, and Flickr does not currently offer Relevance-based Ranking. Flickr currently provides two ranking options: 1. Most Recent, which sorts by upload timestamp; 2. Most Interesting, which sorts by click-through rate, number of comments, and similar signals. It cannot yet retrieve images by semantic relevance. The random ordering of tag sets limits further application of large-scale image retrieval, so ranking tags by how well they represent an image (Tag Ranking) is becoming a new research focus in the multimedia field. In addition, an image collection ranked by semantic relevance can serve as an effective training set for a semantic keyword and helps with the small-sample learning problem in region-level image annotation.

Social tagging has become a popular way to describe, categorize, and retrieve content on the Internet and has been applied successfully in the management and retrieval of real social media systems. Given its importance for web image retrieval, a growing number of researchers study social image tags. Although users provide tags to describe image content, the tags are entered manually by web users with different cultural backgrounds and knowledge, each following a subjective understanding of the image, so tag quality is not yet good enough for the tags to serve directly as reliable index keywords in keyword-based image retrieval. The main problems are that tags are unordered and that tag content is imprecise, so research on the semantic understanding of tagged social images concentrates on improving tag ranking and tag precision. Some research institutions (such as MSRA) have already studied the Tag Ranking problem. Because one image may carry several semantic concept labels, this is a typical multi-label learning problem, and the image itself is semantically ambiguous to some degree. Sorting a tag set by semantic relevance can be abstracted as a typical Multi-Label Ranking problem. Multi-label learning has been studied extensively, whereas multi-label ranking has received relatively little attention. Closely related problems include Typicality Ranking and Tag Ranking.

Most existing Tag Ranking algorithms perform Relevance-based Tag Ranking. Intuitively, given an image and its tag set, if tag A has higher relevance than tag B, then the image is a more typical example of tag A than of tag B; that is, tag A represents the semantic content of the image better. Put differently, tag A occurs more frequently than tag B among the K nearest neighbor images computed for the given image. There are two representative lines of work: (1) ranking algorithms based on Statistical Modeling; (2) Data-driven algorithms.

The statistical-model approach uses kernel density estimation to estimate the semantic relevance between each tag and the image, which amounts to estimating the Typicality of the sample: if the low-level visual features of the region that represents a tag are typical, meaning they lie close in feature space to the feature vectors of other regions carrying the same tag, the tag receives high relevance. The approach also takes the semantic correlation between tags into account and applies a random walk to refine the result and produce the final ranking. However, it represents multi-label images with global low-level visual features and therefore cannot estimate the density of each tag in feature space well.

The data-driven approach obtains the neighbor image subset of a given image by simple global feature matching, counts how often each keyword of the tag sequence occurs among the neighbors (Neighbor-voting), and ranks the tags by frequency. Unlike the statistical-model approach, it selects neighbors using only the visual features of the image and ignores tag information. Because the algorithm is simple, tag ranking based on neighbor voting scales well to massive image datasets. However, it ignores the semantic correlation between tags, so its ranking performance is unsatisfactory, and it also relies on global visual features, so its image similarity measure is unsatisfactory as well.

This thesis therefore proposes an improved image multi-label ranking algorithm. It introduces sparse representation theory from signal processing and recasts neighbor image retrieval as a sparse reconstruction problem, which improves the semantic relevance of the selected neighbor set and, in turn, the performance of multi-label ranking. In recent years, combining Compressed Sensing with the theory and methods of feature selection to form more effective Sparse Representations of images has become an active topic in computer vision and machine learning. Tibshirani of Stanford University and Breiman of the University of California, Berkeley, proposed almost simultaneously the Lasso idea of imposing an ℓ1-norm constraint on feature selection, which drives the selected features to be as sparse as possible and improves the interpretability and accuracy of data processing. Variable selection methods typified by the Lasso have become a mainstream statistical tool for analyzing high-dimensional data. Sparse representation is thus a suitable basis for studying image semantic understanding.

The proposed algorithm works as follows. It belongs to the family of multi-label ranking algorithms that rank by semantic relevance. Given a test image to be ranked and a massive collection of tagged social images, the test image is treated as the sample to be reconstructed and the tagged collection as an over-complete dictionary. According to sparse representation theory, the test sample can be sparsely reconstructed from a small number of samples in the dictionary, and each dimension of the learned sparse coefficient vector expresses the semantic similarity and relevance between the test image and one tagged image in the dictionary. The neighbor image subset of the test image is then selected from the learned relevance values, the neighbor-voting strategy counts the occurrence frequency of each keyword in the tag sequence, and the tags are ranked by frequency. The algorithm also takes the semantic correlation between tags (their co-occurrence) into account and applies a random walk to refine the result and produce the final ranking. The algorithm was implemented in MATLAB and evaluated on the NUS-WIDE image dataset. A comparison with the classical K-nearest-neighbor tag ranking algorithm verifies the effectiveness of the proposed sparse-representation-based image tag ranking algorithm.
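The three steps described in the abstract (sparse reconstruction over the tagged collection, neighbor voting, random-walk refinement over tag co-occurrence) can be sketched as follows. This is only an illustration and not the thesis's MATLAB implementation, which the page does not reproduce. Everything concrete here is an assumption: the function names, the use of ISTA to solve the ℓ1-regularized least-squares problem min_a ½‖x − Da‖² + λ‖a‖₁, the values of λ, k and α, the row-normalized co-occurrence matrix used as the walk's transition matrix, and the six-image toy data.

```python
import numpy as np

def sparse_code(D, x, lam=0.05, n_iter=500):
    """ISTA for min_a 0.5*||x - D a||_2^2 + lam*||a||_1 (assumed solver)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))   # gradient step on the quadratic term
        a = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return a

def neighbor_vote(coef, tag_matrix, k):
    """Pick the k dictionary images with the largest |coefficient| and count their tags."""
    neighbors = np.argsort(-np.abs(coef), kind="stable")[:k]
    votes = tag_matrix[neighbors].sum(axis=0).astype(float)
    return votes, neighbors

def random_walk(votes, cooc, alpha=0.5, n_iter=100):
    """Refine vote scores with a random walk with restart over tag co-occurrence."""
    v = votes / max(votes.sum(), 1e-12)                          # restart distribution
    P = cooc / np.maximum(cooc.sum(axis=1, keepdims=True), 1e-12)  # row-stochastic
    r = v.copy()
    for _ in range(n_iter):
        r = alpha * (P.T @ r) + (1.0 - alpha) * v
    return r

# Toy data (hypothetical): 6 tagged images with 4-dimensional features, 3 tags.
tags = ["sky", "sea", "car"]
features = np.array([
    [1.0, 0.0, 0.0, 0.0],   # image 0: sky, sea
    [0.8, 0.6, 0.0, 0.0],   # image 1: sky
    [0.0, 1.0, 0.0, 0.0],   # image 2: car
    [0.0, 0.0, 1.0, 0.0],   # image 3: sea, car
    [0.0, 0.0, 0.0, 1.0],   # image 4: car
    [0.0, 0.0, 0.6, 0.8],   # image 5: sea, car
])
tag_matrix = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
])

# Dictionary: one unit-norm column per tagged image.
D = (features / np.linalg.norm(features, axis=1, keepdims=True)).T
x = np.array([0.9, 0.3, 0.0, 0.0])
x = x / np.linalg.norm(x)                    # test image feature, unit norm

coef = sparse_code(D, x)
votes, neighbors = neighbor_vote(coef, tag_matrix, k=2)
cooc = tag_matrix.T @ tag_matrix             # tag co-occurrence counts (with self-counts)
scores = random_walk(votes, cooc)
ranked = [tags[i] for i in np.argsort(-scores, kind="stable")]
print(ranked)                                # prints ['sky', 'sea', 'car']
```

In this toy case the sparse code concentrates on images 0 and 1, the two dictionary images whose features span the test image, so the neighbor set comes from the reconstruction itself rather than from a distance threshold. A real system would use a much larger dictionary and would likely need a faster ℓ1 solver than plain ISTA.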
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.41

[Thesis record]: RAPHAEL DE-SOUZA; Research on an Image Multi-Label Ranking Algorithm Based on Sparse Coding Theory [D]; Beijing Jiaotong University; 2017

Article ID: 2225130


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2225130.html
