

Research on Hashing Methods for Large-Scale Multimodal Multi-Label Data

Published: 2018-06-08 18:24

Topic: hashing + multimodal; Reference: Shandong University, 2017 master's thesis


【Abstract】: In recent years, with the accelerating development of mobile Internet technology and mobile devices in China and worldwide, data has grown enormously in scale and is stored in increasingly diverse forms. Processing multimodal, multi-label data has become important in everyday applications: web pages, news articles, and the like are commonly combinations of text, images, and video, and each page often carries several keywords as labels. Cross-modal retrieval, which arose to answer the question of how to find relevant content within multimodal data, has therefore become a pressing problem. Hashing methods are especially well suited to it because of their advantages in both storage and computation: they map the original data features into Hamming space to obtain binary codes, and retrieval is performed by computing the Hamming distance between hash codes, which greatly improves retrieval efficiency; storing hash codes in place of the original data likewise greatly improves storage efficiency.

Depending on whether the model uses supervision information from the training samples, hashing methods can be divided into supervised, semi-supervised, and unsupervised methods. Unsupervised methods learn hash codes from unlabeled data. Supervised methods use labeled data to improve retrieval performance. Semi-supervised methods use supervision information for only part of the data: they improve on unsupervised methods by exploiting labels, while demanding less of the data than supervised methods, since not every sample must be labeled. In practice, however, these single-modal hashing methods do not always solve the problem effectively, which motivated multimodal hashing. Its main goal is to use data in one modality to retrieve similar data in another modality, typically by finding hash codes in the other modality that are close in Hamming distance to a given code.

With large-scale data now in widespread use, retrieval over multimodal multi-label data has promising applications and high academic value; the machine learning underlying hashing and multi-label learning is also a research hotspot in the Internet and computing industries, with considerable commercial value. A good multimodal hashing algorithm can speed up retrieval in web search, image search, and similar domains, and thereby improve user experience. Unlike traditional single-label learning, each training sample in multi-label learning carries multiple labels, which better matches real-world data; applying single-label methods to such data does not yield good results, and multi-label learning is more expensive and time-consuming. Using hashing to retrieve multi-label data reduces both the time and space complexity of retrieval.

This thesis uses hashing for cross-modal retrieval of large-scale multimodal multi-label data: it designs hash algorithms for multimodal multi-label datasets that retrieve similar texts in a database given an image, and similar images given a text. Most previously proposed multimodal hashing methods rarely consider the effect of multiple labels, or use the labels only to build a simple similarity matrix whose entry is 1 when two samples share at least one label and 0 otherwise. Such a scheme does not fully exploit the information contained in the labels, and we seek a model that does, in order to improve performance. This thesis assumes that each label corresponds to a hash code in Hamming space, and that a data sample's hash code can be generated as a linear combination of its labels and the label hash codes. For test samples, we minimize the Hamming distance between similar sample pairs while maximizing it between dissimilar pairs, and use the resulting projection matrix to obtain hash codes for the test set. Because the learned hash codes are produced by linear combinations of the anchors, we expect them to carry substantial label information, so we extend their range of application: we apply a multi-label classification function to them, using the hash codes as training-sample features for multi-label classification. Since these features are compressed binary codes, this also has the advantages of low overhead and high speed. We compare the method against state-of-the-art multimodal hashing methods on three widely used public datasets; the experimental results show that it outperforms every multimodal hashing method used for comparison. We also run comparative multi-label classification experiments with the hash codes, which confirm that our method is effective.
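The retrieval step the abstract describes — comparing binary hash codes by Hamming distance — can be sketched as follows. This is a generic illustration of the technique, not the thesis's implementation; the code lengths and database contents are hypothetical:

```python
def hamming_distance(a: int, b: int) -> int:
    """Hamming distance between two hash codes stored as integers:
    XOR the codes, then count the differing bits (popcount)."""
    return bin(a ^ b).count("1")

def search(query: int, database: list, k: int = 3) -> list:
    """Return indices of the k database codes closest to the query in
    Hamming distance. A linear scan suffices here; large-scale systems
    use hash-table lookups or multi-index structures instead."""
    dists = [hamming_distance(query, code) for code in database]
    return sorted(range(len(database)), key=lambda i: dists[i])[:k]

# Hypothetical 8-bit hash codes standing in for encoded samples.
db = [0b10110100, 0b10110101, 0b01001011, 0b11110000]
print(search(0b10110100, db, k=2))  # → [0, 1]
```

Because the codes are short binary strings, the whole database fits in a fraction of the memory the original features would need, which is the storage advantage the abstract refers to.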
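The label-handling ideas in the abstract can be illustrated with a small sketch: first the simple "share at least one label" similarity matrix that the thesis criticizes, then the assumed alternative in which a sample's code is the sign of a linear combination of per-label hash codes. All names, dimensions, and the random label codes here are hypothetical stand-ins, not the thesis's actual formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical multi-label matrix: 4 samples x 3 labels (1 = has label).
Y = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 1, 1]])

# Simple similarity matrix: S[i, j] = 1 iff samples i and j share
# at least one label -- the coarse scheme the abstract criticizes.
S = (Y @ Y.T > 0).astype(int)

# Assumed model sketch: each label gets its own c-bit code (rows of L),
# and a sample's code is the sign of the sum of its labels' codes.
c = 8                                      # code length (hypothetical)
L = rng.choice([-1.0, 1.0], size=(3, c))   # per-label codes in {-1, +1}
B = np.sign(Y @ L)                         # sample codes
B[B == 0] = 1                              # break ties toward +1

print(S)         # pairwise label-sharing similarity
print(B.shape)   # → (4, 8)
```

The point of the linear-combination construction is that samples with overlapping label sets receive correlated codes, so the codes themselves retain label information — which is why the abstract then reuses them as features for multi-label classification.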
【Degree-granting institution】: Shandong University
【Degree level】: Master's
【Year conferred】: 2017
【Classification number】: TP391.3



