

Research on Image Semantic Annotation and Description Based on Deep Learning

Published: 2018-04-11 07:37

  Topic: image annotation + convolutional neural networks. Source: Master's thesis, Guangxi Normal University, 2017


【Abstract】: With the rapid development of information science and technology, diverse media data have grown explosively, driven by the spread of digital devices and advances in storage technology. Faced with the production of large volumes of unlabeled data such as text, audio, images, and video, how to manage and use these unannotated data has become a pressing problem. Current image semantic annotation techniques can annotate images effectively; this not only helps people manage large collections of unlabeled images but also lets machines understand images more intelligently, so image semantic annotation is a highly meaningful line of research. The core of image understanding technology is, on the basis of image processing and analysis, to combine theory from computer vision and natural language processing to analyze and understand image content and feed the result back to humans as textual semantic information. Completing image understanding therefore requires not only image annotation but also image description. The task of image annotation takes the image as its object and semantic information as its carrier, studying which objects appear in an image and the relationships among them. The task of image description uses natural language processing to analyze and produce annotation words, then assembles the generated words into natural-language descriptive sentences. In recent years, image description has attracted great interest from the research community and, like image annotation, has broad application prospects.

This thesis takes image semantic annotation as its main line of research, images in multimedia data as the research object, and image description as an application extension. Following the research path of feature extraction and representation, semantic mapping model construction, and semantic analysis and understanding, it focuses on the problems of object recognition and semantic analysis in image annotation, covering techniques such as feature learning, multi-label classification, semantic correlation analysis, and word and sentence sequence generation. The main contributions are:

(1) To narrow the semantic gap between data of different modalities, a hybrid image multi-label annotation architecture, CNN-ECC, is proposed based on a deep convolutional neural network (Deep Convolutional Neural Network, CNN) and ensembles of classifier chains (Ensembles of Classifier Chains, ECC). The framework consists of two stages: generative feature learning and discriminative semantic learning. In the first stage, an improved convolutional neural network learns high-level visual features fused from multiple image instances. In the second stage, the ensemble of classifier chains is trained on the extracted visual features and the images' semantic label sets; the ensemble not only learns the semantic information carried by the visual features but also fully exploits the correlations among semantic labels, so that the generated labels are more strongly correlated and redundant labels are avoided. The trained model is then used to annotate unseen images automatically.

(2) Image annotation lays the foundation for image description. To assemble the annotation words generated for an image into natural-language sentences, an image description model, CNN-DLSTM, is proposed based on a convolutional neural network (Convolutional Neural Network, CNN) and bidirectional long short-term memory units (Double Long-short Term Memory, DLSTM). The framework consists of a visual model and a language model. The visual model learns the concepts in the image's visual content and generates key semantic words. The language model learns vocabulary and grammar from human-written description sequences, then combines the visual concept words with the corresponding grammar to generate the language description, completing the image description task. To make the generated sentences more human-like, CNN-DLSTM also introduces a confidence evaluation model for description quality, selectively outputting the higher-scoring description sentences.

Image content is not only complex and abstract but also fuzzy and ambiguous at the level of semantic concepts. This thesis therefore improves key steps of image annotation such as feature learning and semantic learning, achieves automatic image annotation, and improves annotation and description performance.
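The discriminative stage of CNN-ECC can be sketched with a minimal prediction routine for an ensemble of classifier chains: each classifier in a chain sees the input features augmented with the predictions of all earlier labels, and the ensemble averages the chains' votes. This is an illustration only, with toy threshold "classifiers"; the function names and example classifiers are assumptions, not the thesis's trained models over CNN features.

```python
def chain_predict(features, classifiers):
    """Predict labels along one chain: classifier i receives the raw
    features concatenated with the predictions for labels 0..i-1."""
    preds = []
    for clf in classifiers:
        preds.append(clf(features + preds))
    return preds

def ecc_predict(features, chains, threshold=0.5):
    """Ensemble step: average the label votes of several chains and
    keep a label when its averaged vote clears the threshold."""
    n_labels = len(chains[0])
    votes = [0.0] * n_labels
    for chain in chains:
        for i, p in enumerate(chain_predict(features, chain)):
            votes[i] += p
    return [v / len(chains) >= threshold for v in votes]

# Toy usage: two chained "classifiers" over a 2-dimensional feature vector.
chain = [
    lambda x: 1 if x[0] > 0.5 else 0,   # label 0 from the raw features
    lambda x: 1 if x[-1] == 1 else 0,   # label 1 conditioned on label 0
]
print(ecc_predict([1.0, 0.5], [chain, chain]))  # [True, True]
```

The chain ordering is what lets later classifiers exploit label correlations; ensembling over several randomly ordered chains reduces the sensitivity to any one ordering.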
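The sentence-generation step of CNN-DLSTM can be illustrated with a greedy decoding loop: the visual model supplies concept words, a language model scores the next word given the previous one, and decoding stops at an end token. The bigram table and all names below are illustrative assumptions; in the thesis the scoring role is played by the trained LSTM language model, not a lookup table.

```python
def greedy_decode(next_word_scores, start="<s>", end="</s>", max_len=20):
    """Greedily pick the highest-scoring next word until the end token.

    next_word_scores(prev_word) -> {candidate_word: score}.
    """
    caption, word = [], start
    for _ in range(max_len):
        scores = next_word_scores(word)
        word = max(scores, key=scores.get)
        if word == end:
            break
        caption.append(word)
    return " ".join(caption)

# Toy bigram "language model" biased toward the visual concept words
# {"dog", "grass"} that the visual model is assumed to have produced.
bigram = {
    "<s>":   {"a": 1.0},
    "a":     {"dog": 2.0, "cat": 1.0},
    "dog":   {"on": 1.0},
    "on":    {"grass": 1.0},
    "grass": {"</s>": 1.0},
}
print(greedy_decode(lambda w: bigram[w]))  # a dog on grass
```

A confidence evaluation model, as described above, would sit after this loop: generate several candidate sentences (e.g. via beam search instead of the greedy choice) and output only those scoring above a quality threshold.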

【Degree-granting institution】: Guangxi Normal University
【Degree level】: Master's
【Year granted】: 2017
【Classification number】: TP391.41


Article ID: 1735035


Link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1735035.html


Copyright (c) 文論論文網. All Rights Reserved.

Copyright notice: this material was provided by user fc53e***; this site indexes only the abstract or table of contents. Authors who want it removed should email bigeng88@qq.com.