An Image Caption Generation Model Fusing Prior Knowledge of Image Scenes and Objects
[Abstract]: Objective: Current image captioning methods based on deep convolutional neural networks (CNN) and long short-term memory (LSTM) networks typically extract CNN features of an image using only prior knowledge of object categories, ignoring the prior knowledge of the scene in the image. As a result, the generated sentences lack an accurate description of the scene and tend to misjudge the positional relationships of objects in the image. To address this problem, an image caption generation model (F-SOCPK) that fuses scene and object-category prior information is designed. The scene prior and the object-category prior are both incorporated into the model, which collaboratively generates the descriptive sentence for the image and improves the quality of the generated sentences. Methods: First, the parameters of the CNN-S model are trained on the large-scale scene dataset Places205 so that CNN-S encodes richer scene prior information; these parameters are then migrated to the CNNd-S model via transfer learning, which captures the scene information of the image to be described. In parallel, the parameters of the CNN-O model are trained on the large-scale object-category dataset ImageNet and then transferred to the CNNd-O model, which captures the object information in the image. After the scene and object information is extracted, it is fed into the language models LM-S and LM-O respectively, and the outputs of LM-S and LM-O are transformed by a Softmax function to obtain a probability score for each word in the vocabulary. Finally, the final score of each word is computed by weighted fusion, the word with the maximum probability is taken as the output of the current time step, and the descriptive sentence of the image is thereby generated.
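The per-time-step decoding described above, in which the Softmax distributions of the scene language model (LM-S) and the object language model (LM-O) are combined by weighted fusion, can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function names, the toy logit vectors, and the equal fusion weights are assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fuse_step(logits_s, logits_o, w_s=0.5, w_o=0.5):
    """One decoding time step: convert the LM-S and LM-O outputs to
    word distributions, fuse them with weights w_s and w_o, and pick
    the vocabulary index with the highest fused probability."""
    p_s = softmax(logits_s)   # scene language model distribution
    p_o = softmax(logits_o)   # object language model distribution
    p = w_s * p_s + w_o * p_o
    return int(np.argmax(p)), p

# Toy 5-word vocabulary: LM-S prefers word 1, LM-O prefers word 2.
logits_s = np.array([0.2, 1.5, 0.1, 0.0, -1.0])
logits_o = np.array([0.0, 0.3, 2.0, 0.1, -0.5])
idx, p = fuse_step(logits_s, logits_o)
# idx is the vocabulary entry with the highest fused probability
```

With equal weights (w_s = w_o = 0.5) the fused distribution remains a valid probability distribution; in practice the weights would be tuned on a validation set.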
Results: Experiments were carried out on three public datasets: MSCOCO, Flickr30k, and Flickr8k. The proposed model outperforms models that use object-category information alone on BLEU (which reflects sentence coherence and accuracy), METEOR (which reflects word-level precision and recall), and CIDEr (which reflects semantic richness). In particular, on the Flickr8k dataset the CIDEr score is 9% higher than that of the Object-based model, which uses object categories alone, and nearly 11% higher than that of the Scene-based model, which uses scene categories alone. Conclusion: The proposed method is markedly effective: it substantially improves performance over the baseline models and outperforms other mainstream methods. Its advantage is clear on larger datasets such as MSCOCO, but on smaller datasets such as Flickr8k its performance still needs improvement. Future work will incorporate more visual prior information, such as action categories and object-to-object relationships, to further improve the quality of the descriptive sentences, and will combine more vision techniques, such as deeper CNN models, object detection, and scene understanding, to further improve sentence accuracy.
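As a rough illustration of what the BLEU metric mentioned above measures, the clipped unigram precision at the core of BLEU-1 can be computed as follows. This is a simplified sketch that omits the brevity penalty and higher-order n-grams; the example sentences are invented, not drawn from the paper's datasets.

```python
from collections import Counter

def bleu1_precision(candidate, reference):
    """Clipped unigram precision: each candidate word is credited at
    most as many times as it occurs in the reference."""
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    clipped = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    return clipped / max(len(candidate), 1)

cand = "a dog runs on the grass".split()
ref = "a dog is running on the grass".split()
p1 = bleu1_precision(cand, ref)
# 5 of the 6 candidate words appear in the reference, so p1 = 5/6
```

Full BLEU combines precisions for n-grams up to length 4 with a brevity penalty, and METEOR and CIDEr additionally account for recall and consensus across multiple reference captions.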
[Author affiliations]: School of Mathematics and Physics, Jinggangshan University; Key Laboratory of Watershed Ecology and Geographical Environment Monitoring, National Administration of Surveying, Mapping and Geoinformation, Jinggangshan University; Department of Computer Science and Technology, Tongji University; School of Electronics and Information Engineering, Jinggangshan University
[Funding]: Fund project of the Key Laboratory of Watershed Ecology and Geographical Environment Monitoring, National Administration of Surveying, Mapping and Geoinformation (WE2016015); Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ160750, GJJ150788); Research fund project of Jinggangshan University (JZ14012)
[CLC number]: TP391.41
[Similar literature]
Related journal articles (top 10)
1 周衛東, 馮其波, 匡萃方; Research on image description methods [J]; Journal of Applied Optics; 2005(03)
2 吳娛, 趙嘉濟, 平子良, 杜昊翔; Image description based on exponent moments [J]; Modern Electronics Technique; 2013(14)
3 任越美, 程顯毅, 李小燕, 謝玉宇; Concept-level semantics-based image description and recognition [J]; Computer Science; 2008(07)
4 毛玉萃; A user-demand-oriented image description method [J]; Manufacturing Automation; 2010(11)
5 周昌, 鄭雅羽, 周凡, 陳耀武; Object tracking method based on local image description [J]; Journal of Zhejiang University (Engineering Science); 2008(07)
6 宮偉力, 安里千, 趙海燕, 毛靈濤; Multi-scale features of coal-rock fracture CT images based on image description [J]; Rock and Soil Mechanics; 2010(02)
7 胡美燕, 姜獻峰, 柴國鐘; Application of Hu moments in the image description of disposable infusion needles [J]; Journal of Image and Graphics; 2005(02)
8 謝玉鵬, 吳海燕; AAM-based face image description and coding [J]; Computer Simulation; 2009(06)
9 阿木古楞, 楊性愉, 平子良; Image description using deformed Jacobi (p=4, q=3)-Fourier moments [J]; Journal of Optoelectronics·Laser; 2003(09)
10 于永新, 馮志勇; Image description and retrieval system supported by a commonsense knowledge base [J]; Application Research of Computers; 2009(02)
Related doctoral dissertations (top 2)
1 梁浩然; Analysis and detection methods of visual saliency features in natural images and their applications [D]; Zhejiang University of Technology; 2016
2 湯進; Research on graph-theory-based image description and retrieval methods [D]; Anhui University; 2007
Related master's theses (top 2)
1 鐘艾妮; Research on image description methods in face recognition [D]; Harbin Institute of Technology; 2010
2 陳影; Research on image description and recognition methods based on complex network theory [D]; Anhui University; 2014
Article No.: 2456050
Link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2456050.html