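Research on Image Multi-Label Classification Algorithms Based on Deep Learning
Published: 2018-07-12 15:07
Topics: deep learning + convolutional neural network. Source: Beijing University of Posts and Telecommunications, master's thesis, 2016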
[Abstract]: With the arrival of the mobile Internet era, image and video data have grown rapidly, ushering in an era of image big data. As a result, traditional single-label image classification can no longer meet the need to classify and recognize images with complex semantics, and fast, accurate multi-label classification techniques are urgently needed. This thesis addresses the multi-label semantic classification process for images and studies its image preprocessing, feature extraction, and multi-label classifier training algorithms. Good image features are critical to the performance of an image classification and recognition system. The quality of the preprocessing result strongly affects whether the essential features of an image can be extracted and how complex the extraction process is, while the multi-label classifier determines whether the whole system can fully exploit those features and the properties of the labels themselves to improve the final classification performance. The main work of the thesis is as follows:
1. The basic theory of image preprocessing is presented. Because images vary in scale, pixels are strongly correlated, the data are high-dimensional, and contrast differences may harm feature extraction, the thesis proposes jointly applying image scale normalization, brightness and contrast normalization, and whitening to preprocess images.
2. The basic theory of multi-label classification is presented, leading to the conclusion that fully exploiting the correlation between labels plays an important role in improving classification performance. To address a shortcoming of the RAkEL algorithm, namely that it needs many parameters to be set and cross-validation over a large amount of data to reach its best performance, a GPU-based parallel cross-validation algorithm is proposed. The algorithm uses the GPU's parallel computing power to run the validation procedures for different parameter settings at the same time, which speeds up training.
3. The basic theory of deep learning is presented, with emphasis on the convolutional neural network (CNN) model in terms of the number of hidden layers, weight sharing, and related aspects. The thesis ultimately adopts a CNN structure with one input layer, three convolutional layers, and three feature mapping layers, and uses pooling to reduce the dimensionality of the feature vectors in order to avoid overfitting during training. It then proposes its core algorithm, CNN-RAkEL, and describes the principle for combining CNN and RAkEL as well as the learning and training process of the CNN-RAkEL-based multi-label classification system. System experiments and parameter tuning were carried out on the PASCAL VOC 2007 image dataset using a GPU and the Pylearn2 deep learning model library. Simulation experiments show that, in image multi-label classification, the recognition rate of the proposed CNN-RAkEL system is up to 9.416 percentage points higher than that of CNN-SVM (the holder of the best result on PASCAL VOC 2007).
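The preprocessing chain named in the first contribution of the abstract above (scale normalization, brightness and contrast normalization, whitening) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the thesis's implementation: the function names, the nearest-neighbor resampling, the 8x8 target size, the choice of ZCA as the whitening variant, and the epsilon values are all hypothetical choices made for this example.

```python
import numpy as np


def resize_nearest(img, size):
    """Scale normalization: resample a 2-D grayscale image to size x size
    using nearest-neighbor sampling (an assumed, deliberately simple method)."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols]


def contrast_normalize(x, eps=1e-8):
    """Brightness and contrast normalization: give every image (row of x)
    zero mean and approximately unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    return x / np.sqrt(x.var(axis=1, keepdims=True) + eps)


def zca_whiten(x, eps=1e-5):
    """ZCA whitening (assumed variant): decorrelate the pixel dimensions
    across the dataset so their covariance is close to the identity."""
    x = x - x.mean(axis=0)
    cov = x.T @ x / x.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    w = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return x @ w


def preprocess(images, size=8):
    """Apply the three steps in order to a list of 2-D grayscale arrays."""
    flat = np.stack([resize_nearest(img, size).ravel() for img in images])
    return zca_whiten(contrast_normalize(flat.astype(np.float64)))
```

Calling `preprocess(images)` on a list of grayscale arrays of differing sizes returns one whitened row vector per image. Because each image is centered before whitening, one direction of the pixel space carries no variance; the `eps` term keeps the inverse square root finite in that direction.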
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.41; TP181
Article ID: 2117549
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2117549.html