Research on Context-Based Object Detection Algorithms
Published: 2018-05-21 13:12
Topic: object detection + context. Source: Nanjing University, master's thesis, 2017
[Abstract]: In recent years, with the spread of the Internet and the rise of video websites and social networks, people have gained access to large volumes of multimedia resources such as images and videos. As a result, computer vision has developed rapidly, and object detection in particular has attracted growing attention. Treated as a classification problem, object detection has also helped drive research in computer vision and machine learning. Its applications are widespread, including face detection, pedestrian detection, vehicle detection and image classification.

Object detection is usually divided into two subtasks [1]: object classification and object localization. Object classification determines whether an object of a target class is present in the image and, if so, assigns it a class according to the classification probabilities. Object localization finds the position of the detected object, usually as a rectangular box. Traditional object detection algorithms generally have three steps: select a region with a sliding window, extract features from that region, and classify the region features to obtain the result. In face detection, for example, a sliding window is moved over the image, features such as LBP (Local Binary Pattern) or HOG (Histogram of Oriented Gradient) are extracted, and an SVM or AdaBoost classifier decides whether the current window contains a face.

Deep learning, which has spread steadily since 2006, has had a major impact on computer vision, and object detection built on it has advanced by leaps and bounds. Deep learning detectors based on region proposals reach a high recall with relatively few candidate windows, and regression-based detectors greatly increase detection speed.

In practice, objects are often deformed, occluded or seen from varying viewpoints, which degrades detection results. Many studies show that making reasonable use of local context, global context and object context in the image can mitigate these problems and improve detection accuracy. To address these problems, this thesis proposes two context-based object detection algorithms, one built on a traditional method and one on a deep learning method. The main contributions are as follows:

1. Based on LBP, a new feature histogram statistics method that incorporates local context information is designed. There are two main changes: the region over which the histogram is computed is expanded, and different positions are given different weights when the histogram is accumulated.
2. Based on YOLOv2, a deep learning method for context-aware object detection that incorporates object context information is designed. First, inter-class correlations are computed on the training set. Then the YOLOv2 convolutional network produces bounding boxes and the classification probabilities for all classes. The class of the bounding box with the highest confidence is taken as the reference class, and the classification probabilities of all classes are adjusted according to the inter-class correlations. Finally, the class with the highest classification probability is taken as the designated class, the probability that each window contains an object of the designated class is computed, and windows below the threshold are filtered out.

Both methods were evaluated experimentally, on the ORL face dataset and the PASCAL object detection dataset respectively, and the results show that the proposed methods achieve higher detection accuracy.
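
The YOLOv2-based context step described in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the thesis's actual implementation: the abstract gives no formulas, so the co-occurrence frequency used as "inter-class correlation", the blending factor `alpha`, the score defined as objectness times adjusted class probability, and the function names `cooccurrence_correlation` and `rescore_with_context` are all hypothetical. The detector outputs are passed in as plain arrays; no YOLOv2 API is assumed.

```python
import numpy as np


def cooccurrence_correlation(image_labels, num_classes):
    """Estimate corr[i, j] = P(class j in image | class i in image).

    Assumption: co-occurrence frequency stands in for the thesis's
    "inter-class correlation", whose exact definition the abstract omits.
    """
    counts = np.zeros((num_classes, num_classes))
    for labels in image_labels:
        present = sorted(set(labels))
        for i in present:
            for j in present:
                counts[i, j] += 1.0
    per_class = np.diag(counts).copy()   # number of images containing class i
    per_class[per_class == 0] = 1.0      # avoid dividing by zero for unseen classes
    return counts / per_class[:, None]


def rescore_with_context(objectness, class_probs, corr, alpha=0.5, threshold=0.3):
    """Adjust class probabilities using the most confident box as context.

    objectness:  shape (num_boxes,), box confidence from the detector
    class_probs: shape (num_boxes, num_classes), per-box class probabilities
    alpha:       hypothetical blending factor in [0, 1); 0 disables the context
    """
    objectness = np.asarray(objectness, dtype=float)
    class_probs = np.asarray(class_probs, dtype=float)

    # Reference class: the top class of the highest-confidence box.
    ref_box = int(np.argmax(objectness))
    ref_class = int(np.argmax(class_probs[ref_box]))

    # Down-weight classes that rarely co-occur with the reference class.
    weights = (1.0 - alpha) + alpha * corr[ref_class]
    adjusted = class_probs * weights
    adjusted /= adjusted.sum(axis=1, keepdims=True)

    # Designated class per box, its score, and the threshold filter.
    labels = adjusted.argmax(axis=1)
    scores = objectness * adjusted.max(axis=1)
    keep = scores >= threshold
    return labels[keep], scores[keep], keep
```

With `alpha = 0` the weights are all 1, so the sketch reduces to thresholding objectness times the top (normalized) class probability; larger values let the reference class pull ambiguous boxes toward classes that frequently appear alongside it in the training set.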
[Degree-granting institution]: Nanjing University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.41
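[References]
Related journal articles (top 3)
1 Zhang Chunfeng; Song Jiatao; Wang Wanliang. A survey of pedestrian detection techniques [J]. 電視技術, 2014, No. 03
2 Li Binbin; An Jiancheng. Face recognition based on eigenfaces and artificial neural networks [J]. 電腦開發與應用, 2012, No. 04
3 Xue Bing; Guo Xiaosong; Pu Pengcheng. A survey of face recognition technology [J]. 四川兵工學報, 2010, No. 07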
Article ID: 1919340
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1919340.html