Research on Visual Attention Algorithms Based on Feature Fusion
Topic: visual attention + feature fusion. Source: doctoral dissertation, China University of Mining and Technology (Beijing), 2017.
【Abstract】: Visual attention is one of the important research topics in computer vision. It refers to predicting the target or direction of a subject's interest using analytical methods such as pattern recognition and machine learning. Feature-fusion-based visual attention algorithms construct a head feature matrix through feature extraction and fusion, compute head pose or gaze direction from it, and finally determine the target or direction of visual attention. In recent years, visual attention algorithms have been widely applied in fields such as public security, natural meetings, and driver assistance. Although face-feature-based visual attention algorithms have been studied extensively, many problems remain, mainly in three aspects.

(1) Imbalanced expression of local and global features. Feature fusion methods typically perform weighted fusion of various features over the whole image, considering only the global effectiveness of fusion and leaving local features insufficiently expressed; or they consider only local features, extracting them with multiple methods and making the global representation overly complex. The salient features of different regions of the same image differ: extracting many feature types globally makes global feature computation expensive, while extracting too few features leaves local information under-represented. To extract sufficiently rich local features efficiently while keeping global computation tractable, the balance between local and global feature expression must be considered jointly.

(2) Complex head pose representation and inefficient computation. Head pose is a core component of visual attention technology; accurate head pose estimation effectively supports the prediction and tracking of attended targets. Head pose estimation methods fall into three categories: appearance-model-based, geometric-model-based, and feature-representation-based. Feature-representation-based methods are easily disturbed by external factors such as head accessories and changes in head position. Appearance-model-based methods require training on large amounts of head data with accurately annotated pose labels. Geometric-model-based methods run in real time but are strictly constrained by camera calibration parameters and image resolution; moreover, a single camera cannot obtain depth information, so even at pixel-level accuracy a pose angle error of about 5° remains. Representing head pose accurately and computing it efficiently therefore requires a compact head pose feature matrix and an efficient pose computation method.

(3) Ambiguity between head pose and gaze direction in visual attention. Gaze direction and head pose are the two core topics of visual attention research; they complement each other and neither alone suffices. A single head pose or gaze direction cannot accurately express a person's attention state: multiple potential targets may lie within the same head orientation range, and gaze direction is needed to pinpoint the attended target; furthermore, even with head orientation fixed, gaze shifts occur, i.e., the attended target changes. Existing research usually treats head pose analysis and gaze direction estimation as two independent problems and does not resolve the ambiguity between them. A visual attention algorithm that jointly models the relationship between head pose and gaze direction is therefore urgently needed.

Because of the imbalance between local and global features, the complexity and inefficiency of head pose representation and computation, and the ambiguity between head orientation and gaze direction, feature-fusion-based visual attention remains a difficult and challenging research topic. This thesis addresses these problems with three pieces of work.

(1) Balancing local and global feature expression. To make local features more expressive while reducing global feature complexity, this thesis builds a local feature extraction framework based on information entropy and proposes an entropy-weighted head feature matrix for head pose estimation that fuses Gabor and Phase Congruency features. First, information entropy is used to measure the importance of local image features and determine which feature best expresses the original information of each region; then all local features are concatenated compactly into a global feature matrix; finally, experiments on public face and head datasets with machine learning classifiers and regressors show that the proposed matrix, combined with suitable supervised learning, outperforms commonly used global feature fusion matrices in head pose classification.

(2) Accurate representation and efficient computation of head pose. To improve the accuracy of head pose representation and the speed of pose computation, this thesis proposes a head pose estimation algorithm based on depth information reconstruction, together with an improved weighted version. First, LBP (Local Binary Pattern) features of the head are extracted to build an Adaboost-LBP face classifier; then depth information is reconstructed from the camera imaging model, and head pose is computed from the depth information and the geometric relationship between target and camera. To improve reconstruction accuracy, an ASM (Active Shape Model) is used to extract a 68-point facial contour model and construct a weighted depth reconstruction algorithm. Experiments on head pose in visual attention scenarios, using the optimized depth information together with head features and an appearance model, show that both versions outperform commonly used head pose estimation methods in representation accuracy and computational performance.

(3) Resolving the ambiguity between head pose and gaze direction. A single head pose can correspond to multiple gaze directions, and the same gaze direction can occur under different head poses, so describing visual attention with either alone is ambiguous. To alleviate this, the thesis proposes a gaze-assisted visual attention algorithm based on an HMM (Hidden Markov Model). First, head data are obtained and head pose and gaze direction are computed with deep convolutional neural networks; then the HMM combines gaze direction with head pose to predict the attended direction or target; finally, experiments on public head pose datasets and real-time video show that the proposed algorithm weakens attention ambiguity to some extent and improves target prediction accuracy.

Validation on homogeneous and heterogeneous data (public datasets and video) yields the following conclusions: (1) fusing Gabor and Phase Congruency features with the entropy-weighted framework yields a head pose feature matrix that fully expresses local head features, reduces global complexity, balances local and global expression, and improves both the classification accuracy and the speed of head pose estimation; (2) the proposed depth-reconstruction-based head pose estimation algorithm and its weighted version reconstruct depth accurately and improve both the accuracy of head pose representation and the speed of pose estimation; (3) the gaze-assisted visual attention algorithm, combining gaze direction with head pose via an HMM, alleviates the head orientation/gaze ambiguity and reduces visual attention error.
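The entropy-weighted fusion idea of part (1) can be sketched as follows: each image block is scored by the Shannon entropy of its intensity histogram, and that score weights the block's descriptors before they are concatenated into the global feature vector. This is a minimal illustration, not the thesis's implementation: block standard deviation and mean gradient magnitude stand in for the Gabor and Phase Congruency responses, and the block size, bin count, and weighting scheme are all assumptions.

```python
import numpy as np

def block_entropy(block, bins=16):
    """Shannon entropy (bits) of a block's intensity histogram; values in [0, 1]."""
    hist, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def entropy_weighted_features(img, block=8, bins=16):
    """Concatenate per-block texture descriptors, each scaled by the block's
    normalized entropy, into one global feature vector."""
    feats = []
    for i in range(0, img.shape[0], block):
        for j in range(0, img.shape[1], block):
            b = img[i:i + block, j:j + block]
            w = block_entropy(b, bins) / np.log2(bins)  # weight in [0, 1]
            gy, gx = np.gradient(b)
            texture = b.std()                # stand-in for Gabor energy
            edges = np.hypot(gx, gy).mean()  # stand-in for phase congruency
            feats.extend([w * texture, w * edges])
    return np.asarray(feats)
```

On a flat region the entropy is zero, so its descriptors are suppressed; textured regions dominate the global vector, which is the local/global balance the framework aims for.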
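Part (2) reconstructs depth from the camera imaging model and derives pose from the geometry between target and camera; the thesis's exact formulation is not given in the abstract. A crude sketch under the pinhole model: depth follows Z = f·X/x for an object of known real width X imaged at x pixels, and yaw can be roughly read off from the horizontal asymmetry of facial landmarks. The landmark-based yaw formula and all constants below are illustrative assumptions, not the thesis's algorithm.

```python
import math

def depth_from_pinhole(focal_px, real_width_m, pixel_width):
    """Pinhole model: an object of real width X (meters) imaged at x pixels
    with focal length f (pixels) lies at depth Z = f * X / x (meters)."""
    return focal_px * real_width_m / pixel_width

def yaw_from_landmarks(left_eye, right_eye, nose_tip):
    """Crude yaw estimate (degrees) from the horizontal asymmetry of the
    nose tip between the eye corners; returns 0 for a frontal, symmetric face."""
    dl = nose_tip[0] - left_eye[0]   # nose-to-left-eye horizontal span
    dr = right_eye[0] - nose_tip[0]  # nose-to-right-eye horizontal span
    return math.degrees(math.atan2(dl - dr, dl + dr))
```

For example, with an assumed 800 px focal length and a 0.063 m interocular distance imaged at 18 px, the pinhole relation puts the head at 2.8 m; a nose tip shifted toward one eye yields a nonzero yaw. A full implementation would fit all 68 ASM landmarks rather than three points.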
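Part (3) combines gaze direction with head pose through an HMM. The abstract does not specify the state or observation design, so the sketch below assumes hidden states are the attended targets and observations are quantized (head pose, gaze) cue pairs; the transition and emission probabilities are illustrative numbers only. Decoding is standard log-space Viterbi.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely hidden-state sequence under an HMM (log-space Viterbi).
    pi: initial probs (N,); A: transitions (N, N); B: emissions (N, M)."""
    pi, A, B = (np.log(np.asarray(m, dtype=float)) for m in (pi, A, B))
    delta = pi + B[:, obs[0]]           # best log-score ending in each state
    back = []                           # backpointers, one array per step
    for o in obs[1:]:
        scores = delta[:, None] + A     # scores[i, j]: best path i -> j
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + B[:, o]
    path = [int(delta.argmax())]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return path[::-1]

# Hidden states: 0 = attending the left target, 1 = the right target.
# Observation symbols 0..3 quantize joint (head pose, gaze) cues:
# 0/1 lean left, 2/3 lean right (illustrative numbers only).
pi = [0.5, 0.5]
A = [[0.8, 0.2],
     [0.2, 0.8]]                 # attention tends to persist over frames
B = [[0.60, 0.20, 0.15, 0.05],   # left target mostly emits "left" cues
     [0.05, 0.15, 0.20, 0.60]]   # right target mostly emits "right" cues
```

Here `viterbi(pi, A, B, [0, 0, 3, 3])` decodes to `[0, 0, 1, 1]`: the model keeps the left target while the cues agree and switches only once the combined pose/gaze evidence does, which is how temporal smoothing mitigates the single-frame pose/gaze ambiguity.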
【Degree-granting institution】: China University of Mining and Technology (Beijing)
【Degree level】: Doctoral
【Year conferred】: 2017
【CLC number】: TP391.41
【Similar Literature】
Related journal articles (top 10)
1. 於東軍, 趙海濤, 楊靜宇. Face recognition: a method based on feature fusion and neural networks (in English) [J]. Journal of System Simulation, 2005(05).
2. 周斌, 林喜榮, 賈惠波, 周永冠. Optimal weights for quantization-level multi-biometric fusion [J]. Journal of Tsinghua University (Science and Technology), 2008(02).
3. 丁寶亮. Research on face recognition based on local feature fusion [J]. China New Technologies and Products, 2012(14).
4. 劉增榮, 余雪麗, 李志. Research on image emotion semantic recognition based on feature fusion [J]. Journal of Taiyuan University of Technology, 2012(05).
5. 黃雙萍, 俞龍, 衛(wèi)曉欣. A heterogeneous feature fusion classification algorithm [J]. Electronic Technology & Software Engineering, 2013(02).
6. 劉冰, 羅熊, 劉華平, 孫富春. Application of optical and depth feature fusion in robot scene localization [J]. Journal of Southeast University (Natural Science Edition), 2013(S1).
7. 卞志國, 金立左, 費(fèi)樹岷. Feature fusion and visual object tracking [J]. Application Research of Computers, 2010(04).
8. 韓萍, 徐建龍, 吳仁彪. A new feature fusion method for object tracking [J]. Journal of Civil Aviation University of China, 2010(04).
9. 何賢江, 何維維, 左航. A paraphrase study of a five-feature sentence/word fusion model [J]. Journal of Sichuan University (Engineering Science Edition), 2012(06).
10. 劉冬梅. Face recognition based on feature fusion [J]. Computer CD Software and Applications, 2013(12).
Related conference papers (top 7)
1. 劉冰, 羅熊, 劉華平, 孫富春. Application of optical and depth feature fusion in robot scene localization [A]. Proceedings of the 2013 Chinese Intelligent Automation Conference (Volume III) [C], 2013.
2. 翟懿奎, 甘俊英, 曾軍英. Disguised face recognition based on feature fusion and support vector machines [A]. Proceedings of the 6th National Conference on Signal and Intelligent Information Processing and Applications [C], 2012.
3. 卞志國, 金立左, 費(fèi)樹岷. Feature fusion and visual object tracking based on incremental discriminant analysis [A]. Proceedings of the 2009 Chinese Intelligent Automation Conference (Volume III) [C], 2009.
4. 韓文靜, 李海峰, 韓紀(jì)慶. Research on speech emotion recognition based on long- and short-term feature fusion [A]. Proceedings of the 9th National Conference on Man-Machine Speech Communication [C], 2007.
5. 羅昕煒, 方世良. A feature fusion method for wideband modulated signals [A]. Proceedings of the 2013 National Underwater Acoustics Conference, Underwater Acoustics Branch of the Acoustical Society of China [C], 2013.
6. 金挺, 周付根, 白相志. A simple and effective particle filter tracking algorithm with feature fusion [A]. Proceedings of the 2007 Symposium on the Development and Application of Photoelectric Detection and Guidance Technology [C], 2007.
7. 孟凡潔, 孔祥維, 尤新剛. Camera source authentication based on feature fusion [A]. Special issue of the First National Conference on Signal Processing and the third working meeting of the preparatory committee of the Signal Processing Branch, China High-Tech Industrialization Research Association [C], 2007.
Related doctoral dissertations (top 10)
1. 王曉萌. Research on visual attention algorithms based on feature fusion [D]. China University of Mining and Technology (Beijing), 2017.
2. 周斌. Research and experiments on multi-biometric fusion theory [D]. Tsinghua University, 2007.
3. 彭偉民. Quantum representation and fusion methods for feature data [D]. South China University of Technology, 2013.
4. 陳倩. Research on identity recognition with multi-biometric fusion [D]. Zhejiang University, 2007.
5. 蒲曉蓉. Neural network methods for multimodal biometric fusion [D]. University of Electronic Science and Technology of China, 2007.
6. 王志芳. Research on multimodal biometric fusion technology based on perceptual information [D]. Harbin Institute of Technology, 2009.
7. 王楠. Research on rear vehicle detection technology based on multi-visual-feature fusion [D]. Northeastern University, 2009.
8. 徐穎. Research on biometric recognition based on feature fusion and bionic patterns [D]. South China University of Technology, 2013.
9. 樊國梁. Research on protein submitochondrial localization prediction based on multi-class feature fusion [D]. Inner Mongolia University, 2013.
10. 曾凡祥. Research on robust real-time object tracking in complex environments [D]. Beijing University of Posts and Telecommunications, 2017.
Related master's theses (top 10)
1. 付艷紅. Research and implementation of a face recognition algorithm based on feature fusion [D]. Tianjin University of Technology, 2015.
2. 許超. Research on solid wood floor defect detection based on feature fusion and compressed sensing [D]. Northeast Forestry University, 2015.
3. 楊文婷. Research and implementation of a sentiment analysis algorithm for microblogs [D]. Southwest Jiaotong University, 2015.
4. 梅尚健. Research and implementation of image retrieval based on feature fusion [D]. Southwest Jiaotong University, 2015.
5. 王鵬飛. Research on human action recognition based on multiple slow feature fusion [D]. Southwest University, 2015.
6. 丁倩. Research on multi-feature emotion recognition based on speech information [D]. Shandong University, 2015.
7. 薛冰霞. Research on human fall detection based on multimodal feature fusion [D]. Shandong University, 2015.
8. 何樂樂. Research on feature fusion and feature learning in medical image classification [D]. University of Electronic Science and Technology of China, 2015.
9. 戴博. Research on a visual attention model based on structural-complexity feature fusion and its applications [D]. Fudan University, 2014.
10. 王寧. A face recognition algorithm based on feature fusion [D]. Northeastern University, 2013.
Document ID: 1941275
Link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1941275.html