A Robust Visual Tracking Method Based on Deep Learning
Published: 2019-08-07 08:53
【Abstract】: Most traditional visual tracking methods (such as the L1 tracker) model the target directly from pixel-level features in each frame of the video sequence, without exploiting the deeper visual features inside each image patch. In real-world fixed-camera video surveillance scenes, a region can usually be found in which the target object has a clear, easily distinguishable appearance. This paper therefore selects, in advance for each video scene, such a reference region in which the target's appearance can be clearly distinguished, uses it to construct training samples, and builds a two-branch, symmetric, weight-sharing deep convolutional neural network. The network is trained so that the output features of the target outside the reference region are as similar as possible to the output features of the target inside it, thereby transferring the good target representation available inside the reference region. The trained deep convolutional network enhances the distinguishability of the target and can be plugged into tracking systems that use shallow features (such as the L1 tracker) to improve their robustness. Within the L1 tracking framework, this paper uses the trained network to extract features of target candidates for sparse representation, gaining robustness to occlusion, illumination changes, and similar problems during tracking. Compared against 9 currently popular methods on 25 pedestrian videos, the proposed method achieves an average overlap rate 0.11 higher and an average center location error 1.0 lower than the second-best method.
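The abstract describes a two-branch, symmetric, weight-sharing ("Siamese") convolutional network trained to pull the features of a target seen outside the reference region toward the features of the same target inside it. Below is a minimal sketch of that idea in PyTorch; the layer sizes, embedding dimension, cosine-similarity objective, and the choice to detach the reference-side features are illustrative assumptions, not the authors' actual architecture.

```python
# Minimal sketch (not the authors' code) of a two-branch, weight-sharing CNN.
# Both "branches" reuse the same module, so the weights are shared by construction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedBranch(nn.Module):
    """One branch of the symmetric network; called twice to form both branches."""
    def __init__(self, embed_dim=128):  # embed_dim is an assumption
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.fc = nn.Linear(64 * 4 * 4, embed_dim)

    def forward(self, x):
        z = self.features(x).flatten(1)
        return F.normalize(self.fc(z), dim=1)  # unit-length feature vectors

def similarity_loss(branch, patch_outside, patch_inside):
    """Pull features of target patches OUTSIDE the reference region toward
    features of the same target INSIDE it (the training goal in the abstract)."""
    f_out = branch(patch_outside)
    f_in = branch(patch_inside).detach()  # assumption: reference side fixed per step
    return (1.0 - F.cosine_similarity(f_out, f_in)).mean()

# Toy usage: random tensors stand in for 64x64 RGB image patches.
branch = SharedBranch()
opt = torch.optim.SGD(branch.parameters(), lr=1e-3)
x_out, x_in = torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64)
loss = similarity_loss(branch, x_out, x_in)
loss.backward()
opt.step()
```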
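The tracking step follows the L1 tracker framework named in the abstract: each candidate's (deep) feature is sparsely coded over a dictionary of target templates plus trivial templates, and the candidate with the smallest reconstruction error on the target templates wins. The sketch below illustrates that scoring rule under assumed dimensions; the function name, the regularization weight, and the use of scikit-learn's Lasso as a stand-in L1 solver are all assumptions for illustration.

```python
# Minimal sketch of L1-tracker-style candidate scoring over deep features.
import numpy as np
from sklearn.linear_model import Lasso

def l1_track_score(candidates, templates, lam=0.01):
    """candidates: (n, d) features of target candidates.
    templates:  (k, d) features of the target templates.
    Returns the index of the best candidate and all reconstruction errors."""
    d = candidates.shape[1]
    # Dictionary = target templates plus +/- trivial (identity) templates;
    # the trivial part absorbs occlusion and noise, as in the original L1 tracker.
    D = np.vstack([templates, np.eye(d), -np.eye(d)]).T   # shape (d, k + 2d)
    errors = []
    for y in candidates:
        solver = Lasso(alpha=lam, positive=True, max_iter=5000)
        c = solver.fit(D, y).coef_                        # sparse, nonnegative coeffs
        c_target = c[:templates.shape[0]]                 # target-template part only
        errors.append(np.linalg.norm(y - templates.T @ c_target))
    errors = np.array(errors)
    return int(errors.argmin()), errors

# Toy usage: random vectors stand in for features from the trained network.
rng = np.random.default_rng(0)
best, errs = l1_track_score(rng.normal(size=(5, 128)),   # 5 candidates
                            rng.normal(size=(10, 128)))  # 10 templates
```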
【Affiliation】: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
【Funding】: Supported by the National Basic Research Program of China (973 Program) (2012CB316304) and the Key Program of the National Natural Science Foundation of China (61432019)
【CLC Number】: TP391.41
Article No.: 2523847
Link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2523847.html