Research and Application of Learning-Based Object Detection Methods
Published: 2018-10-29 08:25
【Abstract】: Object detection underpins research on object tracking and recognition, and the quality of the detection method directly determines the accuracy of subsequent tracking, recognition, and processing. In practice, images and video are captured under varying illumination and weather, partial occlusion, shadows, changing viewpoints, scale changes, and rotation, all of which alter a target's appearance considerably and make detection challenging. Addressing the main problems of detection in static scenes and against dynamic backgrounds, this thesis studies learning-based object detection methods.

Extracting power lines, monitoring transmission lines, and monitoring tower deformation and fault points are the main tasks of automated power-line inspection. Detecting transmission towers in video captured during helicopter inspection is essential for determining tower type and judging tower deformation and faults. This thesis proposes a learning-based two-stage tower detection method with the following steps. First, abundant tower and non-tower patches are cropped from helicopter/UAV transmission-line inspection videos to form a training set, and positive and negative samples are labeled. Second, Local Binary Pattern (LBP) features are extracted from the training set, and the features and labels are fed into an Adaptive Boosting (AdaBoost) model to train classifier1. Third, a deep CNN architecture is designed, and the training set and labels are fed into a Convolutional Neural Network (CNN) built on the Convolutional Architecture for Fast Feature Embedding (Caffe) framework to train classifier2. Finally, at multiple scales, sliding-window patches of the test video are passed to classifier1, whose output yields tower candidate regions; these candidates are then passed to classifier2, which decides whether each is a tower and yields the tower's precise location.

Acquiring text information from natural scenes gives service robots broad application prospects in areas such as assisted navigation for the blind and visual localization. Because scene text varies in position, orientation, font, color, and size, and suffers from blur, contamination, and occlusion, detecting and localizing text in natural scenes is itself highly challenging. This thesis proposes an efficient text-sign detection method based on the Bag of Visual Words (BoVW) model, comprising a training part and a testing part. In training, Binary Robust Invariant Scalable Keypoints (BRISK), which are cheap to compute and somewhat robust to scale change and rotation, are chosen as the texture feature of text signs. BRISK features are extracted from images and clustered with a Self-Growing and Self-Organized Neural Gas (SGONG) network to build a visual dictionary. BRISK features of the positive and negative training images are then quantized against this dictionary to obtain a BRISK shape histogram; an HS color histogram is extracted at the same time and further processed into an HS color-invariance histogram. The two features are fused into a highly discriminative text-sign feature, and an AdaBoost classifier is trained on the text-sign sample set to obtain the text-sign detector. In testing, the Maximally Stable Color Regions (MSCR) algorithm first performs a coarse detection of text signs, reducing the cost of running the classifier directly over the whole image; the learned detector then performs a fine detection within the MSCR candidate regions to localize the text signs.

The tower detection method for the helicopter power-inspection system was tested on roughly 30 minutes of video. It achieves 92% recall and 79% precision, for a weighted harmonic mean (F1) of 85%, at an average detection time of about 0.33 s per frame, outperforming both the cascaded AdaBoost method and the deep-learning CNN method alone. The method can therefore be applied directly to automatic tower detection in helicopter power-inspection systems, in support of subsequent maintenance and fault diagnosis. The text-sign detection method was tested on 678 street-view images containing 661 text signs. It detects far, medium, and near text signs at rates of 76%, 81%, and 90%, respectively, with text-sign recognition rates of 58%, 78%, and 89%. Compared with methods using HS color-invariance features, Scale-Invariant Feature Transform (SIFT) features, SIFT+HS, BRISK, Fast Retina Keypoint (FREAK), or FREAK+HS features, it localizes text signs more accurately, produces fewer false detections, achieves higher detection accuracy, and runs faster. The method can therefore be applied directly to text-sign detection and localization in natural scenes, in support of subsequent text segmentation and recognition.
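The first stage of the tower detector (LBP histogram features fed to an AdaBoost classifier, yielding classifier1) can be sketched as follows. This is a minimal illustration assuming scikit-image and scikit-learn; the 32×32 patch size, the LBP parameters (P=8, R=1, uniform patterns), and the synthetic "tower"/"non-tower" patches are assumptions for demonstration, not the thesis's actual configuration.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.ensemble import AdaBoostClassifier

def lbp_histogram(gray, p=8, r=1):
    """Normalized histogram of uniform LBP codes for one grayscale patch in [0, 1]."""
    img = (gray * 255).astype(np.uint8)
    codes = local_binary_pattern(img, p, r, method="uniform")  # codes in 0 .. p+1
    hist, _ = np.histogram(codes, bins=p + 2, range=(0, p + 2), density=True)
    return hist

# Illustrative stand-ins for cropped training patches:
# textured noise for "tower", smooth gradients for "non-tower".
rng = np.random.default_rng(0)
pos = [rng.random((32, 32)) for _ in range(40)]
neg = [np.add.outer(np.linspace(0, 1, 32), np.linspace(0, 1, 32)) / 2
       for _ in range(40)]

X = np.array([lbp_histogram(patch) for patch in pos + neg])
y = np.array([1] * 40 + [0] * 40)  # positive / negative labels

clf = AdaBoostClassifier(n_estimators=50, random_state=0)  # plays the role of classifier1
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
```

In the thesis's pipeline, `clf.predict` would then be run over multiscale sliding-window patches to propose tower candidate regions for the CNN stage to verify.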
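The reported 85% "weighted harmonic mean" for the tower detector is the standard F1 score of its 92% recall and 79% precision; the arithmetic checks out:

```python
recall, precision = 0.92, 0.79

# F1 score: harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # → 0.85
```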
【Degree-granting institution】: Xi'an University of Technology
【Degree level】: Master's
【Year conferred】: 2017
【Classification numbers】: TP391.41; TM755
Article ID: 2297147
Link: http://sikaile.net/kejilunwen/dianlidianqilunwen/2297147.html