Research on Violence Detection and Face Recognition Methods Based on Deep Learning
Topic: public security · Focus: intelligent video analysis · Source: University of Science and Technology of China, doctoral dissertation, 2017 · Document type: degree thesis
【Abstract】: With the ongoing construction of "Safe City" initiatives, public security has become a growing focus of public concern, and video surveillance technology is being deployed ever more widely. Traditional surveillance systems mainly provide capture and storage, which falls far short of the demand for intelligent analysis. Building an intelligent video surveillance system requires solving several key problems: (1) how to quickly detect abnormal behavior in surveillance video, raise timely alarms, and minimize false positives and missed detections; (2) how to accurately identify and analyze suspicious targets under adverse conditions such as a single training sample or low resolution; and (3) how to guarantee both the real-time performance and the accuracy of the video analysis system at massive data scales. In recent years, deep learning has achieved excellent results in machine vision, speech recognition, natural language processing, and other fields, opening new opportunities for intelligent video analysis. This dissertation therefore studies the above problems using deep learning methods. The main contributions are as follows:

1. To address the difficulty of quickly and accurately detecting abnormal behavior, especially violent fights, in surveillance video, a violence detection method based on a 3D convolutional deep network is proposed. The method performs supervised learning on a large amount of labeled video data, extends conventional 2D convolution kernels to 3D to extract motion information from video, and combines spatial and motion information to build a deep neural network model that detects violent fights in surveillance video. Because the deep model learns end to end, no complex hand-crafted motion features need to be designed, which reduces the complexity of the task. Experiments show that the proposed method accurately recognizes violent fighting both in single-scene and in crowded environments.

2. To address the difficulty of recognizing face images from a single training sample, a two-stage voting face recognition method based on the Kernel Principal Component Analysis Networks (KPCANet) model is proposed. Without using any additional sample data, the method trains the unsupervised deep model KPCANet on partitioned face image blocks and extracts features with the filters learned by KPCA, ensuring that the extracted features are robust to illumination and occlusion while eliminating the effect of local facial deformation on the recognition rate. The final result is obtained by fusing the predictions of all blocks through voting; when a single round of voting is not decisive, a second voting stage enlarges each block's candidate set and assigns different weights to different regions, further improving recognition accuracy. Experiments show excellent performance on four public face datasets: the method outperforms general approaches that use additional datasets, and on the unconstrained LFW-a dataset it improves accuracy over the SVDL and LGR methods by about 15%.

3. To address the failure of face recognition on surveillance images of overly low resolution, a solution based on convolutional neural network models is proposed, comprising two models: a multi-scale-input Convolutional Neural Network (CNN) model and a CNN model based on Spatial Pyramid Pooling (SPP). (1) The multi-scale-input CNN model improves on the existing "two-step" approach: low-resolution images are upsampled with simple bicubic interpolation, the upsampled images are mixed with high-resolution images as training samples so that the CNN learns a feature space shared by high- and low-resolution images, feature similarity is then measured by cosine distance, and the recognition result is produced. Experiments on the CMU PIE and Extended Yale B datasets show that the model's accuracy exceeds the compared methods, with a significant gain of 2.5% to 9.9% over CMDA_BGE, the algorithm with the highest reported recognition rate. (2) The SPP-based CNN model is an improved "cross-space" method: adding a spatial pyramid pooling layer lets the model output a feature vector of constant dimension for input images of any size, and the final recognition result is obtained by comparing feature similarity between the gallery and the test image. Experiments show that, compared with the multi-scale-input CNN model, this method maintains high accuracy while removing the upsampling step, simplifying image preprocessing, and reducing the number of mapping functions that the traditional "cross-space" method must learn.

4. To address the bandwidth consumed by streaming data in surveillance systems and the need for fast, accurate analysis of massive data, a deep learning framework based on "sea-cloud collaboration" is proposed. The sea-end (edge) system uses deep learning to train a local model on local data; the local model enables fast detection and real-time response. The sea end collaborates with the cloud by uploading its local model and a small amount of data, and the cloud system uses these local models and data to build and tune a more complex deep model, yielding a better-performing global model. Experiments on the MNIST, CIFAR-10, and LFW datasets show that sea-cloud collaboration effectively reduces data transmission bandwidth while preserving the speed of the sea end and the accuracy of the cloud. Parts of the above work have been applied in the CAS Strategic Priority Research Program project "Sea-Cloud Collaborative Real-Time Processing System for Massive Network Data Streams (XDA060112030)".
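The extension of 2D convolution kernels to 3D in contribution 1 can be illustrated with a minimal NumPy sketch of valid-mode 3D convolution over a (frames, height, width) clip. This is an illustration of the operation only, not the dissertation's network; the temporal-difference kernel below is a hypothetical filter chosen to show how the temporal axis captures motion:

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Valid-mode 3D convolution (cross-correlation, as in CNNs).

    clip   : (T, H, W) array - a stack of grayscale frames
    kernel : (t, h, w) array - a spatio-temporal filter
    Returns a (T-t+1, H-h+1, W-w+1) response volume mixing
    appearance (H, W) and motion (T) information.
    """
    T, H, W = clip.shape
    t, h, w = kernel.shape
    out = np.empty((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(clip[i:i+t, j:j+h, k:k+w] * kernel)
    return out

# A temporal-difference kernel responds only to change between frames,
# i.e. to motion - something no single 2D convolution can see.
clip = np.zeros((3, 4, 4))
clip[1, 1, 1] = 1.0                       # a pixel "appears" in frame 1
kernel = np.zeros((2, 1, 1))
kernel[0, 0, 0], kernel[1, 0, 0] = -1.0, 1.0
response = conv3d_valid(clip, kernel)     # positive where motion begins
```

A real model would stack many such learned 3D filters with nonlinearities and pooling; the point here is only that the third kernel axis spans time.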
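The two-stage voting of contribution 2 can be sketched in plain Python. The per-block candidate rankings, candidate-set size k, and region weights below are all hypothetical stand-ins for KPCANet's per-block predictions; the sketch shows only the fallback logic: an unweighted majority vote over top-1 labels first, then a weighted vote over enlarged candidate sets when the first round ties:

```python
from collections import Counter

def two_stage_vote(block_topk, weights, k=3):
    """block_topk: per face block, a ranked list of candidate labels.
    weights: per-block region weights (e.g. eye blocks may count more).
    """
    # Stage 1: unweighted majority vote over each block's top-1 label.
    top1 = [ranks[0] for ranks in block_topk]
    best = Counter(top1).most_common()
    if len(best) == 1 or best[0][1] > best[1][1]:
        return best[0][0]
    # Stage 2: the vote tied - enlarge each block's candidate set to
    # its top-k labels and accumulate that block's region weight for
    # every candidate it contains.
    scores = Counter()
    for ranks, w in zip(block_topk, weights):
        for label in ranks[:k]:
            scores[label] += w
    return scores.most_common(1)[0][0]

# Four blocks whose top-1 votes tie 2-2 between "A" and "B";
# the weighted second stage over top-2 candidates breaks the tie.
blocks = [["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"], ["B", "C", "A"]]
weights = [2.0, 1.0, 2.0, 1.0]    # hypothetical region weights
winner = two_stage_vote(blocks, weights, k=2)
```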
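The matching step of the multi-scale-input model in contribution 3(1), comparing a probe against a gallery by cosine similarity in the learned shared feature space, reduces to a nearest-neighbor search. In this sketch the feature vectors and labels are made-up stand-ins for CNN embeddings:

```python
import numpy as np

def cosine_identify(probe_feat, gallery_feats, labels):
    """Return the gallery label whose feature vector has the highest
    cosine similarity to the probe.  In the described setup, upsampled
    low-resolution probes and high-resolution gallery images are mapped
    by the same CNN into one feature space; here the embeddings are
    simply given."""
    g = np.asarray(gallery_feats, dtype=float)
    p = np.asarray(probe_feat, dtype=float)
    sims = (g @ p) / (np.linalg.norm(g, axis=1) * np.linalg.norm(p))
    return labels[int(np.argmax(sims))]

# Hypothetical 3-D embeddings for three gallery identities.
gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
labels = ["alice", "bob", "carol"]
probe = [0.9, 0.1, 0.0]            # closest in angle to "alice"
match = cosine_identify(probe, gallery, labels)
```

Cosine distance ignores vector magnitude, which is why mixing upsampled and native high-resolution samples during training is enough for cross-resolution comparison at test time.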
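The spatial pyramid pooling layer of contribution 3(2), which makes the output feature length independent of input size, can be sketched in NumPy. Max pooling over a {1×1, 2×2, 4×4} grid is assumed here; the dissertation does not specify its pyramid levels in this abstract:

```python
import numpy as np

def spp_features(fmap, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map over an n x n grid for each
    pyramid level n and concatenate the results, giving a vector of
    fixed length C * sum(n*n) for any spatial size H, W."""
    C, H, W = fmap.shape
    out = []
    for n in levels:
        hs = np.linspace(0, H, n + 1).astype(int)
        ws = np.linspace(0, W, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                # max(...) keeps every cell at least one pixel wide
                cell = fmap[:, hs[i]:max(hs[i + 1], hs[i] + 1),
                               ws[j]:max(ws[j + 1], ws[j] + 1)]
                out.append(cell.max(axis=(1, 2)))
    return np.concatenate(out)

# Feature maps of different spatial sizes yield equal-length vectors,
# so faces of any resolution can be compared without upsampling.
small = spp_features(np.random.rand(8, 6, 6))
large = spp_features(np.random.rand(8, 13, 9))
```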
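The division of labor in the sea-cloud framework of contribution 4 can be sketched as a confidence-gated filter at the edge: every sample gets an immediate local response, and only a small, low-confidence fraction is queued for upload to the cloud. The threshold, model interface, and labels below are hypothetical illustrations, not the dissertation's protocol:

```python
def sea_end_filter(samples, local_model, threshold=0.8):
    """Run the edge ("sea-end") model on incoming samples and split them:
    every sample is answered locally in real time, while low-confidence
    samples are additionally queued for upload so the cloud can refine a
    larger global model from them.  local_model(x) -> (label, confidence)
    is an assumed interface."""
    responses, upload_queue = [], []
    for x in samples:
        label, confidence = local_model(x)
        responses.append((x, label))          # immediate local response
        if confidence < threshold:
            upload_queue.append((x, label))   # small fraction sent to cloud
    return responses, upload_queue

# Toy local model: even samples are "normal" with high confidence,
# odd samples are uncertain and therefore worth uploading.
model = lambda x: ("normal", 0.95) if x % 2 == 0 else ("abnormal", 0.6)
responses, queue = sea_end_filter([0, 1, 2, 3], model)
```

Because only the model parameters and the uploaded queue cross the network, bandwidth scales with the hard cases rather than with the raw video stream.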
【Degree-granting institution】: University of Science and Technology of China
【Degree level】: Doctorate
【Year conferred】: 2017
【Classification number】: TP391.41