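Deep Learning-Driven Scene Analysis and Semantic Object Parsing
Keywords: deep learning; convolutional neural network; depth estimation; optical flow estimation; fine-grained pedestrian analysis; total variation model; multi-scale correlation learning. Source: Zhejiang University, master's thesis, 2017. Document type: degree thesis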
【Abstract】: Semantic object parsing and scene analysis are important research directions in computer vision. Their main goal is to analyze and understand the objects and scenes in images and videos, with wide application in video surveillance, autonomous driving, intelligent transportation, and related areas. Semantic object parsing covers the detection, recognition, and analysis of objects such as pedestrians and vehicles. Within it, fine-grained pedestrian analysis underlies many computer vision applications; it aims to segment a pedestrian image into semantic parts and recognize their attributes. Scene analysis mainly comprises depth estimation, motion analysis, and structure analysis of a scene. Depth estimation recovers the depth information of a scene from an image and helps reconstruct the scene's three-dimensional structure. Motion analysis mainly means obtaining optical flow from consecutive video frames, which is used to recognize the behavior of moving objects and to detect and classify abnormal events. Effective algorithms for fine-grained pedestrian analysis, image depth estimation, and optical flow estimation therefore have real practical value, and this thesis focuses on these three tasks.

In recent years, deep learning has achieved breakthroughs in computer vision tasks such as object detection, face recognition, and scene labeling, and the design of task-oriented network models has drawn growing attention from academia and industry. This thesis proposes a separate deep learning-based model for each of the three tasks, as follows:

1. For single-image depth estimation, the thesis first reviews existing methods. It then addresses the weakness of current deep learning-based depth estimation models in modeling spatial context by proposing two models: a data-driven context feature learning model and a loss function model based on the total variation model. The former learns, from data, context weights that depend on pixel position and uses them to fuse neighborhood features into the depth prediction. The latter effectively suppresses noise and makes the result smoother while preserving edges. Finally, the two models are combined to obtain a more effective method.

2. For optical flow estimation, deep learning-based methods are more efficient and easier to extend than traditional methods. However, few such methods exist so far, and existing deep models fall short on large-displacement optical flow prediction. The thesis proposes a deep convolutional network architecture based on multi-scale correlation learning that handles large displacements effectively. On several large-displacement optical flow datasets, the proposed framework performs markedly better than the baseline algorithm. In addition, because the predicted results contain considerable noise and large errors, the thesis proposes combining a recurrent neural network with a convolutional neural network to further correct the predictions and obtain finer results.

3. For fine-grained pedestrian analysis, targeting a competition on fine-grained pedestrian recognition in surveillance video, the thesis proposes two model frameworks based on Faster R-CNN. One jointly learns part detection and part attribute classification in a single network. The other first detects part locations with the Faster R-CNN framework and then trains a separate network to classify the attributes of the parts. Experiments show that the staged detect-then-classify approach reduces interference between classes and thereby reduces misclassification.
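The abstract does not give the exact form of the total-variation-based loss, so the following is only a minimal sketch of the general idea, assuming a mean-squared-error data term plus an anisotropic total variation penalty on the predicted depth map. The function names, the `tv_weight` parameter, and the toy arrays are hypothetical and are not taken from the thesis.

```python
import numpy as np


def total_variation(depth):
    """Anisotropic total variation: the sum of absolute differences
    between vertically and horizontally adjacent pixels."""
    dy = np.abs(depth[1:, :] - depth[:-1, :])
    dx = np.abs(depth[:, 1:] - depth[:, :-1])
    return float(dx.sum() + dy.sum())


def tv_regularized_loss(pred, target, tv_weight=0.1):
    """Assumed loss form: MSE data term + tv_weight * per-pixel TV of the prediction.
    This is a generic formulation, not necessarily the one used in the thesis."""
    data_term = float(np.mean((pred - target) ** 2))
    smooth_term = total_variation(pred) / pred.size
    return data_term + tv_weight * smooth_term


# Toy depth map with one sharp vertical edge (left half 0, right half 1).
clean_edge = np.zeros((4, 4))
clean_edge[:, 2:] = 1.0

# Same map with a single noisy pixel next to the edge.
noisy_edge = clean_edge.copy()
noisy_edge[1, 1] = 0.5

print(total_variation(clean_edge))   # TV of the clean step edge
print(total_variation(noisy_edge))   # the isolated noisy pixel raises the TV
```

Because the penalty grows with the absolute size of each jump rather than its square, a single clean edge costs no more than the same rise spread over several pixels, while isolated noisy pixels add extra cost. This is the usual intuition behind the noise suppression with edge preservation that the abstract describes.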
【Degree-granting institution】: Zhejiang University
【Degree level】: Master's
【Year conferred】: 2017
【Classification number】: TP391.41; TP18
【Thesis citation】: Zhao Shanshan (趙杉杉). Deep Learning-Driven Scene Analysis and Semantic Object Parsing [D]. Zhejiang University, 2017.
Article ID: 1494140
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1494140.html