Research on Short-Video Annotation Methods Based on Shot and Scene Context
[Abstract]: With the rapid development of digital media, communication, and network technologies, the volume of digital media information, of which video is the dominant form, is expanding rapidly. Short video is a content-rich form of video data, and finding useful information in large collections of it has long been a concern for users, motivating applications such as video indexing and video retrieval. Video annotation is the core of these applications and has become a hot research topic in digital media applications and computer vision. From a semantic point of view, a video can be divided into several semantic units; different units carry different semantic connotations, enabling annotation at each semantic level. Building on an in-depth analysis of video structure, this work segments video into distinct semantic units and annotates short video at both the shot semantic layer and the scene semantic layer. The main contributions and innovations of this thesis are as follows: (1) A novel shot boundary detection method that combines video dynamic texture with SIFT features, exploiting both global and local features of video frames. The method partitions each pair of adjacent frames into uniform blocks, computes the average gradient of each block in RGB color space, and assembles the block averages into the frame's dynamic texture. Shot changes are detected by comparing the dynamic textures of adjacent frames and by matching their SIFT features. The algorithm detects shot boundaries in different types of video with high accuracy. (2) A video semantic annotation model based on shot events.
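As a rough illustration of the dynamic-texture idea in contribution (1), the sketch below (not the author's exact implementation; the block count and the texture-distance measure are assumptions for illustration) partitions an RGB frame into uniform blocks, averages the gradient magnitude of each block, and compares the resulting textures of two adjacent frames:

```python
import numpy as np

def dynamic_texture(frame, blocks=4):
    """Partition an RGB frame into blocks x blocks tiles and return the
    mean gradient magnitude of each tile (the frame's 'dynamic texture')."""
    h, w, _ = frame.shape
    # Gradient along rows and columns, magnitude summed over the RGB channels.
    gy, gx = np.gradient(frame.astype(float), axis=(0, 1))
    mag = np.sqrt(gx ** 2 + gy ** 2).sum(axis=2)
    bh, bw = h // blocks, w // blocks
    tex = np.empty((blocks, blocks))
    for i in range(blocks):
        for j in range(blocks):
            tex[i, j] = mag[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].mean()
    return tex

def texture_distance(f1, f2, blocks=4):
    """Mean absolute difference between the dynamic textures of two frames;
    a large value suggests a candidate shot boundary (threshold is a choice)."""
    return np.abs(dynamic_texture(f1, blocks) - dynamic_texture(f2, blocks)).mean()
```

In the method described in the abstract, frame pairs whose texture distance exceeds a threshold would additionally be verified by SIFT keypoint matching before a shot change is declared.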
Based on an analysis of the video structure, the moving objects and the background color features of each shot's key frame are extracted to express the event of a shot, which is then extended to the expression of scene events; ultimately, the collection of all events constitutes the topic of a video clip. The model takes the event group formed by a shot's moving objects and its environmental background as the annotation result, capturing the semantic connotation of the shot and improving the accuracy of video semantic expression. (3) A new video annotation method based on semi-supervised clustering, which annotates video with event groups at the granularity of shot events. To reduce the dependence of video annotation on labeled samples, a semi-supervised K-means clustering algorithm is constructed using ideas from semi-supervised learning, and its objective function is optimized so that the final clustering reflects not only low inter-class coupling and high intra-class cohesion but also the local data-density distribution within each class. The algorithm clusters multi-attribute heterogeneous data such as video and improves the accuracy of video annotation. (4) A new context-based multi-kernel learning method for video classification. Extending the traditional bag-of-words model, a video scene classification model is built from the correlation between key frames of adjacent shots. First, the video is segmented into shots, key frames are extracted, and the key-frame images are normalized. The normalized key frames are then tiled as image blocks into a new composite image that preserves their temporal relation; SIFT features and HSV color features are extracted from this composite image and mapped into a Hilbert space.
Through multi-kernel learning, suitable groups of kernel functions are selected to train on each image, yielding a classification model with improved classification performance. These results can be widely applied in video classification, video indexing, video retrieval, video content understanding, video data management, and related fields, and have both theoretical significance and practical value.
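The multi-kernel combination in contribution (4) can be sketched as a weighted sum of per-feature kernels, one for the SIFT channel and one for the HSV channel; each channel's kernel implicitly maps its features into a Hilbert space. This is a minimal illustration, not the thesis's trained model: the RBF kernel choice, the bandwidths, and the fixed weights are all assumptions (a full multi-kernel learner would optimize the weights jointly with the classifier):

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian RBF kernel matrix between the rows of A and B."""
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d)

def combined_kernel(feats_a, feats_b, gammas, weights):
    """Weighted sum of per-channel RBF kernels: feats_a / feats_b are lists
    of feature matrices (e.g. one SIFT-like and one HSV-like channel)."""
    return sum(w * rbf_kernel(Fa, Fb, g)
               for Fa, Fb, g, w in zip(feats_a, feats_b, gammas, weights))
```

A combined Gram matrix built this way can be fed to any kernel classifier; even a 1-nearest-neighbour rule in the combined-kernel space (predict the class of the training sample with the largest kernel value) separates well-clustered channels.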
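The semi-supervised K-means of contribution (3) can likewise be sketched in its simplest "seeded" form, where labelled samples initialize the centroids and stay pinned to their class during the updates. This is an illustrative baseline under stated assumptions (every class has at least one labelled point; unlabelled points carry label -1), not the thesis's optimized objective, which additionally accounts for local data density within classes:

```python
import numpy as np

def seeded_kmeans(X, labels, k, iters=20):
    """Semi-supervised K-means: points with label >= 0 seed the initial
    centroids and keep their class; points labelled -1 are free to move."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Seed each centroid with the mean of its labelled points.
    centers = np.stack([X[labels == c].mean(axis=0) for c in range(k)])
    assign = labels.copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new = d.argmin(axis=1)
        new[labels >= 0] = labels[labels >= 0]   # respect the supervision
        centers = np.stack([X[new == c].mean(axis=0) if (new == c).any()
                            else centers[c] for c in range(k)])
        if (new == assign).all():
            break
        assign = new
    return assign, centers
```

With only one labelled point per cluster, the remaining points are grouped around those seeds, which is the sense in which the labelled samples reduce the annotation burden.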
【Degree-granting institution】: Shanghai University
【Degree level】: Doctoral
【Year conferred】: 2016
【CLC number】: TP391.41
【相似文獻(xiàn)】
相關(guān)期刊論文 前10條
1 陸懿;陳光夢;畢宏杰;董棟;;改進(jìn)的自然動(dòng)態(tài)紋理綜合算法[J];計(jì)算機(jī)工程與設(shè)計(jì);2008年14期
2 姚偉光;王贏;許存祿;;將局部二進(jìn)制模式應(yīng)用于動(dòng)態(tài)紋理識(shí)別的新方法[J];微計(jì)算機(jī)信息;2010年09期
3 陳昌紅;趙恒;胡海虹;梁繼民;;基于改進(jìn)動(dòng)態(tài)紋理模型的人體運(yùn)動(dòng)分析[J];模式識(shí)別與人工智能;2010年02期
4 陳青;朱俊宇;唐朝暉;劉金平;桂衛(wèi)華;;動(dòng)態(tài)紋理建模在硫浮選工況的識(shí)別分析[J];計(jì)算機(jī)與應(yīng)用化學(xué);2013年10期
5 邵婧;王冠香;郭蔚;;基于視頻動(dòng)態(tài)紋理的火災(zāi)檢測[J];中國圖象圖形學(xué)報(bào);2013年06期
6 陳紅倩;陳誼;曹健;劉鸝;;基于動(dòng)態(tài)紋理技術(shù)的實(shí)時(shí)森林繪制[J];計(jì)算機(jī)仿真;2012年06期
7 何莎;費(fèi)樹岷;;動(dòng)態(tài)紋理背景的建模[J];計(jì)算機(jī)應(yīng)用;2009年S2期
8 鄒運(yùn)蘭;王仁芳;;基于多重紋理和動(dòng)態(tài)紋理技術(shù)的實(shí)時(shí)水面模擬[J];浙江萬里學(xué)院學(xué)報(bào);2010年06期
9 陳紅倩;李鳳霞;黃天羽;戰(zhàn)守義;;一種基于動(dòng)態(tài)紋理的運(yùn)動(dòng)場景可視化方法[J];北京理工大學(xué)學(xué)報(bào);2009年06期
10 于鑫;韓勇;陳戈;;基于動(dòng)態(tài)紋理和粒子系統(tǒng)的火焰效果模擬[J];信息與電腦(理論版);2009年11期
相關(guān)會(huì)議論文 前1條
1 陸懿;陳光夢;;一種改進(jìn)的彩色動(dòng)態(tài)紋理綜合算法[A];中國儀器儀表學(xué)會(huì)第九屆青年學(xué)術(shù)會(huì)議論文集[C];2007年
相關(guān)博士學(xué)位論文 前3條
1 王勇;基于混沌特征向量的動(dòng)態(tài)紋理識(shí)別[D];上海交通大學(xué);2014年
2 彭太樂;基于鏡頭及場景上下文的短視頻標(biāo)注方法研究[D];上海大學(xué);2016年
3 周丙寅;張量分解及其在動(dòng)態(tài)紋理中的應(yīng)用[D];河北師范大學(xué);2012年
相關(guān)碩士學(xué)位論文 前10條
1 陸懿;一種改進(jìn)的基于非線性模型的動(dòng)態(tài)紋理識(shí)別算法[D];復(fù)旦大學(xué);2008年
2 徐磊磊;動(dòng)態(tài)紋理性質(zhì)及其模擬算法研究[D];華中科技大學(xué);2007年
3 姚偉光;基于局部二進(jìn)制運(yùn)動(dòng)模式的動(dòng)態(tài)紋理描述新方法[D];蘭州大學(xué);2009年
4 周文玲;增強(qiáng)現(xiàn)實(shí)中動(dòng)態(tài)紋理的識(shí)別與重建技術(shù)研究[D];華東師范大學(xué);2011年
5 劉霞;自然景物模擬的動(dòng)態(tài)紋理研究與實(shí)現(xiàn)[D];國防科學(xué)技術(shù)大學(xué);2005年
6 丁悅;基于數(shù)據(jù)驅(qū)動(dòng)的馬爾柯夫鏈蒙特卡洛模型的動(dòng)態(tài)紋理分析[D];南京理工大學(xué);2007年
7 曹壽剛;基于李群論和動(dòng)態(tài)紋理的視頻分類技術(shù)研究[D];華中科技大學(xué);2013年
8 高平;基于擴(kuò)展統(tǒng)計(jì)地形特征的動(dòng)態(tài)紋理識(shí)別研究[D];蘭州大學(xué);2009年
9 施濵;基于時(shí)空方向能量的動(dòng)態(tài)紋理研究[D];上海交通大學(xué);2012年
10 張茜;基于動(dòng)態(tài)紋理的流水效果合成技術(shù)研究[D];山東大學(xué);2006年