Research on News Video Content Analysis Based on Multimodal Information

Published: 2018-12-14 11:33

[Abstract]: As the volume of video data grows rapidly, processing, browsing, retrieving, and managing it effectively has become a pressing practical problem. Video content analysis aims to give structure to unstructured video data and to extract its semantic content, bridging low-level features and high-level semantics, so that applications such as video summarization, indexing, and retrieval can offer users convenient access to video content. This thesis takes news video as its subject, uses multimodal information (audio, captions, and visual content) and its effective fusion as the means of study, and draws on models from pattern recognition theory as tools. The main contributions are threefold.

(1) A novel fast algorithm for detecting anchorperson shots in the MPEG compressed domain. In the preprocessing stage, an improved method for detecting faces from compressed-domain information is introduced. In the shot clustering stage, a new measure is constructed and anchorperson shots are grouped by hierarchical clustering, with fuzzy C-means clustering used to determine the adaptive threshold that the clustering process requires. The algorithm speeds up anchorperson shot detection while maintaining high detection performance.

(2) A decision-tree-based shot classification algorithm that assigns news video shots, in order, to six classes: commercial, "other", still image, anchorperson, reporter, and monologue. The commercial, "other", and still-image classes are detected with features such as black frames, motion, time, and faces. Anchorperson shots are detected by clustering. Reporter and monologue shots are harder to tell apart, so their detection is recast as a text sequence labeling problem and modeled with a conditional random field. The algorithm fuses audio, face, and contextual information, distinguishes the important shot types in news video, and achieves good classification results.

(3) A news story unit segmentation algorithm that fuses audio, caption, and visual information. High-level content features such as caption changes, audio types, and shot types are handled jointly: the news shot sequence is converted into multiple keyword sequences, which turns story unit segmentation into a text sequence segmentation problem. The algorithm is modeled with a conditional random field and exploits the contextual information within each sequence and across sequences, yielding good segmentation performance.

In addition, the thesis surveys video content analysis techniques, constructs a hierarchical audio classification method based on rules and hidden Markov models, implements a fairly complete framework for extracting captions from news video, and finally designs and implements a video content analysis and summarization system built on the COM architecture. In summary, the thesis analyzes news video content from the perspectives of audio, captions, visual content, and their effective fusion, and the experimental results demonstrate the effectiveness of the proposed algorithms.
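The abstract describes converting a news shot sequence into several keyword sequences so that a conditional random field can label or segment it like text. The sketch below illustrates only that conversion step and is not the thesis's implementation. The `Shot` fields, the label vocabulary, the `PAD` token, and the one-shot context window are all assumptions made for illustration. The output is a list of per-position feature dictionaries, a form that many CRF toolkits accept as input; no CRF is trained here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Shot:
    """One news-video shot with hypothetical high-level labels."""
    shot_type: str         # assumed vocabulary, e.g. "anchor", "reporter"
    audio_type: str        # assumed vocabulary, e.g. "speech", "music"
    caption_changed: bool  # True if the on-screen caption changed


def to_keyword_sequences(shots: List[Shot]) -> Dict[str, List[str]]:
    """Turn a shot sequence into one keyword sequence per modality."""
    return {
        "shot": [s.shot_type for s in shots],
        "audio": [s.audio_type for s in shots],
        "caption": ["change" if s.caption_changed else "same" for s in shots],
    }


def context_features(seqs: Dict[str, List[str]], i: int,
                     window: int = 1) -> Dict[str, str]:
    """Collect each modality's keyword at offsets -window..+window around i."""
    n = len(seqs["shot"])
    feats: Dict[str, str] = {}
    for name, seq in seqs.items():
        for off in range(-window, window + 1):
            j = i + off
            # Positions outside the sequence get an assumed padding token.
            feats[f"{name}[{off:+d}]"] = seq[j] if 0 <= j < n else "PAD"
    return feats


# Toy sequence: an anchor introduces a story, a reporter and an interviewee
# follow, and the next anchor shot opens a new story.
shots = [
    Shot("anchor", "speech", True),
    Shot("reporter", "speech", False),
    Shot("monologue", "speech", False),
    Shot("anchor", "speech", True),
]
seqs = to_keyword_sequences(shots)
features = [context_features(seqs, i) for i in range(len(shots))]
```

Each dictionary in `features` pairs a shot with its neighbors' keywords in every modality, which is how contextual information within and across the sequences could be exposed to a sequence model.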
[Degree-granting institution]: Tianjin University
[Degree level]: Doctoral
[Year conferred]: 2007
[Classification number]: TP391.41

Article ID: 2378544

Article link: http://sikaile.net/wenyilunwen/guanggaoshejilunwen/2378544.html
