Research on Audio-Visual Information Fusion Algorithms

Published: 2018-11-28 14:54

[Abstract]: In recent years, as computing and information technology have advanced, a growing range of video devices and technologies has entered people's study and daily life. Applications such as video conferencing, video search engines, and video data querying have produced large volumes of non-text data in fields including film, television, meeting records, and scientific literature. For individuals, the spread of personal recording devices and improvements in Internet technology have made publishing self-shot video extremely easy, which has generated still more video data. Handling this much multimedia information, and organizing and indexing it for retrieval, is a severe test for existing video processing technology. Early multimedia retrieval algorithms have drifted away from their original goal of convenient operation; future retrieval algorithms need to fuse more representative low-level visual, auditory, and semantic features. The multimodal nature of video provides the basis for information fusion. Most existing analysis and fusion techniques target a single modality, yet video is inherently multimodal, and when it describes a single topic its modalities are strongly correlated. An effective method for fused analysis of video is therefore needed so that video can be classified and retrieved more accurately. The main work of this thesis on processing and fusing video features is as follows:

1. Existing models for processing video data are defined only for specific domains such as news and advertising, and the techniques they rely on are too uniform and outdated. This thesis therefore defines a relatively complete preprocessing model for video retrieval, built from a series of video processing techniques that research and analysis have shown to be relatively efficient. The model exploits the multimodal nature of low-level video features to extract the temporal structure of a video, then extracts content features and constructs a subset of the video data from the original video. Through this process, key frames are extracted from the video and audio features are extracted from its audio stream. To simplify computation, all extracted low-level features undergo dimensionality reduction using Marginal Fisher Analysis, an algorithm then recently proposed by Shuicheng Yan et al., which the thesis reports to outperform the commonly used PCA and LDA. The resulting feature vectors are classified with a support vector machine (SVM) classifier, chosen for its robustness.

2. An improved MGR fusion algorithm is proposed for fusing classification results obtained from multimodal features. Working from the sample rank matrix that the classifiers output for the feature vectors, and building on the fusion framework designed by Melnik et al., a fusion score function is designed to improve the MGR algorithm by optimizing confidence and priority. Compared with the original MGR algorithm, the improved algorithm requires less computation, uses fewer parameters, and achieves a somewhat better recognition rate.
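The abstract describes fusing per-modality classifier outputs at the rank level. As a rough illustration of that general idea only, the sketch below converts each modality's class scores into ranks and combines them with a weighted Borda-style rank sum. This is a simple stand-in, not the improved MGR score function proposed in the thesis, which the abstract does not specify; the function names, class labels, scores, and weights are all hypothetical.

```python
from typing import Dict, List


def scores_to_ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Convert class scores to ranks; rank 1 is the highest-scoring class."""
    # Ties are broken alphabetically so the result is deterministic.
    ordered = sorted(scores, key=lambda label: (-scores[label], label))
    return {label: position + 1 for position, label in enumerate(ordered)}


def fuse_ranks(rank_lists: List[Dict[str, int]],
               weights: List[float]) -> Dict[str, float]:
    """Weighted rank sum per class; a lower fused value means stronger support."""
    if len(rank_lists) != len(weights):
        raise ValueError("one weight is required per modality")
    labels = rank_lists[0].keys()
    return {
        label: sum(w * ranks[label] for w, ranks in zip(weights, rank_lists))
        for label in labels
    }


def classify(modality_scores: List[Dict[str, float]],
             weights: List[float]) -> str:
    """Return the class with the best (lowest) fused rank."""
    fused = fuse_ranks([scores_to_ranks(s) for s in modality_scores], weights)
    return min(fused, key=lambda label: (fused[label], label))


if __name__ == "__main__":
    # Hypothetical classifier scores for one video clip (assumed values).
    visual = {"news": 0.2, "sports": 0.7, "music": 0.1}
    audio = {"news": 0.5, "sports": 0.3, "music": 0.2}
    # Assumed weights expressing more trust in the visual classifier.
    print(classify([visual, audio], [0.6, 0.4]))  # prints: sports
```

In this toy case the two modalities disagree on the top class, and the weights decide the outcome: swapping them to favor the audio classifier yields "news" instead.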
[Degree-granting institution]: Taiyuan University of Technology
[Degree level]: Master's
[Year degree granted]: 2011
[Classification number]: TP391.41


Article ID: 2363195


Article link: http://sikaile.net/wenyilunwen/guanggaoshejilunwen/2363195.html
