Research on Audio-Visual Information Fusion Algorithms
[Abstract]: In recent years, with the development of computer information technology, more and more video equipment and technology have been applied to people's study and daily life. Applications such as video conferencing, video search engines and video data querying have produced large amounts of non-text data in many fields, including film, television, meeting records and scientific literature. For individuals, the spread of personal recording devices and improvements in Internet technology have made it very easy for ordinary people to publish personal videos, which generates still more video data. How to process this volume of multimedia information, and how to organize and index it, is a severe test for existing video processing technology. Early multimedia information retrieval algorithms have drifted away from their original goal of low-cost operation. Future retrieval algorithms will need to integrate more representative visual, auditory and semantic features. The multimodal nature of video provides the basis for information fusion. Most existing analysis and fusion techniques target a single modality, but video is inherently multimodal, and when it describes a single topic it contains several highly correlated modalities. An effective method for fusing and analyzing video is therefore needed in order to classify and retrieve video more accurately. The main work of this thesis on processing and fusing video features is as follows:
1. Existing models for processing video data are limited to specific domains such as news and advertising, and the processing techniques they use are too narrow and outdated. This thesis defines a relatively complete preprocessing model for video retrieval, built from a series of comparatively efficient video processing techniques whose effectiveness has been shown by research and analysis. In this model, the temporal structure of the video is extracted using the multimodal properties of its low-level features; the content is then extracted and a subset of the video data is constructed from the original video. On this basis, key frames are extracted from the video and audio features are extracted from its audio stream. To simplify computation and reduce the dimensionality of the extracted low-level features in a uniform way, the thesis uses the marginal Fisher analysis dimensionality reduction algorithm recently studied by Shuicheng Yan et al., which it reports to be superior to commonly used dimensionality reduction algorithms such as PCA and LDA. The resulting feature vectors are then classified with a robust support vector machine (SVM) classifier.
2. An improved MGR fusion algorithm is proposed for fusing the classification results obtained from the multimodal features. Starting from the sample rank matrix produced when the classifiers process the feature vectors, and from the fusion framework designed by Melnik et al., a fusion score function is designed that improves the MGR algorithm by optimizing confidence and priority. Compared with the MGR algorithm, the improved algorithm reduces computational complexity, uses fewer parameters, and improves the recognition rate.
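The abstract does not give the fusion score function itself, so the following is only a minimal sketch of the general idea of rank-level fusion across modalities. It assumes two hypothetical classifiers (one visual, one audio) that each output a score per class; the scores are converted to ranks, and the ranks are combined with a stand-in score (a weighted mix of mean rank and best rank). The function names, the `alpha` weight, the class labels and the example scores are all illustrative assumptions, not the thesis's improved MGR algorithm.

```python
from typing import Dict, List


def scores_to_ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Convert classifier scores to ranks; rank 1 is the highest score.

    Ties are broken alphabetically by class name (an arbitrary choice).
    """
    ordered = sorted(scores, key=lambda c: (-scores[c], c))
    return {c: i + 1 for i, c in enumerate(ordered)}


def fuse_ranks(rank_lists: List[Dict[str, int]], alpha: float = 0.5) -> Dict[str, float]:
    """Hypothetical fusion score per class (lower is better).

    alpha * mean rank rewards agreement between modalities;
    (1 - alpha) * best rank rewards one confident modality.
    This is a stand-in, not the score function from the thesis.
    """
    fused = {}
    for c in rank_lists[0]:
        ranks = [r[c] for r in rank_lists]
        fused[c] = alpha * sum(ranks) / len(ranks) + (1 - alpha) * min(ranks)
    return fused


# Made-up per-class scores from a visual and an audio classifier.
visual = {"news": 0.5, "sports": 0.3, "ad": 0.15, "film": 0.05}
audio = {"sports": 0.4, "ad": 0.3, "news": 0.2, "film": 0.1}

ranks = [scores_to_ranks(visual), scores_to_ranks(audio)]
fused = fuse_ranks(ranks)
best = min(fused, key=lambda c: (fused[c], c))
print(fused, best)
```

Working on ranks rather than raw scores avoids having to calibrate the outputs of different classifiers against each other, which is one common motivation for rank-based fusion frameworks.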
【Degree-granting institution】: Taiyuan University of Technology
【Degree level】: Master's
【Year degree granted】: 2011
【Classification number】: TP391.41
Article ID: 2363195
Article link: http://sikaile.net/wenyilunwen/guanggaoshejilunwen/2363195.html