Content- and Semantics-Based Classification of Short Video Shots
Published: 2019-03-12 12:19
[Abstract]: With the development of multimedia and network technology in recent years, more and more short video shots and online video sites have appeared on the Internet, and content- and semantics-based classification and retrieval of short video shots has become a research area.
A short video shot is a set of temporally consecutive frames, so analyzing it involves both a spatial and a temporal dimension. Spatial analysis can draw on existing image feature extraction techniques to obtain effective visual features; temporal analysis requires structured analysis and processing of the shot data. Together, the static and dynamic features form the feature space that describes a shot's content. Traditional shot classification systems, however, do not consider the high-level semantic information of a shot, which leaves a semantic gap between low-level visual features and high-level semantic information. It is therefore necessary to bring the analysis of semantic features into the classification system and to try to infer high-level semantic information from a shot's low-level features, so as to build a shot classification system based on high-level semantics.
This thesis studies both aspects and, based on the characteristics and shortcomings of existing methods, proposes corresponding solutions.
After a variety of visual features are extracted from the short video shots, mutual information is used to assess the discriminative power of each individual visual feature. The approach has a strong theoretical basis and does not depend on the type of classifier: it measures discriminative power by the amount of class information a feature carries, which expresses the intrinsic relationship between image features and categories.
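As a rough illustration of scoring a single feature by mutual information, the sketch below estimates I(F; C) between one scalar feature and the class labels from a joint histogram, then ranks feature columns by that score. The equal-width binning, the bin count, the function names, and the synthetic data are all assumptions made for this example; the abstract does not say how the thesis estimates mutual information.

```python
import numpy as np

def mutual_information(feature, labels, bins=8):
    """Estimate I(F; C) in bits between one scalar feature and class labels.

    Assumption: the feature is discretized into equal-width bins.
    """
    feature = np.asarray(feature, dtype=float).ravel()
    labels = np.asarray(labels).ravel()
    # Discretize the feature; interior edges map values to bins 0..bins-1.
    edges = np.histogram_bin_edges(feature, bins=bins)
    f_idx = np.digitize(feature, edges[1:-1])
    classes, c_idx = np.unique(labels, return_inverse=True)
    # Joint histogram of (feature bin, class), normalized to a probability table.
    joint = np.zeros((bins, classes.size))
    np.add.at(joint, (f_idx, c_idx), 1.0)
    joint /= joint.sum()
    p_f = joint.sum(axis=1, keepdims=True)   # marginal over feature bins
    p_c = joint.sum(axis=0, keepdims=True)   # marginal over classes
    nz = joint > 0                           # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (p_f @ p_c)[nz])))

def rank_features(X, y, bins=8):
    """Return (column index, MI in bits) pairs, most informative first."""
    scores = [mutual_information(X[:, j], y, bins) for j in range(X.shape[1])]
    order = np.argsort(scores)[::-1]
    return [(int(j), scores[j]) for j in order]

if __name__ == "__main__":
    # Synthetic data only: column 0 is pure noise, column 1 tracks the class.
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 200)
    X = np.column_stack([rng.standard_normal(y.size),
                         y + 0.3 * rng.standard_normal(y.size)])
    for j, score in rank_features(X, y):
        print(f"feature {j}: {score:.3f} bits")
```

Because the score is computed from the feature and the labels alone, it can be obtained before any classifier is trained, which matches the abstract's point that the measure does not depend on the classifier type.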
In the experiments, the classification error rate of an SVM classifier also supports the correctness and effectiveness of using mutual information for feature analysis and selection. An SVM classifier is then used to analyze the complementary or redundant relationships among the visual features, so as to select the optimal feature combination. The best combination found for the live-action / animation categories is RGB improved color moments + edge dynamic features; for the people / scenery categories it is RGB improved color moments + Gabor texture features + edge dynamic features; and for the sports / entertainment categories it is edge direction histogram + color dynamic features.
Finally, high-level semantic features are extracted and studied within a short-shot classification system for ball games. A combination of two features, the proportion of key frames within a shot and the proportion of playing-field pixels within the key frames, divides the shot database into in-field and off-field scenes. The field-area proportion then further divides in-field shots into long shots and close-up shots, while the proportion of edge-region pixels divides off-field scenes into coach shots and audience shots, yielding a hierarchical short-shot classifier for ball sports.
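The hierarchy described above can be sketched as a two-level decision tree. Only the tree shape (in-field vs. off-field, then long vs. close-up or coach vs. audience) and the three feature names come from the abstract. The `ShotFeatures` type, the rule functions, and every weight, threshold, and comparison direction below are hypothetical placeholders, since the abstract does not state how each decision is made.

```python
from typing import Callable, NamedTuple

class ShotFeatures(NamedTuple):
    keyframe_ratio: float  # proportion of key frames within the shot
    field_ratio: float     # fraction of key-frame pixels in the playing field
    edge_ratio: float      # fraction of key-frame pixels in edge regions

Rule = Callable[[ShotFeatures], bool]

def classify_shot(shot: ShotFeatures, is_in_field: Rule,
                  is_long_shot: Rule, is_audience: Rule) -> str:
    """Walk the two-level hierarchy; each decision is a pluggable rule."""
    if is_in_field(shot):
        return "in-field/long shot" if is_long_shot(shot) else "in-field/close-up"
    return "off-field/audience" if is_audience(shot) else "off-field/coach"

# Placeholder rules: the weights, thresholds, and directions are made up
# for illustration and would normally be learned from labeled shots.
def in_field(s: ShotFeatures) -> bool:
    return 0.8 * s.field_ratio + 0.2 * s.keyframe_ratio > 0.3

def long_shot(s: ShotFeatures) -> bool:
    return s.field_ratio > 0.5

def audience(s: ShotFeatures) -> bool:
    return s.edge_ratio > 0.25

if __name__ == "__main__":
    shots = [
        ShotFeatures(keyframe_ratio=0.1, field_ratio=0.70, edge_ratio=0.1),
        ShotFeatures(keyframe_ratio=0.5, field_ratio=0.35, edge_ratio=0.2),
        ShotFeatures(keyframe_ratio=0.2, field_ratio=0.00, edge_ratio=0.4),
        ShotFeatures(keyframe_ratio=0.2, field_ratio=0.05, edge_ratio=0.1),
    ]
    for s in shots:
        print(s, "->", classify_shot(s, in_field, long_shot, audience))
```

Passing the rules in as functions keeps the tree structure separate from the decision logic, so a trained classifier could replace any placeholder rule without changing `classify_shot`.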
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree granted]: 2009
[Classification number]: TP391.41
Article ID: 2438765
[Citing literature]
Related master's theses (1 entry)
1 Deng Kejie; Research on Topic-Based Sports News Video Retrieval [D]; Central South University; 2011
Article link: http://sikaile.net/wenyilunwen/dongmansheji/2438765.html