Research on Context-Based Annotation and Management of Mobile Multimedia Information and Its Key Technologies
Published: 2019-06-11 01:00
【Abstract】: In recent years, with the rapid development of computer communication and multimedia compression technology, the steady decline of storage costs, and especially the popularity of smartphones and the rise of social networking sites, visual data such as video and images have grown explosively, and managing and retrieving these data effectively has become a pressing problem. To enable direct access to such data through text-based management and retrieval techniques, semantic annotation of video and images has gradually developed. Because manual annotation is inefficient, costly, and subjective, the common solution is to have computers annotate visual data automatically. Automatic annotation based on semantic concepts is one of the prevailing techniques; despite some success, problems remain that hinder its further development, including dependence on training data and the limitations of visual semantics. This thesis approaches the automatic annotation of visual data from a new angle. In essence, visual data such as video and images are the carriers by which visual sensors describe real-world entities and events; annotation attempts to recover the original semantics from the visual description and restore them as linguistic descriptions for convenient organization and management. A visual sensor records only the visual appearance of targets within its range, while a large amount of contextual information related to the targets' semantics is discarded. Research in this field still focuses on fully mining the semantic information contained in the visual data themselves; in contrast, this thesis turns its attention to the process by which visual data are produced. With the development of Internet of Things technology, wearable sensing devices are becoming widespread. This thesis aims to use wearable sensors to collect and exploit contextual information about visual targets to aid the semantic analysis of visual data. The main contributions are as follows:
· Conventional face detection and tracking must process every frame of a video. This thesis proposes a fast face detection and tracking algorithm that uses sensor-collected context to filter out large numbers of faceless frames, reducing processing time as well as false positives and false negatives, and thereby improving the performance and efficiency of face detection and tracking.
· Building on sensor-assisted fast face recognition, and by exploiting the consistency of a target's body motion direction across different sensing modalities, a method for recognizing frontal face images in video is proposed. As with the identity recognition described below, introducing wearable sensors frees the recognition process from dependence on sample data; experiments show that the method is more robust.
· To guarantee accuracy, traditional identity recognition in video must collect large amounts of high-quality sample data for every target. This thesis proposes a motion-matching identity recognition method that exploits the inherent consistency of a target's motion features across different sensing modalities, introducing wearable sensors to help solve identity recognition in video. The method bypasses the traditional processing pipeline, removes the dependence on sample data, and features simple logic, low computational complexity, and high reliability.
· An automatic video annotation method is proposed that performs action recognition on two different kinds of sensing data and, by fusing the decisions from the different sensing modalities, reveals the targets' identities, ultimately annotating video content in the form of time, place, person, and action.
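The motion-matching idea above can be illustrated with a minimal sketch: if a wearable sensor on each person yields a per-frame motion-direction stream, and the camera yields a 2D trajectory per visual track, identity can be assigned by correlating the two. The function names, the cosine-similarity score, and the assumption that sensor directions have already been projected into the camera's image plane and time-aligned with the video are all illustrative choices, not details taken from the thesis.

```python
import numpy as np

def motion_direction(track_xy):
    """Per-frame 2D motion direction (unit vectors) of a visual track,
    given an (n, 2) array of image coordinates."""
    d = np.diff(track_xy, axis=0)
    n = np.linalg.norm(d, axis=1, keepdims=True)
    n[n == 0] = 1.0  # avoid division by zero on stationary frames
    return d / n

def match_identity(tracks, sensor_dirs):
    """Assign each wearable-sensor stream to the visual track whose
    motion direction correlates best with it.

    tracks:      {track_id: (n, 2) array of image coordinates}
    sensor_dirs: {person_id: (m, 2) array of unit motion directions,
                  assumed projected into the image plane and time-aligned}
    Returns {person_id: best-matching track_id}.
    """
    best = {}
    for person, sdir in sensor_dirs.items():
        scores = []
        for tid, track in tracks.items():
            vdir = motion_direction(track)
            m = min(len(vdir), len(sdir))
            # mean cosine similarity over the aligned frames
            score = float(np.mean(np.sum(vdir[:m] * sdir[:m], axis=1)))
            scores.append((score, tid))
        best[person] = max(scores)[1]
    return best
```

Because the match uses only the agreement of motion features across the two sensing modalities, no per-person training samples are needed, which is the property the thesis emphasizes.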
【Degree-granting institution】: 北京郵電大學 (Beijing University of Posts and Telecommunications)
【Degree level】: Ph.D.
【Year conferred】: 2015
【CLC number】: TP391.41
Article ID: 2496875
Article link: http://sikaile.net/shoufeilunwen/xxkjbs/2496875.html