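Research and Implementation of a Semantic Description Algorithm for Agricultural Video
Topic: video retrieval + Chinese semantic description. Source: master's thesis, Northwest A&F University, 2017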
[Abstract]: To address the incomplete semantic indexing of agricultural videos, this thesis studies and implements a semantic description algorithm that generates natural-language sentences describing the content of an agricultural video. These sentences serve as both the semantic index and the content synopsis of the video. They enable retrieval by semantic keyword and manual screening of the retrieval results, which greatly reduces the time agricultural practitioners spend searching for videos about specific production activities and helps advance agricultural informatization. Semantic description of agricultural video faces many difficulties, such as how to extract semantic key frames that represent the meaning of a video, how to recognize the objects in those key frames and their relative relations, and how to express the recognition results in natural sentences. It is an interdisciplinary problem spanning computer vision and natural language processing. The approach taken in this thesis is to segment each agricultural video into shots at picture transitions and extract a semantic key frame for each shot, extract image features from the semantic key frames and map them into a meaning space, extract text features from the manually written descriptions of the key frames and map them into the same meaning space, and train a recurrent neural network in that space to generate a semantic description from a semantic key frame, so that a description can be generated for any semantic key frame. The main work of this thesis is as follows: (1) Image feature extraction for semantic key frames.
Compressed key frames are extracted from the agricultural video; a fixed-threshold shot boundary detection algorithm based on histogram features in the compressed domain segments the video into shots; and the K-Means clustering algorithm extracts a semantic key frame from each shot. A deep image feature extractor is trained on manually annotated object positions in the semantic key frames and is then used to extract deep image features from them. (2) Text feature extraction for semantic descriptions. Semantic descriptions are added manually to the semantic key frames of the agricultural videos. A word segmentation algorithm splits each description into words, and all words in the segmentation results are collected to build an initial Chinese vocabulary. A Chinese word-similarity algorithm then merges the synonyms in the initial vocabulary to produce the final Chinese vocabulary, and the sequence of indices of a description's words in the final vocabulary serves as the text feature of that description. (3) Learning to generate semantic descriptions from semantic key frames. The image feature of a semantic key frame is mapped to a single meaning vector in the meaning space and encoded into the hidden layer of the recurrent neural network. The text feature of the corresponding semantic description is mapped to a sequence of meaning vectors in the meaning space, which serve as the decoding input of the hidden layer. The encoding matrix and decoding matrix of the recurrent neural network are learned from the semantic key frames and semantic descriptions in the training set. The main innovations of this thesis are extracting image features from regions rather than from the whole image, and extracting text features over synonyms rather than over individual words. Experiments on the Nongshi Zhitongche (農事直通車) dataset show that these two innovations raise the agricultural video semantic description score by 5.1 and 1.7, respectively.
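The fixed-threshold, histogram-based shot segmentation mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes decoded grayscale frames held as NumPy arrays rather than compressed-domain data, uses the L1 distance between consecutive histograms, and the function names, bin count, and threshold value of 0.5 are all hypothetical.

```python
import numpy as np

def gray_histogram(frame, bins=16):
    """Normalized intensity histogram of one frame (2-D uint8 array)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def detect_shot_boundaries(frames, threshold=0.5, bins=16):
    """Return the indices i at which a new shot starts at frames[i].

    A boundary is declared when the L1 distance between the histograms
    of consecutive frames exceeds a fixed threshold (illustrative value).
    """
    boundaries = []
    prev = gray_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i], bins)
        if np.abs(cur - prev).sum() > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries

def split_into_shots(frames, boundaries):
    """Group frame indices into shots using the detected boundaries."""
    edges = [0] + list(boundaries) + [len(frames)]
    return [list(range(a, b)) for a, b in zip(edges[:-1], edges[1:])]

# Synthetic input: three dark frames followed by three bright frames.
dark = [np.full((8, 8), 20, dtype=np.uint8) for _ in range(3)]
bright = [np.full((8, 8), 220, dtype=np.uint8) for _ in range(3)]
frames = dark + bright
boundaries = detect_shot_boundaries(frames)
shots = split_into_shots(frames, boundaries)
```

On this synthetic input the only large histogram change is between the third and fourth frames, so the frames are grouped into two shots. A fixed threshold is simple but has to be tuned to the footage; the abstract does not state the value used.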
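The text feature described in the abstract, a sequence of indices into a vocabulary whose synonyms have been merged, can be sketched as below. The sketch assumes the descriptions are already segmented into words and that the synonym groups are supplied directly; in the thesis they come from a word segmentation algorithm and a Chinese word-similarity algorithm, neither of which is reproduced here. The function names and the English example words are hypothetical.

```python
def build_vocab(segmented_descriptions, synonym_groups):
    """Build a vocabulary in which all words of a synonym group share one index."""
    # Map every word in a group to a canonical representative (the first word).
    canonical = {}
    for group in synonym_groups:
        for word in group:
            canonical[word] = group[0]

    vocab = {}
    for words in segmented_descriptions:
        for word in words:
            key = canonical.get(word, word)
            if key not in vocab:
                vocab[key] = len(vocab)  # next free index
    return vocab, canonical

def encode(words, vocab, canonical):
    """Text feature: the index sequence of a description in the merged vocabulary."""
    return [vocab[canonical.get(w, w)] for w in words]

# Illustrative, already-segmented descriptions (hypothetical data).
descriptions = [
    ["farmer", "sprays", "pesticide", "on", "wheat"],
    ["grower", "waters", "wheat"],
]
synonyms = [["farmer", "grower"]]  # assumed output of a similarity algorithm

vocab, canonical = build_vocab(descriptions, synonyms)
features = [encode(d, vocab, canonical) for d in descriptions]
```

Because "farmer" and "grower" are merged, both descriptions start with the same index and the vocabulary holds six entries instead of seven, which is the effect the synonym-based text feature aims for.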
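The learning step in the abstract, in which an image feature is encoded into the hidden layer and word vectors drive the decoding, can be sketched as a forward pass of a plain recurrent network. Everything concrete here is an assumption made for illustration: the dimensions, the tanh and softmax nonlinearities, the cross-entropy loss, the random weights, and the names `W_img`, `W_emb`, `W_hh`, and `W_out`. The abstract does not specify the thesis architecture at this level of detail.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, embed_dim, vocab_size = 8, 4, 6  # hypothetical sizes

W_img = rng.normal(scale=0.1, size=(embed_dim, feat_dim))    # image feature -> meaning space
W_emb = rng.normal(scale=0.1, size=(vocab_size, embed_dim))  # word index -> meaning space
W_hh = rng.normal(scale=0.1, size=(embed_dim, embed_dim))    # hidden -> hidden
W_out = rng.normal(scale=0.1, size=(vocab_size, embed_dim))  # hidden -> word scores

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def caption_loss(image_feature, word_indices):
    """Average negative log-likelihood of a description given a key frame feature."""
    # Encode the image feature into the initial hidden state.
    h = np.tanh(W_img @ image_feature)
    loss = 0.0
    prev = None
    for target in word_indices:
        # Decoder input: meaning vector of the previous word (zeros at the start).
        x = W_emb[prev] if prev is not None else np.zeros(embed_dim)
        h = np.tanh(W_hh @ h + x)
        p = softmax(W_out @ h)
        loss -= np.log(p[target])
        prev = target
    return loss / len(word_indices)

image_feature = rng.normal(size=feat_dim)  # stand-in for a deep image feature
loss = caption_loss(image_feature, [0, 5, 4])
```

Training would adjust the four matrices to reduce this loss over all key frame and description pairs, for example by gradient descent; that step is omitted here.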
[Degree-granting institution]: Northwest A&F University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification numbers]: H030;S126;TP391.41
Article ID: 2011342
Article link: http://sikaile.net/wenyilunwen/yuyanyishu/2011342.html