Design and Implementation of an Audio Retrieval System Based on Digital Fingerprints
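Published: 2018-03-18 14:17
Topic: digital audio fingerprinting; Focus: feature extraction; Source: University of Electronic Science and Technology of China, 2014 master's thesis; Thesis type: degree thesis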
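【Abstract】: In recent years, with the spread of multimedia technology, the amount of audio data on the Internet has grown explosively, and efficient methods for retrieving and classifying audio data have drawn increasing attention. A content-based audio retrieval system compares acoustic features extracted from a signal with acoustic features stored in a database in order to retrieve the signal's metadata (artist, album, genre, etc.). Potential applications include automatic audio identification, audio tracking, copyright protection, TV program retrieval, and detection of background music in advertisements.

This thesis implements content-based audio retrieval, that is, retrieving and identifying audio files by means of digital audio fingerprints. A digital audio fingerprint is a compact digital signature, extracted from the audio content, that represents the audio's most important acoustic features. Fingerprints serve as the index for identifying audio and are stored in a database together with the corresponding metadata. At query time, the fingerprint extracted from an unknown audio file is compared with the stored fingerprints in order to identify that file.

The thesis focuses on the steps that most affect the robustness of an audio retrieval system: feature extraction, the fingerprint model, and matching. First, it studies and compares several spectral features: Mel-Frequency Cepstral Coefficients (MFCCs), the chroma spectrum, the constant-Q spectrum, and the product spectrum. The first three are derived from the magnitude spectrum alone and are widely used in audio signal processing and keypoint detection, whereas the product spectrum uses the product of the magnitude spectrum and the group delay and has proven highly effective in robust speech recognition. Experiments show that, within the audio retrieval system, the product-spectrum-based features achieve higher retrieval accuracy than the other three.

Second, the thesis proposes a cumulative similarity model to better capture the similarity between pieces of audio. Experiments show that this model is both more efficient and more accurate than a Euclidean distance model.

Third, the thesis uses a Gaussian mixture model (GMM) to improve the robustness of the retrieval system. The GMM is trained on the audio database with the expectation-maximization (EM) algorithm and describes the characteristics of the acoustic features more faithfully. Using the trained GMM, the feature vectors of both the database audio and the query clips are converted into symbolic labels, which are then searched in the database. Experimental results show the advantage of the GMM: it maintains high accuracy even under severe noise distortion.

Finally, the proposed method is compared experimentally with AudioDNA, a widely used audio retrieval method. The main differences between the two lie in the acoustic feature extraction and in the similarity measure. The results show that the proposed method is more resistant to distortion caused by noise attacks.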
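As an illustration of the product spectrum, the sketch below assumes the formulation common in the speech-recognition literature, Q(k) = |X(k)|^2 tau(k), where tau is the group delay. The abstract describes the feature as the magnitude spectrum times the group delay; which variant the thesis uses, and its framing, windowing and normalization, are not stated. Because tau(k) = (X_R Y_R + X_I Y_I) / |X|^2, where Y is the DFT of n*x(n), the product reduces to X_R(k)Y_R(k) + X_I(k)Y_I(k) and needs neither phase unwrapping nor division. The helpers `dft` and `product_spectrum` are hypothetical names, and the naive DFT stands in for an FFT only to keep the example self-contained.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    size = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
        for k in range(size)
    ]


def product_spectrum(frame):
    """Product spectrum Q(k) = |X(k)|^2 * tau(k), computed without dividing by |X(k)|^2.

    X is the DFT of x(n) and Y is the DFT of n * x(n); the group delay is
    tau(k) = (X_R Y_R + X_I Y_I) / |X|^2, so Q(k) = X_R Y_R + X_I Y_I.
    Only the non-negative frequency bins 0..N/2 are returned.
    """
    x_spec = dft(frame)
    y_spec = dft([n * sample for n, sample in enumerate(frame)])
    half = len(frame) // 2 + 1
    return [x.real * y.real + x.imag * y.imag for x, y in zip(x_spec[:half], y_spec[:half])]
```

For a unit impulse delayed by three samples in an eight-sample frame, |X| = 1 and the group delay is 3 at every frequency, so every returned bin equals 3 up to rounding.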
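To show how a chroma spectrum differs from the raw magnitude spectrum it is derived from, a minimal sketch can fold each spectral bin's magnitude into one of 12 pitch classes. It assumes equal temperament tuned to A4 = 440 Hz; the thesis's frame size, tuning reference and normalization are not given in the abstract, and `pitch_class` and `chroma_vector` are hypothetical helpers.

```python
import math


def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to one of 12 pitch classes (0 = C, 9 = A) under equal temperament."""
    midi_note = 69 + 12 * math.log2(freq_hz / a4_hz)
    return round(midi_note) % 12


def chroma_vector(bins):
    """Fold (frequency, magnitude) pairs into a 12-bin chroma vector; non-positive frequencies are skipped."""
    chroma = [0.0] * 12
    for freq_hz, magnitude in bins:
        if freq_hz > 0:
            chroma[pitch_class(freq_hz)] += magnitude
    return chroma
```

For example, 440 Hz and 880 Hz both fold into pitch class 9 (A), while 261.63 Hz (middle C) folds into class 0, so octave information is discarded by design.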
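The GMM step (EM training followed by conversion of feature vectors into symbolic labels) can be illustrated with a deliberately simplified sketch: a one-dimensional GMM fitted by EM to scalar frame features, after which each frame is replaced by the index of its most probable component. The thesis applies the model to multi-dimensional feature vectors; the 1-D setting, the quantile initialization, the iteration count and the variance floor are all assumptions made here, and `fit_gmm_1d` and `to_symbols` are hypothetical names.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, k, iters=50, var_floor=1e-6):
    """Fit a k-component 1-D GMM with expectation-maximization; returns (weights, means, variances)."""
    ordered = sorted(data)
    n = len(data)
    # Deterministic initialization: means at evenly spaced quantiles, shared global variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(data) / n
    global_var = max(sum((x - mu) ** 2 for x in data) / n, var_floor)
    variances = [global_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, var_floor
            )
    return weights, means, variances


def to_symbols(data, weights, means, variances):
    """Replace each feature value with the index of its most probable mixture component."""
    return [
        max(range(len(means)), key=lambda j: weights[j] * gaussian_pdf(x, means[j], variances[j]))
        for x in data
    ]
```

On two well-separated clusters of values around 0 and 10, the fitted means settle near 0 and 10, and frames from the two clusters are labelled 0 and 1 respectively; a mildly perturbed value such as 9.8 still receives label 1, which is the kind of noise tolerance the symbolic representation is meant to provide.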
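The storage-and-lookup scheme described in the abstract (fingerprints as the index, stored together with metadata, and a query compared against the stored entries) can be sketched over symbol sequences, for example per-frame GMM component indices. The similarity used here, the best fraction of aligned positions whose symbols agree over all offsets, is a simple placeholder chosen for illustration; it is not the cumulative similarity model proposed in the thesis, whose definition the abstract does not give. `FingerprintDB`, its methods and the 0.8 threshold are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FingerprintDB:
    """Toy fingerprint store: each entry pairs a symbol sequence with its metadata."""

    entries: list = field(default_factory=list)

    def add(self, symbols, metadata):
        self.entries.append((list(symbols), dict(metadata)))

    @staticmethod
    def best_alignment_score(query, reference):
        """Highest fraction of matching symbols over all offsets of query inside reference."""
        if not query or len(query) > len(reference):
            return 0.0
        best = 0
        for offset in range(len(reference) - len(query) + 1):
            window = reference[offset:offset + len(query)]
            best = max(best, sum(q == r for q, r in zip(query, window)))
        return best / len(query)

    def identify(self, query, threshold=0.8):
        """Return (metadata, score) for the best match, or (None, score) below the threshold."""
        best_meta, best_score = None, 0.0
        for symbols, metadata in self.entries:
            score = self.best_alignment_score(query, symbols)
            if score > best_score:
                best_meta, best_score = metadata, score
        return (best_meta, best_score) if best_score >= threshold else (None, best_score)
```

This linear scan compares the query with every stored entry; practical systems typically avoid that cost with an index over short symbol subsequences, a refinement omitted here for brevity.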
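【Degree-granting institution】: University of Electronic Science and Technology of China
【Degree level】: Master's
【Year conferred】: 2014
【Classification number】: TN912.3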
Article No.: 1629945
Article link: http://sikaile.net/kejilunwen/wltx/1629945.html