Research on Music Retrieval Techniques Based on Audio Fingerprinting and Version Identification
Published: 2018-11-28 19:42
【Abstract】: Content-based music retrieval is an active area of audio retrieval, and its practical value grows as the volume of online music increases. At the same time, users' retrieval needs are changing: rather than only retrieving the exact song that matches a query, they often want multiple versions of the target music, such as renditions by different singers or performances from different occasions. With the rise of online self-publishing and the popularity of amateur covers, this demand is becoming increasingly apparent. Content-based music retrieval extracts features from both the query and the reference recordings, then matches those features to retrieve recordings identical to the query. The features used for such retrieval are usually called audio fingerprints; they aim for a compact representation and tend to match segments with identical content, whereas version features are more complex and tend to match segments that share version-level characteristics even when the content differs. This thesis therefore treats the two tasks separately: version identification is performed offline over a curated reference library, while fingerprint-based retrieval runs in real time; once a fingerprint query hits a reference recording, related recordings (i.e., other versions of the same song) can be returned immediately from the precomputed version-identification results. Motivated by the strong performance of human hearing, this thesis builds the audio fingerprint on features derived from the auditory mechanism. After analyzing the physiology of the human ear, a cosine basis and a firing function are used to simulate how the cochlea processes sound, and the feature coefficients are then obtained by sparse decomposition. To overcome the high cost of the decomposition, a fast feature extraction method based on the matching pursuit algorithm is proposed.
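The greedy sparse decomposition mentioned above can be illustrated with a minimal matching pursuit sketch. This is not the thesis's implementation: the dictionary here is a simple unit-norm cosine (DCT-II) basis and the signal length is illustrative, standing in for the thesis's auditory-motivated cosine atoms.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy matching pursuit: at each iteration pick the atom most
    correlated with the residual, record its coefficient, and subtract
    its contribution from the residual."""
    residual = signal.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[0])
    for _ in range(n_atoms):
        scores = dictionary @ residual          # correlation with each unit-norm atom
        k = int(np.argmax(np.abs(scores)))      # best-matching atom index
        coeffs[k] += scores[k]
        residual -= scores[k] * dictionary[k]
    return coeffs, residual

# Unit-norm cosine dictionary (rows of a DCT-II basis, illustrative only).
N = 64
n = np.arange(N)
atoms = np.array([np.cos(np.pi * (n + 0.5) * k / N) for k in range(N)])
atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)

# A signal built from two atoms is recovered exactly in two iterations.
x = 3.0 * atoms[5] + 1.5 * atoms[20]
coeffs, residual = matching_pursuit(x, atoms, n_atoms=2)
```

Because this toy dictionary is orthonormal, two iterations recover the two coefficients exactly; with the overcomplete dictionaries typical of auditory models, matching pursuit only approximates the signal, which is why the decomposition cost the thesis addresses becomes significant.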
Because the auditory sparse features are structurally complex and unsuitable for direct retrieval, this thesis compresses them into a compact audio fingerprint. The main techniques applied are MinHash, for reducing the dimensionality of the high-dimensional binary sequence features, and locality-sensitive hashing (LSH), for fast lookup, followed by corresponding candidate verification and recording detection procedures. Experiments show that the fingerprint offers good retrieval efficiency and expressiveness, with good robustness to slight noise and to global time-domain changes, but weaker robustness to local time-domain changes. For music version identification, this thesis first reviews the basic definitions, main problems, and common processing methods of the music-version field, and then assembles a complete version identification method by organizing the identification workflow and comparing alternative approaches. The commonly used harmonic pitch class profile feature is improved by incorporating beat and key-transposition information and is adopted as the core feature for version identification; the necessary preprocessing steps, including peak estimation, beat estimation, and reference-frequency estimation, are applied before feature computation. Experimental results show that the constructed version identification method is effective.
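The MinHash-plus-LSH stage described above can be sketched generically. This is an illustrative toy, not the thesis's exact scheme: the set sizes, number of hash functions, and band/row split are arbitrary choices for demonstration.

```python
import random

def minhash_signature(item_set, n_hashes, seed=0):
    """MinHash: each random affine hash maps the set to its minimum
    hashed element; two sets agree on a signature slot with probability
    equal to their Jaccard similarity."""
    rng = random.Random(seed)
    p = 4294967311  # a prime larger than the 32-bit item universe
    params = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(n_hashes)]
    return [min((a * x + b) % p for x in item_set) for a, b in params]

def lsh_buckets(signature, bands, rows):
    """LSH banding: split the signature into bands and hash each band to
    a bucket key; items sharing any bucket key become candidates."""
    assert bands * rows == len(signature)
    return [hash(tuple(signature[i * rows:(i + 1) * rows])) for i in range(bands)]

# Two highly overlapping sets (Jaccard ~ 0.9) agree on most signature
# slots and collide in at least one LSH band, so they become candidates.
a = set(range(0, 100))
b = set(range(5, 105))
sig_a = minhash_signature(a, 20)
sig_b = minhash_signature(b, 20)
agree = sum(x == y for x, y in zip(sig_a, sig_b)) / 20
cand = bool(set(lsh_buckets(sig_a, 10, 2)) & set(lsh_buckets(sig_b, 10, 2)))
```

The banding parameters trade recall against candidate-set size: more rows per band makes a collision demand higher similarity, which matches the abstract's goal of fast retrieval followed by explicit candidate verification.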
【Degree-granting institution】: Harbin Institute of Technology
【Degree level】: Master's
【Year awarded】: 2014
【CLC number】: TN912.34
Article ID: 2364070
Link: http://sikaile.net/kejilunwen/wltx/2364070.html