Research on Music Retrieval Technology Based on Audio Fingerprints and Version Identification
Posted: 2018-11-28 19:42
【Abstract】: Content-based music retrieval is an active area of audio retrieval, and its practical value keeps growing as the volume of online music increases. At the same time, users' retrieval needs are changing: they are often not satisfied with finding only the exact song they queried, but also want other versions of the target piece, such as performances by different singers or on different occasions. With the rise of online self-media and the popularity of amateur covers, this need has become increasingly apparent.

Content-based music retrieval extracts features from both the query and the reference tracks, then matches these features to find the reference tracks that are identical to the query. The features used in query-by-example retrieval are usually called audio fingerprints; they are designed to be compact and tend to match segments with identical content, whereas version features are more complex in form and tend to match segments that share version characteristics even when the content is not identical. This thesis therefore treats the two tasks separately: version identification is performed offline on a curated reference library, while fingerprint-based retrieval runs in real time, so that whenever a fingerprint query hits a reference track, the related tracks (i.e., other versions of the same song) can be returned immediately from the precomputed version-identification results.

Because the human auditory system performs so well, this thesis builds the audio fingerprint from features modeled on the auditory mechanism. After analyzing the physiology of the human ear, cosine bases and a firing function are used to simulate how the cochlea processes sound, and sparse decomposition is then applied to obtain the feature coefficients. To overcome the high cost of the decomposition, a fast feature extraction method based on the matching pursuit algorithm is proposed.

Since the sparse auditory features are complex in form and unsuitable for direct retrieval, they are compressed and converted into an audio fingerprint. The main techniques are MinHash, which reduces the dimensionality of the high-dimensional binary sequence features, and locality-sensitive hashing (LSH), which enables fast retrieval; corresponding candidate-verification and track-detection procedures are also given. Experiments show that the fingerprint offers good retrieval efficiency and expressive power, and that it is robust to slight noise and to global changes in the time domain, but less robust to local time-domain changes.

For version identification, the thesis first reviews the basic definitions, main problems, and common processing methods in the music-version field, and then, by organizing the identification workflow and comparing the available methods, assembles a complete version-identification approach. The commonly used harmonic pitch class profile (HPCP) feature is improved by incorporating beat and key-transposition information and serves as the core feature for version identification; the necessary preprocessing steps, including peak estimation, beat estimation, and reference-frequency estimation, are applied before feature computation. Experimental results show that the version-identification method built in this thesis is effective.
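The abstract names MinHash and locality-sensitive hashing as the tools used to turn the high-dimensional binary auditory features into a searchable fingerprint index, but this page gives no implementation details. The sketch below is only a minimal Python illustration of the standard MinHash-signature-plus-LSH-banding pattern those terms usually refer to; every name and parameter in it (the hash family, `num_hashes`, the band/row split, the toy fingerprints) is an assumption made for demonstration, not taken from the thesis.

```python
import random
from collections import defaultdict

PRIME = 2_147_483_647  # large prime for the (a*x + b) mod p hash family


def make_hash_funcs(num_hashes, seed=42):
    """Draw (a, b) pairs defining hash functions h(x) = (a*x + b) % PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(binary_feature, hash_funcs):
    """Treat the indices of the set bits as a set and keep, per hash
    function, the smallest hashed index (the MinHash signature)."""
    on_bits = [i for i, bit in enumerate(binary_feature) if bit]
    return [min((a * i + b) % PRIME for i in on_bits) for a, b in hash_funcs]


class LSHIndex:
    """Classic LSH banding: split the signature into bands so that
    sufficiently similar signatures collide in at least one bucket."""

    def __init__(self, num_bands, rows_per_band):
        self.num_bands = num_bands
        self.rows = rows_per_band
        self.buckets = [defaultdict(list) for _ in range(num_bands)]

    def add(self, track_id, signature):
        for band in range(self.num_bands):
            key = tuple(signature[band * self.rows:(band + 1) * self.rows])
            self.buckets[band][key].append(track_id)

    def candidates(self, signature):
        found = set()
        for band in range(self.num_bands):
            key = tuple(signature[band * self.rows:(band + 1) * self.rows])
            found.update(self.buckets[band].get(key, []))
        return found


if __name__ == "__main__":
    hash_funcs = make_hash_funcs(num_hashes=20)
    index = LSHIndex(num_bands=5, rows_per_band=4)

    # Toy binary "fingerprint frames"; a real system would derive them from
    # the sparse auditory decomposition described in the abstract.
    database = {"song_a": [1, 0, 1, 1, 0, 0, 1, 0],
                "song_b": [0, 1, 0, 0, 1, 1, 0, 1]}
    for track_id, feature in database.items():
        index.add(track_id, minhash_signature(feature, hash_funcs))

    query = [1, 0, 1, 1, 0, 0, 1, 1]  # slightly corrupted copy of song_a
    print(index.candidates(minhash_signature(query, hash_funcs)))
```

Candidates returned this way would still have to pass the candidate-verification step the abstract mentions, for example a direct comparison of the full fingerprints.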
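For the version-identification part, the abstract says the HPCP feature is extended with beat and key-transposition information. One common way to handle key transposition when comparing chroma-like features is to try all twelve circular shifts of one profile and keep the best match; the Python sketch below illustrates only that idea on synthetic data. The frame averaging, the toy chroma generator, and the cosine-similarity scoring are simplifying assumptions made here for illustration, not the thesis's actual pipeline.

```python
import numpy as np


def fake_chroma(tonic, degrees=(0, 4, 7), num_frames=200, rng=None):
    """Generate toy chroma frames whose energy is concentrated on a few
    pitch classes; purely synthetic stand-ins for real HPCP frames."""
    if rng is None:
        rng = np.random.default_rng(0)
    frames = 0.1 * rng.random((num_frames, 12))
    for degree in degrees:
        frames[:, (tonic + degree) % 12] += 1.0
    return frames


def average_profile(chroma_frames):
    """Collapse a (frames x 12) chroma sequence into one L2-normalised
    12-bin profile. A real system keeps the temporal structure; averaging
    is used here only to keep the example short."""
    profile = np.asarray(chroma_frames, dtype=float).mean(axis=0)
    norm = np.linalg.norm(profile)
    return profile / norm if norm > 0 else profile


def transposition_invariant_similarity(chroma_a, chroma_b):
    """Score two tracks under all 12 key shifts and return the best
    (similarity, shift) pair; rotating one profile by k bins models a
    cover performed k semitones away from the original."""
    a = average_profile(chroma_a)
    b = average_profile(chroma_b)
    scores = [float(np.dot(a, np.roll(b, k))) for k in range(12)]
    best = int(np.argmax(scores))
    return scores[best], best


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    original = fake_chroma(tonic=0, rng=rng)                         # toy "original"
    cover = np.roll(original, 3, axis=1)                             # same frames transposed by 3 semitones
    unrelated = fake_chroma(tonic=5, degrees=(0, 1, 3, 6), rng=rng)  # different toy piece

    print(transposition_invariant_similarity(original, cover))      # near 1.0; the best shift undoes the transposition
    print(transposition_invariant_similarity(original, unrelated))  # noticeably lower score
```

A full system would work on beat-synchronous HPCP sequences rather than averaged profiles, which is closer to what the abstract describes.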
【Degree-granting institution】: Harbin Institute of Technology
【Degree level】: Master's
【Year conferred】: 2014
【CLC number】: TN912.34
Article No.: 2364070
Article link: http://sikaile.net/kejilunwen/wltx/2364070.html