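An Automated Music Classification Method Based on User Comments
Published: 2018-05-12 23:01
Topic: music classification + word segmentation model; Source: University of Science and Technology of China, master's thesis, 2017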
[Abstract]: Music classification is an important branch of music information retrieval (MIR) and is widely used in music retrieval and music recommendation. Existing methods classify music from five perspectives: genre, emotion, instrument, artist, and annotation. These methods are too restrictive: they confine music categories to a fixed set, so users cannot search for music by its finer details. To address fixed categories and overly narrow search, this thesis proposes an automated music classification method based on user comments. The method is not bound to existing music categories, yields more diverse classification results, and gives users a more personalized retrieval experience. Its premise is that user comments describe music in greater depth, and these detailed descriptions are a valuable reference for classification. The main contributions are as follows:
1) A linear-chain conditional random field (linear CRF) model is first used to identify domain-specific terms. N-gram extraction and cohesion analysis are then applied, following a seed-generation approach, to build a dictionary suited to segmenting music text. This hybrid approach produces a fairly accurate and rich dictionary and reduces the statistical segmentation model's need for annotated corpora.
2) Words are segmented with the linear CRF and the music dictionary above. The segmentation is then evaluated with a split-merge test based on lexical cohesion analysis and corrected with the MMSEG (Max Matching Segmentation) model, so that the corrected segmentation achieves higher accuracy.
3) After comparing several keyword extraction algorithms, TF-IDF (Term Frequency-Inverse Document Frequency) is selected and optimized to weaken the influence of term frequency during extraction, which improves the accuracy of the candidate tags. The candidate tags are then filtered from a global perspective to obtain the tags associated with each piece of music.
4) A multi-label probabilistic classification model is built to classify the music.
5) Music tags are tentatively clustered by similarity to reduce their impact on the music classification model.
Experimental results show that the method achieves relatively high accuracy and can obtain tags along multiple dimensions of music without supervision, providing support for personalized music retrieval.
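Steps 2 and 3 of the abstract can be illustrated with a small sketch. The abstract does not give the thesis's correction rules or say how term frequency is weakened, so the code below rests on two assumptions: plain forward maximum matching stands in for MMSEG (which adds disambiguation rules on top of maximum matching), and a logarithmic damping of TF stands in for the thesis's optimization. The function names, the sample dictionary, and the sample comments are all hypothetical.

```python
import math
from collections import Counter


def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching (a simplification of MMSEG).

    At each position, take the longest dictionary word that fits;
    fall back to a single character when nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def damped_tfidf(docs):
    """Score each term per document as (1 + ln tf) * ln(N / df).

    The log on tf is the assumed damping: a term repeated many times
    in one comment thread gains weight slowly rather than linearly.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        scores.append({
            term: (1.0 + math.log(count)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_tags(score_map, k=2):
    """Return the k highest-scoring terms as candidate tags."""
    ranked = sorted(score_map.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:k]]


# Hypothetical music dictionary and comments, one string per song.
music_dict = {"钢琴", "吉他", "前奏", "好听"}
comments = ["钢琴前奏好听钢琴钢琴", "吉他好听"]

segmented = [forward_max_match(c, music_dict) for c in comments]
scores = damped_tfidf(segmented)
print(forward_max_match("钢琴很好听", music_dict))  # ['钢琴', '很', '好听']
print(top_tags(scores[0]))
```

In this sketch a word that appears in every song's comments (here "好听") scores zero, which loosely mirrors the abstract's idea of filtering candidate tags from a global perspective, although the thesis's actual filter is not described.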
[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3
Article ID: 1880517
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1880517.html