

Research on Acoustic Scene Classification Based on an Ensemble of Multiple Deep Models

Published: 2018-03-14 12:15

  Topic: acoustic scene classification  Keywords: deep learning  Source: Harbin Institute of Technology, 2017 master's thesis  Type: degree thesis


【Abstract】: Acoustic Scene Classification (ASC) is a specific task within Computational Auditory Scene Analysis (CASA): given the acoustic content of an audio stream, it assigns the semantic label of the scene in which the audio was recorded, enabling a system to perceive and understand its surroundings. Unlike psychological research, which seeks to explain how humans perceive acoustic scenes, ASC relies on signal processing and machine learning to recognize scenes automatically. Traditional ASC systems focus on feature extraction and classifier selection for individual scenes. With the rapid spread of audio-capture devices, large volumes of diverse audio data are now being collected, posing serious challenges for traditional signal processing and recognition methods and calling for new techniques.

To make full use of this abundance of acoustic scene data, this thesis investigates several deep learning models, including the Multi-Layer Perceptron (MLP), the Convolutional Neural Network (CNN), and the Long Short-Term Memory (LSTM) network. Frame-level features, namely Mel-Frequency Cepstral Coefficients (MFCC) and log-Mel spectrograms, are extracted first; consecutive frames are then concatenated into segment-level features and fed to the deep models for classification. To improve the LSTM-based ASC system, a segment-processing technique based on shuffled bootstrap sampling is proposed; it simulates complex temporal combinations and enlarges the training set, giving the model stronger generalization. To improve the MLP-based method, an attention mechanism is introduced into the model structure. Attention overcomes the limitation of a single global representation of the data by focusing on its most informative parts, and it also handles decoupling well, describing different scenes with different feature subspaces.

Different deep models recognize different scenes with different skill: the MLP distinguishes beaches and residential areas well, while the CNN more easily separates libraries and buses. Ensemble learning, which combines multiple learners, can often achieve markedly better generalization than any single learner. To exploit the complementary strengths of the individual classifiers across scenes, this thesis applies several ensemble fusion methods; in particular, an ensemble selection method built on the BAGGING (Bootstrap AGGregatING) framework yields a clear improvement in ASC classification performance.
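The segment-level features described above are built by concatenating consecutive frame-level feature vectors (MFCC or log-Mel frames) into fixed-length vectors. A minimal numpy sketch of that framing step follows; the function name, segment length, and hop are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def stack_frames(features, seg_len, hop):
    """Concatenate consecutive frame-level feature vectors into
    fixed-length segment vectors (the deep model's actual input).

    features: (n_frames, n_dims) matrix of per-frame features.
    Returns: (n_segments, seg_len * n_dims) matrix.
    """
    n_frames, n_dims = features.shape
    segments = [
        features[start:start + seg_len].reshape(-1)
        for start in range(0, n_frames - seg_len + 1, hop)
    ]
    return np.stack(segments)

# e.g. 100 frames of 40-dimensional log-Mel features -> 10-frame segments
feats = np.random.randn(100, 40)
segs = stack_frames(feats, seg_len=10, hop=5)
# segs.shape == (19, 400)
```

With a hop smaller than the segment length, neighbouring segments overlap, which multiplies the number of training examples drawn from each recording.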
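The thesis's shuffled bootstrap segment sampling for the LSTM draws segments with replacement and in shuffled order to form new training sequences, both simulating varied temporal combinations and enlarging the training set. The sketch below is one plausible reading of that idea in numpy; the function name and parameters are assumptions for illustration.

```python
import numpy as np

def shuffled_bootstrap(segments, n_sequences, seq_len, seed=None):
    """Build new training sequences by drawing segments with
    replacement (bootstrap) in shuffled temporal order.

    segments: (n_segments, feat_dim) matrix.
    Returns: (n_sequences, seq_len, feat_dim) array of sequences.
    """
    rng = np.random.default_rng(seed)
    n = len(segments)
    # each row of idx is one resampled, reordered sequence of segment indices
    idx = rng.integers(0, n, size=(n_sequences, seq_len))
    return segments[idx]

segs = np.random.randn(19, 400)
seqs = shuffled_bootstrap(segs, n_sequences=50, seq_len=8, seed=0)
# seqs.shape == (50, 8, 400)
```

Because every sequence is a fresh bootstrap draw, the recurrent model sees many more distinct segment orderings than the original recordings contain, which is the claimed source of the improved generalization.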
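The attention mechanism added to the MLP weights the parts of the input by learned relevance instead of relying on one global representation. A minimal attention-pooling sketch, with a single hypothetical score vector `w` standing in for the learned attention parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(frames, w):
    """Score each frame with vector w, normalize the scores with a
    softmax, and return the weighted sum: a single representation
    that emphasizes the most informative frames."""
    scores = frames @ w      # one relevance score per frame
    alpha = softmax(scores)  # attention weights, sum to 1
    return alpha @ frames    # weighted average over frames

rng = np.random.default_rng(0)
frames = rng.standard_normal((30, 64))  # 30 frames, 64-dim features
w = rng.standard_normal(64)             # assumed learned score vector
pooled = attention_pool(frames, w)      # shape (64,)
```

In a trained model `w` (or a small scoring network) is learned jointly with the classifier, so frames that matter for the scene receive larger weights; using several score vectors gives different feature subspaces for different scenes, the decoupling noted above.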
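Ensemble fusion combines the per-scene strengths of the individual classifiers (e.g. MLP vs. CNN). A minimal posterior-averaging sketch, with made-up posterior matrices for illustration; the thesis's BAGGING-based ensemble selection would additionally choose which models (and weights) enter the average:

```python
import numpy as np

def average_fusion(posteriors, weights=None):
    """Fuse per-classifier posterior matrices of shape
    (n_samples, n_classes) by (weighted) averaging, then return
    the fused class predictions."""
    stacked = np.stack(posteriors)  # (n_models, n_samples, n_classes)
    fused = np.average(stacked, axis=0, weights=weights)
    return fused.argmax(axis=1)

# hypothetical posteriors from two classifiers over 3 scene classes
p_mlp = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
p_cnn = np.array([[0.4, 0.5, 0.1], [0.2, 0.2, 0.6]])
preds = average_fusion([p_mlp, p_cnn])
# preds -> [0, 2]
```

Ensemble selection can be layered on top by greedily adding to the fused set whichever classifier most improves held-out accuracy, stopping when no addition helps.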
【Degree-granting institution】: Harbin Institute of Technology
【Degree level】: Master's
【Year conferred】: 2017
【CLC number】: TP18;TN912.3





