Research on Two-Speaker Mixed Speech Separation Based on Computational Auditory Scene Analysis
Published: 2018-08-12 19:50
[Abstract]: With the development of information technology, speech signal processing has become closely tied to fields such as search engines and artificial intelligence. Speech separation based on computational auditory scene analysis (CASA) has broad application prospects in areas such as multimedia retrieval and robotics, and has gradually become a research focus. Current CASA-based speech separation systems struggle to achieve satisfactory results on mixtures of multiple speakers. One reason is that most CASA systems cannot accurately obtain multiple pitch tracks in the pitch extraction stage, which in turn degrades separation. Another is that many separation systems use trained models in the grouping stage, and therefore depend on the effectiveness of sample training and on prior knowledge of the speakers.
Building on existing research, this thesis proposes a method for separating two-speaker mixed speech. The main research contents are:
(1) A multi-pitch tracking method based on the hidden Markov model (HMM). First, a peripheral processing module decomposes the speech signal into time-frequency units. Then, in the pitch tracking stage, an HMM-based multi-pitch tracking algorithm exploits the statistical properties of speech to compute the multiple pitch tracks in the mixture, and a method is designed to label time-frequency units when multiple pitches are present, yielding simultaneous streams. Experiments show that the method is effective at extracting pitch tracks from multi-speaker speech material.
(2) A clustering-based sequential grouping method. First, gammatone cepstral coefficients are extracted from the mixed speech material, and an objective function based on the within-class scatter matrix and the between-class scatter matrix is proposed. The best classification of the simultaneous streams is then found by maximizing a trace criterion defined on the within-class and between-class scatter matrices, which completes the separation of the two speakers. Experiments show that the method is effective at separating two-speaker mixed speech.
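The clustering step in (2) can be illustrated with a minimal sketch. The abstract does not give the exact form of the objective, so the code below assumes the common criterion trace(S_W^{-1} S_B), where S_W is the within-class scatter matrix and S_B the between-class scatter matrix. The function names `scatter_trace` and `best_grouping`, the ridge term, the exhaustive search, and the synthetic feature vectors standing in for gammatone cepstral coefficients are all illustrative assumptions, not the thesis's implementation.

```python
import itertools
import numpy as np


def scatter_trace(features, labels, ridge=1e-6):
    """Return trace(S_W^{-1} S_B) for a labelling of the feature rows.

    Assumed criterion: the source only says a trace built from the two
    scatter matrices is maximized.
    """
    overall_mean = features.mean(axis=0)
    dim = features.shape[1]
    s_w = np.zeros((dim, dim))  # within-class scatter
    s_b = np.zeros((dim, dim))  # between-class scatter
    for cls in np.unique(labels):
        rows = features[labels == cls]
        mean = rows.mean(axis=0)
        centered = rows - mean
        s_w += centered.T @ centered
        diff = (mean - overall_mean)[:, None]
        s_b += len(rows) * (diff @ diff.T)
    # A small ridge (an assumption) keeps S_W invertible for short streams.
    s_w += ridge * np.eye(dim)
    return float(np.trace(np.linalg.solve(s_w, s_b)))


def best_grouping(stream_features):
    """Exhaustively assign each simultaneous stream to speaker 0 or 1."""
    n = len(stream_features)
    features = np.vstack(stream_features)
    lengths = [len(f) for f in stream_features]
    best_score, best_assign = -np.inf, None
    # Stream 0 is fixed to speaker 0: swapping labels gives the same score.
    for rest in itertools.product((0, 1), repeat=n - 1):
        assign = (0,) + rest
        if len(set(assign)) < 2:
            continue  # each speaker must receive at least one stream
        labels = np.repeat(assign, lengths)
        score = scatter_trace(features, labels)
        if score > best_score:
            best_score, best_assign = score, assign
    return best_assign, best_score


# Synthetic stand-in for per-frame cepstral features of four streams,
# alternating between two hypothetical speakers.
rng = np.random.default_rng(0)
streams = [rng.normal(loc=c, scale=1.0, size=(20, 3)) for c in (0.0, 5.0, 0.0, 5.0)]
assignment, score = best_grouping(streams)
print(assignment, round(score, 2))
```

Exhaustive search is only practical for a handful of streams, since the number of labellings doubles with each added stream; a real system would likely need a heuristic or pruned search, which the abstract does not describe.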
[Degree-granting institution]: Guangxi University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TN912.3
Article No.: 2180174
Link to this article: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2180174.html