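Research and Implementation of Speaker Recognition Based on Deep Belief Networks
Published: 2018-05-20 17:22
Topic: speaker recognition + deep neural networks. Source: Nanjing University of Posts and Telecommunications, master's thesis, 2017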
[Abstract]: With the rapid development of multimedia information technology, online speech resources have grown explosively, so classifying and recognizing speech is of considerable importance. Speaker recognition can distinguish speakers from a small amount of audio data and thereby provide identity authentication; it is a key technology in speech signal processing. Traditional speaker recognition systems, however, often suffer from insufficient learning and insufficient model depth, and when corpus data are scarce the model they learn is often not complex enough. Building on an analysis of the strengths and weaknesses of existing speaker recognition methods, this thesis uses deep learning to design and implement a speaker recognition system. The main work is as follows:
(1) The characteristics and difficulties of speaker recognition methods and feature extraction approaches are summarized, and the strengths and weaknesses of the strategies, models, and algorithms commonly used in speaker recognition are compared and analyzed.
(2) A speaker recognition framework based on deep learning is studied. Deep learning theory is applied to the traditional speaker recognition system: a deep belief network is trained using restricted Boltzmann machines and the backpropagation algorithm, which overcomes the efficiency problem of training a multilayer network model directly.
(3) Speaker recognition based on i-vector analysis under channel conditions is introduced. Building on the i-vector method, traditional Gaussian-mixture-model speaker recognition is improved by proposing a method that combines an uncompressed i-vector representation with deep learning. Experiments with this method compare its recognition rate against traditional methods and examine the effect of speaker gender on recognition rate.
(4) Following the processing flow of speaker recognition, the structure of a deep-learning-based speaker recognition system is given, its core modules are designed in detail and implemented in simulation, and the performance of the various speaker recognition systems is tested and the results analyzed.
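Item (2) of the abstract describes training a deep belief network with restricted Boltzmann machines followed by backpropagation. The sketch below illustrates only the unsupervised part, greedy layer-wise pretraining with one-step contrastive divergence (CD-1), under stated assumptions. The abstract does not give the thesis's architecture, input features, or hyperparameters, so the class `RBM`, the function `pretrain_dbn`, the binary units, the layer sizes, and the learning rate are all hypothetical choices made for illustration, and the supervised backpropagation fine-tuning stage is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine (illustrative, not the thesis code)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.n_visible = n_visible
        self.n_hidden = n_hidden
        self.rng = rng
        # Small random weights; W[i][j] connects visible unit i to hidden unit j.
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j]
                                        for i in range(self.n_visible)))
                for j in range(self.n_hidden)]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j]
                                        for j in range(self.n_hidden)))
                for i in range(self.n_visible)]

    def cd1_step(self, v0, lr=0.1):
        """Apply one CD-1 update for a single example; return squared reconstruction error."""
        h0 = self.hidden_probs(v0)
        # Sample binary hidden states, then reconstruct the visible layer.
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        # Positive phase minus negative phase.
        for i in range(self.n_visible):
            for j in range(self.n_hidden):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for i in range(self.n_visible):
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(self.n_hidden):
            self.c[j] += lr * (h0[j] - h1[j])
        return sum((a - r) ** 2 for a, r in zip(v0, v1))


def pretrain_dbn(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise pretraining: each RBM is trained on the
    hidden activations produced by the RBM below it."""
    rng = random.Random(seed)
    rbms = []
    layer_input = data
    for n_hidden in layer_sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_step(v, lr)
        rbms.append(rbm)
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms
```

In a full system the stacked weights would initialize a feed-forward network that is then fine-tuned with backpropagation on speaker labels. Real acoustic features such as cepstral coefficients are real-valued, so a Gaussian visible layer is a common alternative to the binary units assumed here.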
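[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TN912.34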
Article ID: 1915554
Article link: http://sikaile.net/kejilunwen/xinxigongchenglunwen/1915554.html