Research on Breath-Based Identity Recognition
Published: 2017-12-27 08:24
Keywords: breath-based identity recognition. Source: University of Electronic Science and Technology of China, 2015 master's thesis. Type: degree thesis
Related topics: identity recognition; text-independent; breathing; MFCC
[Abstract]: Speaker recognition is the task of identifying a person by voice; it is widely used in automatic telephone services, forensic audio analysis, and related fields. Prior work on speaker recognition has focused mainly on text-dependent systems, which recognize the speaker by constraining the spoken content; this sharply limits their range of application. This thesis instead studies the more broadly applicable text-independent setting, which places no restriction on what the speaker says and is therefore harder. The central challenge of such a system is extracting the required speaker features from highly variable speech. Earlier attempts range from spectral features and glottal-source features up to higher-level prosodic and lexical features, but both recognition accuracy and system usability remain limited; moreover, extracting these higher-level features demands increasingly complex speech-recognition methods and ever more computation. This thesis proposes an efficient text-independent speaker recognition scheme that, for the first time, identifies speakers from their breath sounds, making the recognition system entirely unaffected by the variability of the speech signal. Exploiting the energy and spectral characteristics of breath sounds, it presents an efficient breath-extraction scheme and a breath-based speaker recognition scheme. Breath extraction relies mainly on MFCC (mel-frequency cepstral coefficients), zero-crossing rate, and energy parameters, and uses a two-step scheme (preliminary detection followed by false-alarm rejection) to improve extraction accuracy; the feature-extraction stage then computes the speech parameters of the detected breath segments; finally, a lightweight Gaussian model combined with Bayesian decision theory is used for modeling and classification. To keep system complexity low, only lightweight models and methods are used throughout. The thesis first briefly reviews the relevant speech-recognition techniques, then builds on them to present the detailed design of the breath-based identity recognition system, and finally evaluates it in MATLAB on a self-collected corpus of 340 recordings from 34 speakers. The experimental results show that, with breath MFCCs as features, both the FAR (false acceptance rate) and FRR (false rejection rate) stay below 10%; accuracy improves further as the test recordings grow longer, and for recordings longer than one minute both FAR and FRR fall below 5%.
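The abstract names three frame-level parameters used for breath detection: short-time energy, zero-crossing rate, and MFCCs. The sketch below illustrates how these standard features are computed from a framed signal. It is a minimal textbook implementation in Python with NumPy, not the thesis's own MATLAB code; frame sizes, filterbank settings, and the test tone are all illustrative assumptions.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def short_time_energy(frames):
    """Per-frame energy: sum of squared samples."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign changes, per frame."""
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def mfcc(frames, sr, n_mels=26, n_ceps=13):
    """Textbook MFCC: windowed power spectrum -> triangular mel
    filterbank -> log -> DCT-II, keeping the first n_ceps coefficients."""
    n_fft = frames.shape[1]
    spec = np.abs(np.fft.rfft(frames * np.hamming(n_fft), axis=1)) ** 2
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    # Filterbank edge frequencies, equally spaced on the mel scale
    hz = 700.0 * (10.0 ** (np.linspace(mel(0), mel(sr / 2), n_mels + 2) / 2595.0) - 1.0)
    fbank = np.zeros((n_mels, len(freqs)))
    for m in range(1, n_mels + 1):
        lo, ctr, hi = hz[m - 1], hz[m], hz[m + 1]
        fbank[m - 1] = np.clip(
            np.minimum((freqs - lo) / (ctr - lo), (hi - freqs) / (hi - ctr)), 0.0, None)
    logmel = np.log(spec @ fbank.T + 1e-10)
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    dct2 = np.cos(np.pi * k * (n + 0.5) / n_mels)  # unnormalized DCT-II basis
    return logmel @ dct2.T

# Toy usage: one second of a 220 Hz tone at 8 kHz, 32 ms frames, 50% overlap
sr = 8000
x = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
frames = frame_signal(x, 256, 128)
energy = short_time_energy(frames)
zcr = zero_crossing_rate(frames)
ceps = mfcc(frames, sr)
print(frames.shape, ceps.shape)  # (61, 256) (61, 13)
```

In the thesis's pipeline these features would feed the two-step breath detector (preliminary detection, then false-alarm rejection), with the MFCC vectors of the accepted breath segments passed on to the speaker models.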
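The classification stage described in the abstract pairs a lightweight Gaussian model with a Bayesian decision rule, and reports results as FAR and FRR. The following Python sketch shows one plausible shape for that stage: a single diagonal-covariance Gaussian per speaker, a MAP decision over speakers, and a threshold-based FAR/FRR computation. The model form, the toy 13-dimensional data, and all names are assumptions for illustration, not the thesis's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class DiagGaussian:
    """Lightweight per-speaker model: one diagonal-covariance Gaussian
    fit to that speaker's breath-frame feature vectors."""
    def fit(self, X):
        self.mu = X.mean(axis=0)
        self.var = X.var(axis=0) + 1e-6  # variance floor avoids division by zero
        return self

    def avg_loglik(self, X):
        """Mean per-frame log density under N(mu, diag(var))."""
        d = X - self.mu
        return np.mean(-0.5 * np.sum(np.log(2 * np.pi * self.var)
                                     + d ** 2 / self.var, axis=1))

def identify(X, models, priors=None):
    """Bayesian (MAP) decision: argmax over speakers of
    average log p(X | speaker) + log p(speaker)."""
    if priors is None:
        priors = {name: 1.0 / len(models) for name in models}
    return max(models, key=lambda s: models[s].avg_loglik(X) + np.log(priors[s]))

def far_frr(genuine_scores, impostor_scores, threshold):
    """Verification errors at a score threshold:
    FAR = fraction of impostor trials accepted,
    FRR = fraction of genuine trials rejected."""
    far = np.mean(np.asarray(impostor_scores) >= threshold)
    frr = np.mean(np.asarray(genuine_scores) < threshold)
    return far, frr

# Toy stand-in for two speakers' 13-dim breath MFCC vectors
train = {"A": rng.normal(0.0, 1.0, (200, 13)),
         "B": rng.normal(0.8, 1.2, (200, 13))}
models = {name: DiagGaussian().fit(X) for name, X in train.items()}

test_A = rng.normal(0.0, 1.0, (50, 13))  # fresh samples from speaker A
print(identify(test_A, models))  # prints "A" on this toy data
```

Averaging the per-frame log-likelihood is one simple way to make longer test recordings produce more stable scores, which is consistent with the abstract's observation that accuracy improves for recordings over one minute.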
[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master
[Year conferred]: 2015
[CLC classification]: TN912.34