Research on Speaker Recognition Based on Gender Classification
Keywords: Chinese dialect database; gender recognition; speaker recognition; vector quantization; support vector machine. Source: Jiangsu Normal University, master's thesis, 2012. Document type: degree thesis
[Abstract]: A speech signal carries both the semantic content of what is said and information specific to the speaker, from which identity attributes such as the speaker's gender, age, and place of origin can be extracted. Speaker recognition is the technology of automatically determining a speaker's identity from the speech parameters in the signal that characterize that speaker. As a biometric authentication technology, it has significant application value and broad prospects in information retrieval, criminal investigation, voice-based identity verification, telephone banking, and related fields. The thesis presents a systematic study spanning data collection, feature extraction, and classification, and reports the following contributions.

1. A Chinese dialect speech database
Following international standards for speech corpus design, and taking into account the recording channel, the dialect type, and the age and gender distribution of the speakers, the thesis builds a Chinese dialect speech database covering seven regional dialects (Min, Yue, Wu, Xiang, Northern, Gan, and Hakka) together with Mandarin. It includes wideband speech (microphone) and narrowband speech (mobile phone and fixed-line telephone), with 106 hours of speech data.

2. A gender identification method based on codebook models
Semi-supervised clustering is introduced into gender recognition research for the first time. Using the idea of semi-supervised learning, the Chinese dialect speech data are vector-quantized to form male and female gender codebook models that carry supervision information. The method takes the probability distribution of the speech feature space into account, optimizes codebook generation, and improves the accuracy of the codebook models. It addresses the low codebook precision of the traditional vector quantization algorithm and improves the recognition performance of the system. Experiments show that, compared with the traditional vector quantization algorithm on both noisy and clean speech, recognition accuracy, system stability, and robustness all improve markedly.

3. An improved hybrid SVM method for speaker recognition
SVM takes structural risk minimization as its criterion, discriminates well between classes, produces outputs that reflect the differences between samples of different classes, and is well suited to classification problems with continuous input vectors. On this basis, the thesis improves a hybrid SVM recognition system for speaker recognition. The method first partitions and clusters the large training set, then trains one SVM for each class of speech samples, and finally combines the outputs of all the SVMs to make the classification decision. This largely resolves the excessive time cost and low recognition efficiency caused by a growing number of speakers and a very large volume of speech data, and improves the classification and decision capability of the speaker recognition system.

4. A hierarchical speaker recognition system
At present, speaker recognition is difficult to apply in real time to large volumes of data. As speech databases keep growing, systems built on existing techniques struggle to meet real-time identification requirements in recognition time, memory use, and recognition accuracy. The thesis discusses how several features, including MFCC and SDC, perform in the recognition system. Following the idea of classify-then-search, it uses dialect identification and gender identification to narrow the number and range of candidate speakers, and then applies speaker recognition to determine the identity of each speaker, with the aim of arriving at an optimal speaker recognition system model.
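The codebook-based gender decision described in the abstract can be illustrated with a minimal sketch. It uses plain k-means vector quantization, that is, the conventional baseline the thesis compares against, not the thesis's semi-supervised clustering. The function names, the codebook size, and the synthetic 12-dimensional vectors (standing in for real MFCC frames) are all assumptions made for illustration.

```python
import numpy as np


def train_codebook(features, n_codewords=8, n_iters=20, seed=0):
    """Build a VQ codebook with plain k-means; features is (n_frames, n_dims)."""
    rng = np.random.default_rng(seed)
    # Initialise codewords from randomly chosen training frames.
    start = rng.choice(len(features), n_codewords, replace=False)
    codebook = features[start].copy()
    for _ in range(n_iters):
        # Squared Euclidean distance from every frame to every codeword.
        dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        for k in range(n_codewords):
            members = features[nearest == k]
            if len(members) > 0:  # keep the old codeword if its cell is empty
                codebook[k] = members.mean(axis=0)
    return codebook


def average_distortion(features, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.min(axis=1).mean()


def classify_gender(features, codebooks):
    """Return the label whose codebook quantizes the utterance with least distortion."""
    return min(codebooks, key=lambda label: average_distortion(features, codebooks[label]))


# Synthetic stand-ins for per-frame feature vectors (assumption: 12 dimensions).
rng = np.random.default_rng(42)
male_train = rng.normal(-3.0, 1.0, size=(400, 12))
female_train = rng.normal(3.0, 1.0, size=(400, 12))

codebooks = {
    "male": train_codebook(male_train),
    "female": train_codebook(female_train),
}

# An unseen "utterance" drawn from the same distribution as the female data.
test_utterance = rng.normal(3.0, 1.0, size=(50, 12))
predicted = classify_gender(test_utterance, codebooks)
print(predicted)
```

In a hierarchical system of the kind the abstract outlines, a decision like `predicted` could be used to restrict the later speaker search to models of the matching gender, which is one way to reduce the number of candidates that must be scored.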
[Degree-granting institution]: Jiangsu Normal University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: H17
Article ID: 1473348
Article link: http://sikaile.net/wenyilunwen/hanyulw/1473348.html