Research on Whispered Speech Speaker Recognition Based on Joint Factor Analysis
[Abstract]: Speaker recognition, as an important branch of biometric recognition, has wide applications in public security and judicature, biomedical engineering, military security systems, and other fields. With the rapid development of computer and network technology, speaker recognition technology has made great progress. Whispered speech is a special mode of voice communication used in many situations. Because whispered speech differs greatly from normal (phonated) speech, whispered speaker recognition cannot simply reuse the methods of normal speaker recognition, and many problems remain to be solved.
This thesis studies text-independent speaker recognition for whispered speech. The main difficulties are as follows. First, whispered speech databases are incomplete: for normal speech, corpora such as those released by the US National Institute of Standards and Technology (NIST) support speaker recognition research, whereas whispered speech resources are scarce. Second, feature representation is difficult: because of the particularity of whispered speech, some commonly used characteristic parameters cannot be extracted, and its spectral parameters are harder to obtain than those of normal speech. Third, whispering is produced by airflow without vocal fold vibration and at a low sound level, so it is more easily disturbed by noise; it also often occurs in mobile phone calls and is therefore easily affected by the channel environment. Finally, whispering is constrained at the place of articulation, so emotional expression is limited, and the speaking state and psychological condition change during whispering, making the signal more susceptible to the speaker's psychological factors, emotion, and speaking state. In summary, the feature parameters are harder to extract, are easily affected by the speaker's own state, and are more sensitive to channel variation.
To address these problems, this thesis carries out research in the following aspects:
1. A parameter extraction algorithm that reflects the speaker's characteristics in whispered speech is proposed. Whispered speech has no fundamental frequency, so source features are difficult to exploit; the formant parameters, which characterize the vocal tract, must therefore be extracted reliably. This thesis proposes a spectral segmentation algorithm for whispered speech formant extraction: the spectrum is divided dynamically, filter parameters are obtained by selective linear prediction, and the formants are refined by parallel inverse filtering. This provides an effective way to handle the formant shift, merging, and flattening caused by whispered phonation. In addition, drawing on the use of the centroid and flatness of a statistical distribution to measure the stability of a signal, the Bark subband spectral centroid and Bark subband spectral flatness are defined on a human auditory model; together with other spectral variables they form a feature parameter set that effectively characterizes the speaker in the whispered mode (see the sketch after this item).
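As an illustration only, the following is a minimal sketch of computing per-band spectral centroid and spectral flatness over Bark-scale subbands for a single frame. The frame length, sampling rate, FFT size, and truncated Bark band edges are assumed values for the example, not the exact configuration used in the thesis.

```python
# Sketch: Bark subband spectral centroid and spectral flatness for one frame.
# Parameter choices (fs, n_fft, band edges) are illustrative assumptions.
import numpy as np

# Approximate Bark critical-band edges in Hz, truncated below the Nyquist rate used here.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920,
                 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700]

def bark_subband_features(frame, fs=8000, n_fft=512, eps=1e-12):
    """Return per-band spectral centroid (Hz) and spectral flatness for one frame."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft))
    power = spec ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

    centroids, flatness = [], []
    for lo, hi in zip(BARK_EDGES_HZ[:-1], BARK_EDGES_HZ[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        band, f_band = power[mask], freqs[mask]
        if band.size == 0:
            continue
        # Spectral centroid: power-weighted mean frequency within the band.
        centroids.append(np.sum(f_band * band) / (np.sum(band) + eps))
        # Spectral flatness: geometric mean over arithmetic mean of the band power.
        flatness.append(np.exp(np.mean(np.log(band + eps))) / (np.mean(band) + eps))
    return np.array(centroids), np.array(flatness)
```

The two feature vectors can then be appended to the other spectral parameters frame by frame before modeling.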
2. An atypical-emotion whispered speaker recognition method based on feature mapping and speaker model synthesis is proposed, to address the mismatch in emotional state between training speech and test speech. Because whispered speech is less effective than normal speech at expressing emotion, a clear emotional classification is not possible; instead, the speaker state is classified by arousal and valence (A, V) factors, which relaxes the one-to-one correspondence with emotional categories. In the test stage, the speaker state of each utterance is first identified as a front-end processing step, and compensation is then carried out in the feature domain (feature mapping) or in the model domain (speaker model synthesis). Experiments show that this speaker-state compensation not only reflects the particularity of whispered speech but also effectively improves the accuracy of speaker recognition on atypical emotional whispered speech (a sketch of the feature-domain mapping follows this item).
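For illustration, here is a minimal sketch of GMM-based feature mapping in the style described above, assuming diagonal-covariance models and assuming the state-dependent GMM was MAP-adapted from a state-independent root GMM so that mixtures are aligned. Function and variable names are hypothetical.

```python
# Sketch of feature mapping from a detected speaker-state GMM back to the root GMM.
# Assumes diagonal covariances and mixture-aligned (MAP-adapted) state models.
import numpy as np

def log_gauss_diag(x, mean, var):
    """Per-mixture diagonal-Gaussian log density of one feature vector x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def feature_map(x, w_s, mu_s, var_s, mu_root, var_root):
    """Map one feature vector from the speaker-state space to the root (neutral) space."""
    # Pick the top-scoring mixture in the state-dependent GMM.
    k = np.argmax(np.log(w_s) + log_gauss_diag(x, mu_s, var_s))
    # Shift and rescale so the state-dependent statistics line up with the root model.
    return (x - mu_s[k]) * np.sqrt(var_root[k] / var_s[k]) + mu_root[k]
```

Model-domain compensation (speaker model synthesis) would instead transform the enrolled speaker model toward the detected state before scoring.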
3. A whispered speaker recognition method based on latent factor analysis is proposed, providing an effective way to compensate for the speaker state. Factor analysis does not attach a specific physical meaning to the common factors; it only seeks representative factors among many variables, and the number of factors can be adjusted for dimensionality reduction. Following latent factor theory, the whispered speech feature supervector is decomposed into a speaker supervector and a speaker-state supervector, and the speaker subspace and speaker-state subspace are estimated from balanced training speech. In the test stage, the speaker factor is estimated for each utterance and the decision is then made. Compared with compensation algorithms that depend on an explicit state classification, the latent factor analysis method further improves the speaker recognition rate (a sketch of the decomposition follows this item).
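As a rough illustration of the decomposition M = m + V·y + S·u (m: UBM mean supervector, V: speaker subspace, S: speaker-state subspace, y/u: low-dimensional factors), the sketch below uses a simple ridge-regularized joint point estimate of the factors. This is a simplified stand-in for the EM-based estimation used in practice; all names are hypothetical.

```python
# Sketch: latent-factor decomposition of a whispered-speech supervector.
# M = m + V @ y + S @ u ; y (speaker factor) is kept for the recognition decision.
import numpy as np

def estimate_factors(M, m, V, S, tau=1.0):
    """Jointly estimate the speaker factor y and state factor u for one utterance."""
    A = np.hstack([V, S])                       # combined subspace [V S]
    # Ridge solution (MAP with a standard-normal prior) for the stacked factors [y; u].
    lhs = A.T @ A + tau * np.eye(A.shape[1])
    rhs = A.T @ (M - m)
    yu = np.linalg.solve(lhs, rhs)
    y, u = yu[:V.shape[1]], yu[V.shape[1]:]
    return y, u
```

Only the speaker factor y enters the scoring stage; the state factor u absorbs the speaker-state variability.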
4. An atypical-emotion whispered speaker recognition method based on joint factor analysis is proposed, realizing joint compensation of the channel and the speaker state. Following the basic idea of joint factor analysis, the speech feature supervector is decomposed into a speaker supervector, a speaker-state supervector, and a channel supervector. Because the speaker, speaker-state, and channel subspaces cannot be estimated simultaneously, the following procedure is adopted: after the universal background model (UBM) is obtained, the Baum-Welch statistics of the training speech are computed, the speaker subspace is estimated first, and the speaker-state subspace and the channel subspace are then estimated in parallel. In the test stage, the channel and speaker-state offsets are subtracted from the speech feature vectors, and the transformed features are used for speaker recognition (a sketch of this feature-domain compensation follows). The experimental results show that the joint factor analysis method compensates the channel and the speaker state at the same time and achieves better recognition results than the other algorithms.
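The following is a minimal sketch of the feature-domain compensation step, assuming the channel subspace U, state subspace S, and the per-utterance factors x_u and x_s have already been estimated by a separately trained joint factor analysis model (not shown); names and shapes are illustrative assumptions.

```python
# Sketch of JFA feature-domain compensation: per frame, the channel offset (U @ x_u)
# and speaker-state offset (S @ x_s) are subtracted, weighted by UBM mixture posteriors.
import numpy as np

def gmm_posteriors(X, w, mu, var):
    """UBM mixture posteriors gamma[t, c] for frames X (T x D), diagonal covariances."""
    logp = (np.log(w)
            - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
            - 0.5 * np.sum((X[:, None, :] - mu) ** 2 / var, axis=2))
    logp -= logp.max(axis=1, keepdims=True)
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

def compensate_features(X, w, mu, var, U, S, x_u, x_s):
    """Subtract the channel and speaker-state offsets from each frame of X."""
    C, D = mu.shape
    gamma = gmm_posteriors(X, w, mu, var)              # T x C
    # Reshape the supervector-sized offset into per-mixture offsets (C x D).
    offset = (U @ x_u + S @ x_s).reshape(C, D)
    return X - gamma @ offset                          # T x D compensated features
```

The compensated frames are then scored against the enrolled speaker models in the usual way.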
【Degree-granting institution】: 蘇州大學(xué) (Soochow University)
【Degree level】: Doctoral
【Year conferred】: 2014
【CLC number】: TN912.34