Research on Vocal Tract Spectral Parameter Transformation Algorithms for Voice Conversion
Topics: voice conversion + speech signal processing. Source: master's thesis, Nanjing University of Posts and Telecommunications, 2017.
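[Abstract]: Voice conversion transforms the individual characteristics of a source speaker's voice while keeping the linguistic content unchanged, so that the converted speech sounds closer to a target speaker. It is a research direction derived from speech signal processing, closely linked with speech signal analysis, recognition and synthesis, and these fields advance one another. It also has many practical applications, such as text-to-speech, dubbing for film and television, and medicine. This thesis focuses on the following: (1) The role of each component of a voice conversion system is discussed. The study concentrates on converting the vocal tract spectral feature parameters and, on that basis, introduces several classical conversion models, such as vector quantization, Gaussian mixture models, linear multivariate regression and artificial neural networks. (2) Radial basis function (RBF) neural networks are often used as conversion models, and the parameters of their kernel functions are usually trained with K-means clustering, an approach with drawbacks such as slow convergence, a tendency to get trapped in local optima and weak generalization. This thesis proposes optimizing the RBF network with an improved particle swarm optimization (PSO) algorithm to raise its performance, so that the mapping between the spectral envelopes of the source and target speakers is learned more accurately, and studies its effect within a voice conversion system. Experimental results show that the proposed scheme effectively improves the performance of the neural network and brings the converted speech closer to the target speech. (3) In conventional voice conversion systems, all vocal tract spectral feature parameters are converted according to a single RBF network rule, which can hardly fit every feature parameter and therefore degrades the quality of the converted speech. To address this, the thesis proposes converting the vocal tract spectral feature parameters jointly with a self-organizing feature map (SOM) and the improved-PSO-optimized RBF network, using the SOM's good classification ability to establish multiple conversion rules. Subjective and objective evaluations show that this multi-class mapping improves conversion accuracy and thereby the quality of the speech signal.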
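Item (2) replaces K-means training of the RBF kernel parameters with an improved particle swarm optimization. The abstract does not say what the improvement is, so the following is only a minimal sketch under stated assumptions: each particle encodes all RBF centres plus their widths, the linear output layer is solved by ridge least squares for every candidate, and a linearly decreasing inertia weight stands in for the "improved" PSO. The function names (`pso_rbf`, `fit_output_weights`, `decode`, etc.), the toy data and every hyperparameter are hypothetical illustrations, not the author's implementation.

```python
import math
import random


def solve(a, b):
    """Solve A X = B (A is n x n, B is n x m) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def hidden(x, centers, widths):
    """Gaussian RBF activations followed by a constant bias unit."""
    acts = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * s * s))
            for c, s in zip(centers, widths)]
    return acts + [1.0]


def fit_output_weights(xs, ys, centers, widths, ridge=1e-6):
    """Linear output layer by ridge-regularised least squares (normal equations)."""
    h = [hidden(x, centers, widths) for x in xs]
    k, n_out = len(h[0]), len(ys[0])
    hth = [[sum(r[i] * r[j] for r in h) + (ridge if i == j else 0.0) for j in range(k)]
           for i in range(k)]
    hty = [[sum(r[i] * y[o] for r, y in zip(h, ys)) for o in range(n_out)] for i in range(k)]
    return solve(hth, hty)


def predict(x, centers, widths, w):
    h = hidden(x, centers, widths)
    return [sum(hi * w[i][o] for i, hi in enumerate(h)) for o in range(len(w[0]))]


def decode(p, n_hidden, dim):
    # Hypothetical encoding: all centres first, then one width per hidden unit (kept positive).
    centers = [p[j * dim:(j + 1) * dim] for j in range(n_hidden)]
    widths = [abs(v) + 1e-3 for v in p[n_hidden * dim:]]
    return centers, widths


def fitness(p, xs, ys, n_hidden, dim):
    """Training MSE of the RBF net whose kernels are encoded by particle p."""
    centers, widths = decode(p, n_hidden, dim)
    w = fit_output_weights(xs, ys, centers, widths)
    return sum(sum((a - b) ** 2 for a, b in zip(predict(x, centers, widths, w), y))
               for x, y in zip(xs, ys)) / len(xs)


def pso_rbf(xs, ys, n_hidden=4, n_particles=15, iters=40, seed=0):
    """PSO over RBF centres and widths; returns the best model and best-fitness history."""
    rng = random.Random(seed)
    dim = len(xs[0])
    size = n_hidden * dim + n_hidden
    lo, hi = min(min(x) for x in xs), max(max(x) for x in xs)
    vmax = hi - lo  # velocity clamp keeps the swarm from diverging
    pos = [[rng.uniform(lo, hi) for _ in range(size)] for _ in range(n_particles)]
    vel = [[0.0] * size for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p, xs, ys, n_hidden, dim) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    history = [gbest_f]
    for t in range(iters):
        # Assumed "improvement": inertia weight falls linearly from 0.9 to 0.4.
        inertia = 0.9 - 0.5 * t / max(iters - 1, 1)
        for i in range(n_particles):
            for k in range(size):
                v = (inertia * vel[i][k]
                     + 2.0 * rng.random() * (pbest[i][k] - pos[i][k])
                     + 2.0 * rng.random() * (gbest[k] - pos[i][k]))
                vel[i][k] = max(-vmax, min(vmax, v))
                pos[i][k] += vel[i][k]
            f = fitness(pos[i], xs, ys, n_hidden, dim)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
        history.append(gbest_f)
    centers, widths = decode(gbest, n_hidden, dim)
    return centers, widths, fit_output_weights(xs, ys, centers, widths), history


if __name__ == "__main__":
    # Toy stand-in for paired source/target spectral frames (not real speech data).
    rng = random.Random(1)
    src = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(30)]
    tgt = [[math.sin(2 * a) + 0.5 * b, a * b] for a, b in src]
    centers, widths, w, history = pso_rbf(src, tgt)
    print("training MSE: %.4f -> %.4f" % (history[0], history[-1]))
```

Solving the output weights in closed form restricts the swarm's search to the nonlinear kernel parameters, which is one common way to combine PSO with RBF training; whether the thesis does the same is not stated in the abstract.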
[Abstract]: Voice conversion technology transforms the individual characteristics of the source speaker's voice while keeping the speaker's linguistic content unchanged, so that the transformed speech is closer to the target speaker's speech. Voice conversion is a research direction derived from speech signal processing; it is closely related to speech signal analysis, recognition and synthesis, and these fields promote each other's development. It has many practical applications, such as text-to-speech conversion, film and television dubbing, and the medical field. This paper focuses on the following: (1) The role of each part of the voice conversion system is discussed; the conversion of the vocal tract spectral feature parameters is the main object of study, and many classical conversion models are introduced, such as vector quantization, Gaussian mixture models, linear multivariate regression and artificial neural networks. (2) The radial basis function (RBF) neural network is often used as the conversion model, and its kernel function parameters are usually trained by K-means clustering. This method has disadvantages such as slow convergence, a tendency to fall into local optima and weak generalization ability. In this paper, an improved particle swarm optimization (PSO) method is proposed to optimize the RBF network and improve its performance, so as to obtain the spectral envelope mapping between the source and target speakers more accurately, and its role in the voice conversion system is studied. Experimental results show that the proposed conversion scheme effectively improves the performance of the neural network and that the transformed speech is closer to the target speech.
(3) In conventional voice conversion systems, the vocal tract spectral feature parameters are converted according to a single RBF neural network rule, so it is difficult to match all the feature parameters and the quality of the converted speech is reduced. To improve on this, this paper presents a method that combines a self-organizing feature map (SOM) with the improved-PSO-optimized RBF neural network to transform the vocal tract spectral feature parameters, and sets up multiple conversion rules by exploiting the good classification ability of the SOM. Subjective and objective evaluations show that this multi-class mapping rule improves the accuracy of the conversion and the quality of the speech signal.
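Item (3) uses a self-organizing feature map to split the source feature vectors into classes and then trains one conversion rule per class. The sketch below illustrates only that multi-rule structure, assuming a small 1-D SOM node chain and, for brevity, a per-class affine least-squares mapping in place of the improved-PSO-optimized RBF network the thesis actually uses. All names (`train_som`, `fit_multi_rule`, `convert`), the fallback-to-global-rule policy, the hyperparameters and the toy data are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve A X = B (A is n x n, B is n x m) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_affine(xs, ys, ridge=1e-6):
    """Ridge least-squares affine map y ~ W^T [x, 1]; a stand-in for the per-class RBF net."""
    h = [list(x) + [1.0] for x in xs]
    k, n_out = len(h[0]), len(ys[0])
    hth = [[sum(r[i] * r[j] for r in h) + (ridge if i == j else 0.0) for j in range(k)]
           for i in range(k)]
    hty = [[sum(r[i] * y[o] for r, y in zip(h, ys)) for o in range(n_out)] for i in range(k)]
    return solve(hth, hty)


def apply_affine(x, w):
    h = list(x) + [1.0]
    return [sum(hi * w[i][o] for i, hi in enumerate(h)) for o in range(len(w[0]))]


def winner(x, nodes):
    """Index of the best-matching SOM node (squared Euclidean distance)."""
    return min(range(len(nodes)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(x, nodes[j])))


def train_som(xs, n_nodes=3, epochs=30, lr0=0.5, seed=0):
    """1-D SOM chain: learning rate and Gaussian neighbourhood radius shrink linearly."""
    rng = random.Random(seed)
    nodes = [list(x) for x in rng.sample(xs, n_nodes)]
    total, step = epochs * len(xs), 0
    for _ in range(epochs):
        order = xs[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)
            radius = max(n_nodes / 2.0, 1.0) * (1.0 - frac) + 1e-3
            win = winner(x, nodes)
            for j, node in enumerate(nodes):
                h = math.exp(-((j - win) ** 2) / (2.0 * radius ** 2))
                for k in range(len(node)):
                    node[k] += lr * h * (x[k] - node[k])
            step += 1
    return nodes


def fit_multi_rule(xs, ys, n_nodes=3, seed=0):
    """Partition training pairs by SOM class and fit one conversion rule per class."""
    nodes = train_som(xs, n_nodes, seed=seed)
    labels = [winner(x, nodes) for x in xs]
    rules = {}
    for j in set(labels):
        rules[j] = fit_affine([x for x, lab in zip(xs, labels) if lab == j],
                              [y for y, lab in zip(ys, labels) if lab == j])
    return nodes, rules, fit_affine(xs, ys)  # global rule doubles as the fallback


def convert(x, nodes, rules, fallback):
    return apply_affine(x, rules.get(winner(x, nodes), fallback))


if __name__ == "__main__":
    # Toy piecewise-linear source->target data that a single affine rule cannot fit.
    data = [[i / 50.0 - 1.0] for i in range(101)]
    target = [[abs(x[0])] for x in data]
    nodes, rules, fallback = fit_multi_rule(data, target)

    def mse(pred):
        return sum((pred(x)[0] - y[0]) ** 2 for x, y in zip(data, target)) / len(data)

    single = mse(lambda x: apply_affine(x, fallback))
    multi = mse(lambda x: convert(x, nodes, rules, fallback))
    print("single-rule MSE %.4f, multi-rule MSE %.4f" % (single, multi))
```

When the source-to-target relation changes across regions of the feature space, per-class rules typically fit the training pairs better than one global rule, which is the motivation the abstract gives for a multi-class mapping.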
[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TN912.3; TP18