

Research on Voice Conversion Algorithms for Parallel and Non-Parallel Text Based on Improved BLFW

Published: 2018-06-09 02:51

  Topic: voice conversion + adaptive Gaussian classification. Source: Nanjing University of Posts and Telecommunications, master's thesis, 2017


[Abstract]: In speech signal processing, voice conversion transforms the speech of one speaker (the source speaker) so that it sounds as if it were uttered by another speaker (the target speaker), while keeping the semantic content unchanged. Speech carries rich information, including semantic, speaker-identity, linguistic, and emotional information. Voice conversion focuses mainly on the essential acoustic features of speech, namely spectral characteristics and prosodic features. In many application scenarios, such as entertainment and cross-lingual conversion, a voice conversion system must deliver high-quality speech and support conversion under non-parallel text. Existing voice conversion systems face two main problems. First, the converted speech cannot achieve both high similarity and good sound quality at once, so the two have to be traded off against each other. Second, training the conversion function depends on parallel corpora, which limits the generality of the system.

First, to achieve conversion with both high sound quality and high similarity, this thesis proposes a bilinear frequency warping plus amplitude adjustment algorithm based on adaptive Gaussian classification. The algorithm uses adaptive Gaussian classification to better model the distribution of the acoustic features of speech, and performs conversion on the basis of a reasonable classification. In subjective and objective evaluations, compared with the bilinear frequency warping plus amplitude adjustment algorithm using a fixed number of classes, the proposed method raises the average MOS of the converted speech by 4.7% and lowers the average MCD by 2.7%, which indicates a modest improvement in the performance of the voice conversion system.

Second, to remove the dependence of voice conversion methods on parallel corpora, the thesis aligns non-parallel corpora using a method based on unit selection and vocal tract length normalization, and then applies the bilinear frequency warping plus amplitude adjustment method based on adaptive Gaussian classification to voice conversion under non-parallel text. Subjective and objective evaluation experiments confirm that, compared with the INCA method under non-parallel text, the average MOS of the converted speech is 7.1% higher and the average MCD is 4.0% lower, indicating better sound quality and higher similarity. Compared with the traditional Gaussian mixture model voice conversion method under parallel text, the average MCD is 5.1% higher and the average MOS is 3.9% lower, so a gap in conversion performance remains. However, the proposed method operates under non-parallel text conditions and is therefore more general.
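The abstract names bilinear frequency warping and the MCD metric without giving formulas. The following is a minimal sketch of both, assuming the textbook definitions rather than the thesis's own implementation: the first-order all-pass (bilinear) warping of a normalized angular frequency, and mel-cepstral distortion in dB averaged over frames. The function names, the warping factor `alpha`, and the convention that frames are already time-aligned with the 0th (energy) coefficient removed are all assumptions made for illustration.

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a normalized angular frequency omega (radians, 0..pi).

    Uses the phase response of a first-order all-pass filter with
    warping factor alpha in (-1, 1). alpha = 0 leaves omega unchanged.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    return omega + 2.0 * math.atan(
        alpha * math.sin(omega) / (1.0 - alpha * math.cos(omega))
    )


def mel_cepstral_distortion(ref_frames, conv_frames):
    """Mean mel-cepstral distortion in dB over time-aligned frames.

    Each frame is a sequence of mel-cepstral coefficients. Assumption:
    the 0th coefficient has already been dropped by the caller.
    """
    if not ref_frames or len(ref_frames) != len(conv_frames):
        raise ValueError("need two non-empty, equally long frame lists")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        if len(ref) != len(conv):
            raise ValueError("frames must have the same dimension")
        squared_error = sum((r - c) ** 2 for r, c in zip(ref, conv))
        total += scale * math.sqrt(2.0 * squared_error)
    return total / len(ref_frames)


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the functions.
    print(bilinear_warp(1.0, 0.3))
    print(mel_cepstral_distortion([[0.1, 0.2]], [[0.1, 0.3]]))
```

A lower MCD means the converted spectrum is closer to the target, which is why the abstract reports MCD reductions as improvements. In a full system the warping would typically be applied to cepstral vectors per Gaussian class rather than to single frequencies as shown here.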
[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TN912.3





Article link: http://sikaile.net/kejilunwen/xinxigongchenglunwen/1998505.html

