Research on Prosodic Structure Prediction Based on Deep Neural Networks

Published: 2018-08-19 09:07
[Abstract]: Prosody prediction for Chinese plays an important role in the naturalness of synthesized speech. This thesis focuses on improving the prediction accuracy of the prosodic hierarchy. Earlier statistical models for prosodic structure prediction typically used part-of-speech (POS) tags as input features. POS tags carry only shallow information and cannot exploit the semantics of words. Because such features encode no relationship between one word and another, they tend to produce a "lexical gap", in which even synonyms show no corresponding similarity. The model input therefore needs a representation that reflects how words relate to each other. On the modeling side, hidden Markov models and decision trees have been applied successfully to prosodic structure prediction, but they suffer from problems such as narrow applicability and overfitting. As data become more complex, a model with stronger modeling capacity is needed, and deep neural networks model complex data well. This thesis therefore studies a prosodic structure prediction model based on a deep neural network that takes word vectors as input features. On the one hand, word vectors are trained and used to construct prosodic word vectors, and a composite vector serves as the model input; on the other hand, the conventional neural network model is modified so that its hidden layer better captures the interactions between words.
The main work includes: (1) configuring the Gensim word vector training module, training word vectors with it, using the trained word vectors to learn prosodic word vectors, and capturing prosodic hierarchy information in the context through vectors at different levels; (2) training the neural network model on data annotated with prosodic levels, using the dictionary word vector, the prosodic word vector, the prosodic level vector of the preceding word, and the word length vector of the current word as input features, so that the composite input at the input layer improves the model's predictive ability; (3) modifying the hidden layer by adding a tensor matrix to it, which captures the relationships between words and between different prosodic levels, and evaluating the model's prosodic structure prediction in terms of window length, spatial dimension, number of hidden layer units, and input features. The experimental results show that, compared with a single word vector as the input feature, the composite input feature combining multiple vectors lowers the prosodic word error rate by 3.2 percentage points (from 15.3% to 12.1%) and the prosodic phrase error rate by 5 percentage points (from 40.3% to 35.3%). After the tensor matrix is added to the hidden layer, the prosodic word error rate falls by a further 0.5 percentage points (from 12.1% to 11.6%). These results indicate that the composite input feature effectively reduces the prosody prediction error rate, and that a hidden layer with a tensor matrix captures information between prosodic levels better than an ordinary hidden layer.
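The abstract does not give the exact form of the composite input or of the tensor hidden layer. The following is only a minimal sketch under stated assumptions: the four feature vectors are simply concatenated, and the hidden layer adds a bilinear term of the kind used in neural tensor networks, h_k = tanh(x^T T_k x + W_k x + b_k). All function names, vector sizes, and the random initialization are hypothetical and are not taken from the thesis.

```python
import math
import random


def composite_input(word_vec, prosodic_word_vec, prev_level_vec, word_len_vec):
    """Concatenate the four feature vectors named in the abstract (assumed scheme)."""
    return word_vec + prosodic_word_vec + prev_level_vec + word_len_vec


def tensor_hidden_layer(x, T, W, b):
    """Compute h[k] = tanh(x^T T[k] x + W[k] . x + b[k]) for every hidden unit k."""
    n = len(x)
    hidden = []
    for T_k, W_k, b_k in zip(T, W, b):
        # Bilinear term: every pair of input components interacts through T_k.
        bilinear = sum(x[i] * T_k[i][j] * x[j] for i in range(n) for j in range(n))
        # Ordinary linear term of a standard hidden layer.
        linear = sum(w * xi for w, xi in zip(W_k, x))
        hidden.append(math.tanh(bilinear + linear + b_k))
    return hidden


def one_hot(index, size):
    """Return a one-hot list of the given size."""
    return [1.0 if i == index else 0.0 for i in range(size)]


rng = random.Random(0)

# Hypothetical sizes: 4-dim word vectors, 3 prosodic levels, word length up to 4.
word_vec = [rng.uniform(-1, 1) for _ in range(4)]
prosodic_word_vec = [rng.uniform(-1, 1) for _ in range(4)]
prev_level_vec = one_hot(1, 3)   # prosodic level of the preceding word
word_len_vec = one_hot(2, 4)     # length of the current word

x = composite_input(word_vec, prosodic_word_vec, prev_level_vec, word_len_vec)

n_hidden = 5
d = len(x)
T = [[[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(d)] for _ in range(n_hidden)]
W = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(n_hidden)]
b = [0.0] * n_hidden

h = tensor_hidden_layer(x, T, W, b)
print(len(x), len(h))
```

Setting every `T[k]` to zero reduces this layer to an ordinary tanh hidden layer, which is one way to read the abstract's comparison between the tensor hidden layer and the ordinary one.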
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.1

[Citation]: Wang Qi (王琦). Research on Prosodic Structure Prediction Based on Deep Neural Networks [D]. Beijing Jiaotong University, 2016.


Article ID: 2191200


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2191200.html

