
Design and Implementation of a Semi-Automatic Speech Annotation System

Published: 2018-06-19 06:50

  Topics: DAEM algorithm + STRAIGHT algorithm; Source: Northwest Normal University, master's thesis, 2015


[Abstract]: With the rapid development of information technology, expectations for the quality of speech synthesis and speech recognition keep rising. More and more laboratory results are being applied in everyday life, and new speech system products keep appearing. Building a large-scale corpus is an indispensable task in designing a good speech system, and whether the corpus is annotated accurately determines its quality, so corpus annotation plays a key role in speech research. Large-scale manual annotation is time-consuming, labor-intensive, and costly. In addition, because the human ear is insensitive to the boundaries of individual syllables within a word or sentence, manually annotated data can contain large errors. This thesis designs a semi-automatic annotation system for speech corpora. The system automatically computes the boundaries and the fundamental frequency (F0) envelope of the speech data, and the automatic results are then corrected by hand, giving accurate annotation of both. The main work and contributions of the thesis are as follows:

1. An automatic annotation algorithm for speech unit boundaries is implemented. For recorded speech files that have no time annotation, a forced alignment algorithm based on the hidden Markov model (HMM) aligns the time boundaries automatically. The Deterministic Annealing Expectation Maximization (DAEM) algorithm is introduced in the re-estimation step of HMM training, which improves the accuracy of the forced alignment of speech unit boundaries.

2. An automatic annotation algorithm for F0 is implemented. Building on the duration boundary annotation, the STRAIGHT (Speech Transformation and Representation based on Adaptive Interpolation of weighted spectrogram) algorithm extracts the F0 of the speech, and the extracted F0 data is smoothed. Because the distance between two adjacent peak points equals one pitch period, the positions of the peak marks can be derived from the F0. Missed and misplaced peak marks are directly visible in the F0 envelope curve formed from the peak points. Manual correction then gives more accurate annotation data, and this is what makes the system semi-automatic.

3. A semi-automatic speech annotation system is designed and implemented. The system uses a graphical user interface that draws the boundary of each speech unit on the speech waveform and converts the F0 produced by the STRAIGHT algorithm into peak marks on the waveform. On this basis, functions for manually editing the speech unit boundaries and the peak marks are implemented, so that boundaries and the F0 envelope can be annotated more precisely. The result is a visual semi-automatic speech annotation system.

4. An experimental phonetic analysis of the Lanzhou dialect is carried out. Using the system, the boundaries and F0 of single-character words in the Lanzhou dialect are annotated and analyzed, which verifies phonetic conclusions about single-character words in the Lanzhou dialect.
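The peak-marking idea in item 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's implementation: the F0 contour is assumed to come from some estimator such as STRAIGHT (not reimplemented here), the median filter stands in for the unspecified smoothing method, and the names `median_smooth`, `mark_peaks`, `f0_from_marks`, and the `hop` parameter are hypothetical.

```python
def median_smooth(f0, width=5):
    """Median-filter an F0 contour in Hz; unvoiced frames (value 0) stay 0.

    Assumption: the source does not say which smoother it uses.
    """
    half = width // 2
    smoothed = []
    for i, value in enumerate(f0):
        if value <= 0:
            smoothed.append(0.0)
            continue
        # Only voiced neighbours take part in the median.
        voiced = sorted(v for v in f0[max(0, i - half):i + half + 1] if v > 0)
        smoothed.append(voiced[len(voiced) // 2])
    return smoothed


def mark_peaks(wave, sr, f0, hop):
    """Place one peak mark per pitch period on the waveform.

    wave: list of samples, sr: sample rate in Hz,
    f0: one F0 value per frame, hop: frame shift in samples.
    """
    marks = []
    n = len(wave)
    pos = 0       # sample index where the next peak is expected
    last = None   # previous mark inside the current voiced stretch
    while pos < n:
        frame = pos // hop
        if frame >= len(f0):
            break
        if f0[frame] <= 0:
            # Unvoiced frame: skip it and restart the search afterwards.
            pos = (frame + 1) * hop
            last = None
            continue
        period = max(1, round(sr / f0[frame]))  # pitch period in samples
        if last is None:
            # Start of a voiced stretch: search one full period.
            lo, hi = pos, pos + period
        else:
            # Otherwise search a quarter period around the expected peak.
            lo, hi = pos - period // 4, pos + period // 4 + 1
        lo, hi = max(lo, 0), min(hi, n)
        peak = max(range(lo, hi), key=lambda i: wave[i])
        marks.append(peak)
        last = peak
        pos = peak + period  # adjacent peaks lie one period apart
    return marks


def f0_from_marks(marks, sr):
    """F0 envelope implied by the marks: sr divided by each mark spacing."""
    return [sr / (b - a) for a, b in zip(marks, marks[1:])]
```

Plotting `f0_from_marks` gives the kind of envelope the abstract describes: a missed mark doubles one spacing and so shows up as a value at half the neighbouring F0, which is the cue for manual correction.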
[Degree-granting institution]: Northwest Normal University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TN912.3



Article link: http://sikaile.net/kejilunwen/wltx/2038994.html

