Design and Implementation of a Semi-Automatic Speech Annotation System

Published: 2018-06-19 06:50

Topic: DAEM algorithm + STRAIGHT algorithm. Source: Northwest Normal University, master's thesis, 2015

[Abstract]: With the rapid development of information technology, expectations for the quality of speech synthesis and speech recognition keep rising. More and more laboratory results are being applied in everyday life, and new speech system products continue to appear. Building a large-scale corpus is an indispensable task in designing a good speech system, and whether the corpus is annotated accurately determines its quality, so corpus annotation plays a key role in speech research. Extensive manual annotation is time-consuming, labor-intensive, and costly. Moreover, because the human ear is insensitive to the boundaries of individual syllables within a word or sentence, manually annotated data can contain large errors. This thesis designs a semi-automatic annotation system for speech corpora. The system automatically computes the boundaries and the fundamental frequency (F0) envelope of the speech material, and the automatic results are then corrected by hand, giving accurate annotation of both. The main work and contributions of the thesis are as follows:

1. An automatic annotation algorithm for speech unit boundaries is implemented. For recorded speech files without time labels, a forced-alignment algorithm based on the hidden Markov model (HMM) aligns the time boundaries automatically. The Deterministic Annealing Expectation Maximization (DAEM) algorithm is introduced into the re-estimation step of HMM training, which improves the accuracy of the forced alignment of speech unit boundaries.

2. An automatic annotation algorithm for the fundamental frequency of speech is implemented. Building on the duration boundary annotation, the STRAIGHT (Speech Transformation and Representation based on Adaptive Interpolation of weighted spectrogram) algorithm extracts the F0 of the speech, and the extracted F0 data are smoothed. Because the distance between two adjacent peak points equals the fundamental period, the positions of the peak point marks can be derived, and missed or misplaced peak points can be seen directly in the F0 envelope curve formed by the peak points. Manual correction then yields more accurate annotation data. This combination is what makes the system semi-automatic.

3. A semi-automatic speech annotation system is designed and implemented. The system uses a graphical user interface that draws the boundary of each speech unit on the speech waveform and converts the F0 produced by the STRAIGHT algorithm into peak point marks on the waveform. On this basis, functions for manually editing the speech unit boundaries and the peak point marks are implemented, so that the boundaries and the F0 envelope can be annotated more precisely. The result is a visual semi-automatic speech annotation system.

4. An experimental phonetic analysis of the Lanzhou dialect is carried out. Using the implemented system, the boundaries and F0 of single-character words in the Lanzhou dialect were annotated and analyzed, verifying phonetic conclusions about single-character words in the Lanzhou dialect.
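The abstract names DAEM but does not spell out the update. As a rough illustration of the idea only, the sketch below tempers the E-step posteriors with an exponent `beta` that rises toward 1, so early iterations use flattened posteriors and the final pass is ordinary EM. It is not the thesis's method: the thesis applies DAEM to HMM re-estimation, whereas this sketch swaps in a much simpler model (a one-dimensional Gaussian mixture with equal weights and a fixed shared variance). The function names, the `beta` schedule, and the iteration counts are all assumptions made for the example.

```python
import numpy as np


def tempered_posteriors(x, means, beta, var=1.0):
    """E-step with temperature: posteriors proportional to likelihood ** beta.

    Assumes equal component weights and a shared, fixed variance, so the
    constant terms of the Gaussian cancel when the rows are normalised.
    """
    log_lik = -0.5 * (x[:, None] - means[None, :]) ** 2 / var
    scaled = beta * log_lik
    scaled -= scaled.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(scaled)
    return post / post.sum(axis=1, keepdims=True)


def daem_fit_means(x, k=2, betas=(0.2, 0.4, 0.6, 0.8, 1.0), iters=20, seed=0):
    """Estimate component means with a deterministic annealing schedule."""
    rng = np.random.default_rng(seed)
    # Illustrative initialisation: k distinct samples drawn from the data.
    means = rng.choice(x, size=k, replace=False).astype(float)
    for beta in betas:          # annealing loop: beta rises towards 1
        for _ in range(iters):  # EM iterations at the current temperature
            post = tempered_posteriors(x, means, beta)
            # M-step: posterior-weighted mean of the data per component.
            means = (post * x[:, None]).sum(axis=0) / post.sum(axis=0)
    return means
```

With `beta = 0` every sample is shared equally among the components, and with `beta = 1` the function returns the usual EM posteriors. In an HMM the same exponent would be applied inside the forward-backward computation instead.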
[Degree-granting institution]: Northwest Normal University
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification number]: TN912.3


Article ID: 2038994

Article link: http://sikaile.net/kejilunwen/wltx/2038994.html
