Research on Tibetan Speech Synthesis Technology Based on Hybrid Units

Published: 2018-04-27 01:42

Topics: Tibetan information processing + speech synthesis; Source: Shaanxi Normal University, 2016 doctoral dissertation

[Abstract]: Speech synthesis is one of the core technologies of human-computer interaction and a difficult problem in Chinese information processing. Its goal is to convert text automatically into clear, fluent speech, and research on it has important theoretical significance and practical value for developing automatic control, intelligent robots, and human-machine speech communication systems. With the development of computer and communication technology, corpus-based speech synthesis has attracted growing attention. Tibetan information processing, an important part of Chinese information processing, has made considerable progress over more than twenty years in word segmentation, tagging, and word-frequency statistics, but research on Tibetan speech synthesis has only just begun. Many attributes that are valuable for Tibetan speech synthesis have not yet been mined and described, and research on the Tibetan language itself is not deep enough. For example, existing systems cannot analyze the prosodic features of Tibetan qualitatively and quantitatively, nor can they supply the system with the necessary control information through text analysis. Grounded in the Tibetan language and script, this dissertation studies the textual features of written Tibetan and the prosodic features of spoken Tibetan from the perspectives of linguistics and phonetics. Building on corpus-based speech synthesis, it designs and implements a practical Tibetan speech synthesis system based on hybrid units. The main work covers the following aspects:

(1) Starting from Tibetan text, the dissertation studies preprocessing problems for speech synthesis, such as recognizing non-Tibetan characters and sentence boundaries. To meet the practical needs of Tibetan speech synthesis, it proposes a Tibetan word segmentation algorithm based on part-of-speech constraints. Compared with traditional segmentation algorithms, this algorithm uses part-of-speech collocation rules to avoid most overlapping and inclusion ambiguities, and it improves the recognition strategies for contracted words and out-of-vocabulary words, which markedly improves segmentation efficiency. In addition, to handle speech synthesis for out-of-vocabulary words, it gives a decomposition algorithm for Tibetan character components and verifies the algorithm's performance by developing a Tibetan character component analysis system. The component distribution statistics that this system gathers from a large-scale corpus are also used to guide unit selection and corpus construction. See Chapter 2.

(2) Starting from acoustic and grammatical features, the dissertation statistically analyzes the prosodic hierarchy, stress patterns, and intonation phenomena of Amdo Tibetan and studies prosody control rules for Tibetan. First, it proposes a prosodic hierarchy prediction algorithm that combines function-word frequency with prosodic-phrase length to mark prosodic unit boundaries dynamically. This keeps the prosodic hierarchy from depending too heavily on the segmentation result and preserves the integrity of the hierarchy. Second, it computes relative coefficients for stress at each level. During synthesis, grammatical stress is first assigned to prosodic words, prosodic phrases, and intonation phrases, and the emphatic stress of the target sentence is then computed from the relative stress coefficients of the prosodic units at each level. Finally, it gives the intonation features and intonation rules for declarative, interrogative, imperative, and exclamatory sentences. Experimental data show that these prosody rules play an important role in the prosodic expression of the synthesized speech and considerably improve its naturalness. See Chapter 3.

(3) Unit selection is the basis for building a well-structured corpus of moderate size and the key to corpus-based speech synthesis. To improve the system's prosodic performance while keeping the unit search space manageable, the dissertation proposes a strategy for building a hybrid unit library and gives a corresponding unit selection algorithm. Subjective and objective experimental data show that the strategy and algorithm effectively retain the integrity of large units and the flexibility and robustness of small units. To avoid excessive algorithmic adjustment of units during synthesis, the dissertation adopts a multi-sample waveform concatenation strategy on top of the hybrid unit library, in which one (text) unit corresponds to multiple candidate samples in the speech library. It also studies the organization strategy and search algorithm for the multi-sample speech library. Experiments show that, compared with traditional algorithms, this algorithm increases synthesis speed and improves the system's real-time performance. See Chapter 4.

(4) Taking the Amdo Tibetan speech synthesis system as a representative, the dissertation presents the design ideas, goals, functional features, and performance evaluation results of the Tibetan speech synthesis system. The system is distinctive in its text analysis and prosody control and provides an experimental platform for further research on speech synthesis technology. See Chapter 5.
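The hybrid unit library and multi-sample concatenation idea in item (3) can be illustrated with a small sketch. This is not the dissertation's algorithm, which the abstract does not specify. It assumes a greedy longest-match segmentation that prefers large units and falls back to single syllables, followed by a dynamic-programming search that picks one recorded sample per unit. The names `Sample`, `segment_longest_match`, `join_cost`, and `select_samples`, the pitch-difference join cost, and the romanized placeholder syllables are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One recorded candidate for a unit (hypothetical structure)."""
    name: str
    start_pitch: float  # pitch in Hz at the start of the recording
    end_pitch: float    # pitch in Hz at the end of the recording


def segment_longest_match(syllables, inventory, max_len=4):
    """Cover the syllable sequence with units, preferring the longest unit
    found in the inventory and falling back to shorter ones."""
    units = []
    i = 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = tuple(syllables[i:i + n])
            if candidate in inventory:
                units.append(candidate)
                i += n
                break
        else:
            # Not even a single-syllable unit exists for this position.
            raise KeyError(f"no unit covers syllable {syllables[i]!r}")
    return units


def join_cost(left, right):
    """Assumed join cost: pitch mismatch at the concatenation point."""
    return abs(left.end_pitch - right.start_pitch)


def select_samples(units, inventory):
    """Pick one sample per unit so that the summed join cost is minimal.
    Returns (total_cost, list_of_samples)."""
    if not units:
        return 0.0, []
    # Each entry is (cost so far, chosen samples ending in one candidate).
    best = [(0.0, [s]) for s in inventory[units[0]]]
    for unit in units[1:]:
        step = []
        for s in inventory[unit]:
            cost, path = min(
                ((c + join_cost(p[-1], s), p) for c, p in best),
                key=lambda pair: pair[0],
            )
            step.append((cost, path + [s]))
        best = step
    return min(best, key=lambda pair: pair[0])


if __name__ == "__main__":
    # Toy inventory: one two-syllable unit and three single-syllable units.
    inventory = {
        ("ka", "ba"): [Sample("ka+ba#1", 200.0, 180.0),
                       Sample("ka+ba#2", 210.0, 150.0)],
        ("ka",): [Sample("ka#1", 190.0, 190.0)],
        ("ba",): [Sample("ba#1", 185.0, 175.0)],
        ("ma",): [Sample("ma#1", 150.0, 140.0),
                  Sample("ma#2", 185.0, 170.0)],
    }
    units = segment_longest_match(["ka", "ba", "ma"], inventory)
    cost, path = select_samples(units, inventory)
    print(units, cost, [s.name for s in path])
```

In this toy inventory the search selects the second recording of the two-syllable unit because its ending pitch matches the start of `ma#1` exactly. A real system would combine several acoustic and prosodic features in the cost rather than pitch alone.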

[Degree-granting institution]: Shaanxi Normal University
[Degree level]: Doctorate
[Year degree granted]: 2016
[Classification number]: TN912.33

Article ID: 1808655
