Research of Automatic Speech Recognition of the Asante-Twi D
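Published: 2021-12-17 07:14
Automatic speech recognition (ASR) is the first and most important stage of a speech translation system, and the speech database is its most important resource. However, high-quality ASR requires a very large speech database. The Asante-Twi dialect of the Akan language is considered extremely under-resourced, and collecting speech data is a serious obstacle. This thesis proposes a new method for building an ASR system for a low-resource dialect from a small database, and reports good results. First, the characteristics of the dialect are analyzed, and a representative Asante-Twi speech database is designed, collected, and organized, laying a foundation for further speech recognition work. Because no prior work on Asante-Twi dialect recognition exists, there is no trusted reference to build on. To select reliable algorithms and features for an Asante-Twi speech recognition system, the thesis uses the Kaldi toolkit to build three ASR systems with different features and methods. To improve performance, cepstral mean and variance normalization (CMVN) and delta (Δ) dynamic features are applied to the feature extraction of all three systems. In addition, the GMM-HMM pattern classifier algorithm is used to improve the acoustic model units of each ASR system, and two context-dependent (triphone) models are trained to provide better performance. The first ASR system uses MFCC feature extraction, the second uses MFCCs with context-dependent parameters, and the third uses PLP...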
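The CMVN and delta steps named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the thesis's Kaldi pipeline: the function names `cmvn`, `deltas`, and `add_deltas`, the regression window of 2, the 13-coefficient frame size, and the random placeholder features are all assumptions made for the example.

```python
import numpy as np

def cmvn(feats):
    """Per-utterance cepstral mean and variance normalization.

    feats has shape (num_frames, num_coeffs); each coefficient is shifted
    to zero mean and scaled to unit variance across the utterance.
    """
    mean = feats.mean(axis=0, keepdims=True)
    std = feats.std(axis=0, keepdims=True)
    return (feats - mean) / np.maximum(std, 1e-10)  # guard against zero variance

def deltas(feats, window=2):
    """Regression-based delta features.

    d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2), n = 1..window,
    with edge frames repeated so the output has as many frames as the input.
    """
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    out = np.zeros_like(feats, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n : window + n + num_frames]
        behind = padded[window - n : window - n + num_frames]
        out += n * (ahead - behind)
    return out / denom

def add_deltas(feats, window=2):
    """Append delta and delta-delta columns to the static features."""
    d1 = deltas(feats, window)
    d2 = deltas(d1, window)
    return np.hstack([feats, d1, d2])

# Placeholder input: 50 frames of 13 coefficients standing in for real MFCCs or PLPs.
rng = np.random.default_rng(0)
mfcc_like = rng.normal(loc=3.0, scale=2.0, size=(50, 13))

normalized = cmvn(mfcc_like)
full = add_deltas(normalized)
print(full.shape)  # (50, 39): 13 static + 13 delta + 13 delta-delta
```

Normalizing before appending the dynamic features mirrors the usual ordering in GMM-HMM recipes, but the exact options used in the thesis are not stated on this page.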
[Source]: Southwest University of Science and Technology, Sichuan Province
[Pages]: 72
[Degree level]: Master's
[Table of contents]:
Abstract (Chinese)
ABSTRACT
Main Symbol Table
1 Introduction
1.1 Background and Significance of Study
1.2 Problem Statement
1.3 Akan Language and the Twi Dialect
1.4 Related Work
1.5 Goals of the Thesis
1.6 Thesis Chapter Arrangement
2 Basics of Automatic Speech Recognition
2.1 Mathematical Representation of an ASR System
2.2 Basic Architecture of an ASR System
2.2.1 Signal Processing / Feature Extraction
2.2.2 Language Model
2.2.3 Lexicon
2.2.4 Acoustic Model
2.2.5 Pattern Classification of Acoustic Vectors
2.2.6 Decoding
2.3 Metrics for Performance Measurement
2.4 Summary of the Chapter
3 Approach to Asante-Twi ASR System Realization
3.1 The Kaldi Toolkit Overview
3.2 Asante-Twi Dialect Manual Data Preparation
3.2.1 Audio Data
3.2.2 Acoustic Data
3.2.3 Language Data
3.3 Asante-Twi Dialect Feature Extraction Processes
3.3.1 Mel Frequency Cepstral Coefficients (MFCC)
3.3.2 Perceptual Linear Prediction (PLP)
3.3.3 Cepstral Mean and Variance Normalization (CMVN)
3.3.4 Delta and Delta-Delta Features
3.4 Asante-Twi Dialect Language Modeling
3.5 Acoustic Modeling
3.5.1 Gaussian Mixture Model (GMM)
3.5.2 Hidden Markov Model (HMM)
3.5.3 Generative Learning Approach: GMM-HMM Algorithm
3.6 Asante-Twi Dialect ASR Systems Training
3.6.1 Monophone Training
3.6.2 First Triphone Training
3.6.3 Second Triphone Training
3.7 Asante-Twi Dialect ASR Systems Testing
3.7.1 Monophone Testing
3.7.2 First Triphone Testing
3.7.3 Second Triphone Testing
3.8 Summary of the Chapter
4 Results and Discussion of Asante-Twi ASR Systems
4.1 Performance Measurement Metrics for Asante-Twi ASR Systems
4.1.1 Word Error Rate (WER)
4.1.2 Sentence Error Rate (SER)
4.2 Analysis of Results of Decoding
4.2.1 First Asante-Twi Dialect ASR System Using MFCCs and Δ (2000 Leaves, 11000 Gaussians) and Δ-Δ (2500 Leaves, 15000 Gaussians) transformations
4.2.2 Second Asante-Twi Dialect ASR System Using MFCCs and Δ (2000 Leaves, 10000 Gaussians) and Δ-Δ (2500 Leaves, 15000 Gaussians) transformations
4.2.3 Third Asante-Twi Dialect ASR System using PLPs and Δ (2000 Leaves, 10000 Gaussians) + Δ-Δ (2500 Leaves, 15000 Gaussians) transformations
4.2.4 Comparison of the Best Performances of All Three Asante-Twi Dialect ASR Systems
4.3 Summary of the Chapter
5 Conclusion
5.1 Overall Summary
5.2 Limitations, Future Works and Beyond
Acknowledgement
References
Article ID: 3539633
Article link: http://sikaile.net/kejilunwen/xinxigongchenglunwen/3539633.html