Research on Fast Query-by-Example Speech Retrieval Based on Template Matching

Published: 2018-09-03 11:18

[Abstract]: Query-by-example speech retrieval takes a query example supplied by the user (a waveform segment) and searches a large speech collection for the segments related to it. It has important applications in information security, voice search engines, and the classification and management of speech resources. Template matching is one of the mainstream approaches to the task. Applied directly, however, it is slow and does not adequately account for variation in acoustic conditions. To address both shortcomings, this thesis studies how to reduce retrieval time and how to re-rank the relevant regions, with the aim of faster and more accurate retrieval. The work falls into three parts.

First, searching for relevant regions with plain dynamic time warping (DTW) is time-consuming. The thesis proposes a DTW algorithm that incorporates a piecewise aggregate approximation lower-bound estimate, which speeds up retrieval by sharply reducing the number of dynamic matches performed during the search. The method first computes the piecewise aggregate approximation lower bound of the DTW score between the query example and each matching region in the test utterance, then uses K-nearest-neighbor search together with DTW to find the regions related to the query. Experiments show that retrieval is 5.9 times as fast as with plain DTW, with no effect on retrieval accuracy.
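
The lower-bound pruning idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the form of its piecewise aggregate approximation bound, so the sketch swaps in a much simpler bound (the local distances at the first and last aligned frames, which every warping path must include). The names `dist`, `dtw`, `lower_bound`, and `knn_search` are hypothetical, and features are plain lists of floats.

```python
import heapq
import math


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(q, x):
    """Unnormalized DTW cost between sequences q and x (steps: up, left, diagonal)."""
    n, m = len(q), len(x)
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cur[j] = dist(q[i - 1], x[j - 1]) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def lower_bound(q, x):
    """Assumed stand-in bound: every warping path aligns first-with-first and last-with-last."""
    lb = dist(q[0], x[0])
    if len(q) > 1 or len(x) > 1:  # the two corner cells are distinct
        lb += dist(q[-1], x[-1])
    return lb


def knn_search(query, regions, k):
    """Return (sorted list of the k best (cost, index) pairs, number of full DTW runs)."""
    # Visit candidates in order of increasing lower bound.
    order = sorted((lower_bound(query, r), i) for i, r in enumerate(regions))
    best = []  # max-heap via negated costs, holds at most k entries
    n_dtw = 0
    for lb, i in order:
        if len(best) == k and lb >= -best[0][0]:
            break  # no remaining candidate can beat the current k-th best
        cost = dtw(query, regions[i])
        n_dtw += 1
        heapq.heappush(best, (-cost, i))
        if len(best) > k:
            heapq.heappop(best)  # drop the current worst
    return sorted((-c, i) for c, i in best), n_dtw
```

Because candidates are visited in order of increasing lower bound, the search can stop as soon as the next bound is no smaller than the current k-th best DTW cost, so the top-k costs match those of an exhaustive search. The second return value counts how many full DTW evaluations were actually needed.
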
Second, retrieval with plain DTW involves a large amount of redundant computation and redundant matching. To address this, the thesis proposes a retrieval method based on segmental DTW: the test utterance is divided into a series of matching regions according to fixed rules, and DTW is then run on those regions. To improve efficiency further, segmental DTW is combined with the piecewise aggregate approximation lower-bound estimate. To better account for variation in acoustic conditions, pseudo-relevance feedback is used to refine the retrieval results, giving a relevant-region re-ranking method based on pseudo-similarity. Experiments show that retrieval is 14.6 times as fast as with plain DTW, and retrieval accuracy improves by 5.21% relative to it.

Third, to overcome the limitations of the two lower-bound-based algorithms above (DTW and segmental DTW, each combined with the lower-bound estimate), the thesis proposes a DTW algorithm that incorporates boundary information. The method first applies hierarchical agglomerative clustering to segment the phoneme posterior probability feature sequences of the query example and the test utterances (that is, boundary detection), computes the mean vector of each segment, and assembles these mean vectors into a new index and a new query. DTW is then used for retrieval, and pseudo-relevance feedback finally refines the results. Experiments show that retrieval is 15.4 times as fast as with plain DTW, and retrieval accuracy improves by 0.73% over it.
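
The boundary-detection step of the third method (segment a feature sequence, then replace each segment by its mean vector) might look roughly like the following. It is a minimal sketch under assumptions: the abstract does not state the linkage criterion or stopping rule, so this version greedily merges the adjacent pair of segments with the closest mean vectors until a requested number of segments remains. The names `mean_vector`, `sq_dist`, `segment`, and `compress` are hypothetical, and frames are plain lists of floats.

```python
def mean_vector(frames):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def segment(frames, n_segments):
    """Bottom-up merging of adjacent segments until n_segments remain.

    Returns a list of (start, end) frame index pairs, end exclusive.
    Only neighbouring segments may merge, so temporal order is preserved.
    """
    segs = [(i, i + 1) for i in range(len(frames))]
    while len(segs) > max(1, n_segments):
        means = [mean_vector(frames[s:e]) for s, e in segs]
        # Pick the adjacent pair whose mean vectors are closest (assumed criterion).
        j = min(range(len(segs) - 1), key=lambda i: sq_dist(means[i], means[i + 1]))
        segs[j:j + 2] = [(segs[j][0], segs[j + 1][1])]
    return segs


def compress(frames, n_segments):
    """Replace each detected segment by its mean vector."""
    return [mean_vector(frames[s:e]) for s, e in segment(frames, n_segments)]
```

Applying `compress` to both the query and the test utterances yields shorter sequences of mean vectors, which can then be compared with DTW. The speed-up comes from the DTW cost matrix shrinking with the product of the two sequence lengths.
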
[Degree-granting institution]: PLA Information Engineering University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TN912.3


Article ID: 2219826

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2219826.html
