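Research on Lattice-Based Spoken Document Retrieval
Published: 2018-04-22 11:28
Topics: spoken document retrieval + Lattice; Source: master's thesis, PLA Information Engineering University, 2012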
[Abstract]: Spoken document retrieval is the process of searching a large collection of speech resources and returning the spoken documents or speech segments relevant to a user's query. It has important applications in information security, voice search engines, and the classification and management of speech resources. In recent years, Lattice-based spoken document retrieval has developed rapidly into the mainstream approach and has attracted growing attention. However, while the structure of a Lattice retains more correct recognition hypotheses, it also introduces new problems and challenges. Targeting the characteristics of Chinese Lattices, this thesis studies Lattice structure improvement, the selection of optimal recognition and retrieval units, and the re-ranking of relevant documents, with the goal of faster and more accurate retrieval. The main work covers the following three aspects:
(1) Because conventional Lattice generation ignores phonetic knowledge such as phonological attributes, a Lattice structure improvement method that incorporates phonological attributes is proposed. Since Lattices from different sources carry complementary information, the method first builds a Lattice with a speech recognition system based on phonological attribute detection, then fuses it with the Lattice generated by a conventional automatic speech recognition system. To counter the growth in Lattice size after fusion, a position-based segment alignment method compresses the structure, yielding a compact Lattice that incorporates phonological attributes. Experiments show that the improved Lattice contains more correct recognition results: index coverage rises from 77.83% to 80.34%, the Lattice error rate falls from 25.31% to 19.66%, and retrieval performance improves.
(2) To address the mismatch between the optimal recognition unit and the optimal retrieval unit in Chinese spoken document retrieval, an indexing method based on sub-word PSPL (position-specific posterior lattice) is proposed. The method first decodes the spoken documents with words as the recognition unit to obtain a PSPL. It then segments the PSPL into sub-words and, according to the posterior probability relationship between the sub-word arcs and the original word arcs, converts the PSPL into the corresponding sub-word PSPL. Finally, the sub-word PSPL serves as the index for query retrieval, so that words act as the recognition unit and sub-words as the retrieval unit. Experiments show that the method exploits rich linguistic information while largely resolving the incorrect boundary segmentation of word decoders, and that its retrieval performance is clearly better than the commonly used PSPL indexing method in which both the recognition unit and the retrieval unit are words.
(3) To address the inaccurate ranking of relevant documents in retrieval results, a re-ranking method based on acoustic feature similarity is proposed. The method improves the retrieval system with pseudo-relevance feedback. It first selects the top N spoken documents with the highest relevance scores from the first-pass retrieval results to form a pseudo-relevant document set. It then compares the acoustic feature similarity between each retrieved document and the pseudo-relevant set over the time spans in which the query occurs. Finally, it fuses the original relevance score with the acoustic feature similarity to obtain a new relevance score and re-ranks the results by that score. Experiments show that R-precision rises from 69.07% to 75.82% after re-ranking, and that retrieval performance improves further as the number of iterations increases.
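The word-to-sub-word conversion in the second contribution can be illustrated with a minimal sketch. It assumes the word lattice is approximated by a short list of word hypotheses with path posteriors, that sub-words are single characters, and that a query is scored by the product of per-character posteriors at consecutive positions. The names `subword_pspl` and `phrase_score`, the toy data, and the scoring rule are illustrative assumptions, not the thesis's actual algorithm.

```python
from collections import defaultdict
from math import prod


def subword_pspl(hypotheses):
    """Build a character-level PSPL from word hypotheses.

    hypotheses: list of (words, posterior) pairs, a toy stand-in for the
    paths of a word lattice (an assumption, not the thesis's data format).
    Returns {char_position: {char: posterior}}.
    """
    bins = defaultdict(lambda: defaultdict(float))
    for words, posterior in hypotheses:
        position = 0
        for word in words:
            # Each character arc inherits the posterior of its parent word arc;
            # arcs that agree on (position, character) accumulate their mass.
            for char in word:
                bins[position][char] += posterior
                position += 1
    return {pos: dict(chars) for pos, chars in bins.items()}


def phrase_score(index, query):
    """Best product of per-character posteriors over all start positions."""
    best = 0.0
    for start in index:
        score = prod(
            index.get(start + offset, {}).get(char, 0.0)
            for offset, char in enumerate(query)
        )
        best = max(best, score)
    return best


# Two hypotheses that share the same characters but disagree on word boundaries.
hypotheses = [
    (["北京", "大學"], 0.6),
    (["北", "京大", "學"], 0.4),
]
index = subword_pspl(hypotheses)
print(phrase_score(index, "北京"))
```

In this toy case only the first hypothesis contains 北京 as a word, so a word-level index would credit the query with that hypothesis's posterior alone. The character-level index pools both hypotheses at each position, which is the kind of boundary error the sub-word index is meant to absorb.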
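The re-ranking step in the third contribution can be sketched in the same spirit. The sketch assumes each retrieved document is summarized by one fixed-length acoustic feature vector for the query region, that similarity is the mean cosine similarity to the pseudo-relevant set, and that fusion is a linear interpolation with a hypothetical `weight` parameter. The thesis does not state these choices in the abstract, so every one of them is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(results, top_n=2, weight=0.5, iterations=1):
    """Re-rank first-pass results with pseudo-relevance feedback.

    results: list of (doc_id, relevance, feature_vector) tuples.
    top_n, weight, iterations: hypothetical tuning parameters.
    Returns a list of (doc_id, fused_score), best first.
    """
    scores = {doc_id: relevance for doc_id, relevance, _ in results}
    features = {doc_id: vector for doc_id, _, vector in results}
    for _ in range(iterations):
        # The current top-N documents form the pseudo-relevant set.
        pseudo = sorted(scores, key=scores.get, reverse=True)[:top_n]
        fused = {}
        for doc_id, score in scores.items():
            # Exclude the document itself so it is not rewarded for self-similarity.
            others = [p for p in pseudo if p != doc_id]
            if not others:
                fused[doc_id] = score
                continue
            similarity = sum(
                cosine(features[doc_id], features[p]) for p in others
            ) / len(others)
            # Linear interpolation of relevance and acoustic similarity.
            fused[doc_id] = (1 - weight) * score + weight * similarity
        scores = fused
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


first_pass = [
    ("d1", 0.9, [1.0, 0.0]),
    ("d2", 0.8, [1.0, 0.0]),
    ("d3", 0.7, [0.0, 1.0]),
    ("d4", 0.6, [1.0, 0.0]),
]
for doc_id, score in rerank(first_pass):
    print(doc_id, round(score, 3))
```

In this toy data, `d4` is acoustically close to the top-ranked documents while `d3` is not, so `d4` moves ahead of `d3` after fusion. Setting `iterations` above 1 feeds the fused scores back in as the next round's relevance, which is one possible reading of the iterative improvement the abstract reports.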
[Degree-granting institution]: PLA Information Engineering University
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TN912.34
Article ID: 1786978
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1786978.html