Research on Spoken Dialogue Systems and Language Model Techniques for Service Robots
Published: 2018-07-23 13:10
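【Abstract】: As speech recognition technology matures, applications are appearing in many fields. In service robotics, speech technology is used mainly in the robot's spoken dialogue system. This thesis starts from the concrete application scenarios of the KeJia robot and examines how to design and implement a spoken dialogue system for service robots. It also studies a language-modeling technique for speech recognition: a recurrent neural network language model combined with unsupervised word clustering.
The work on the spoken dialogue system covers two areas: speech recognition and dialogue management. For speech recognition, the thesis first explains the underlying principles in some detail. It then describes how a corpus was collected for the KeJia robot's applications and walks through the complete procedure for training the acoustic models the module needs. Several acoustic models are evaluated on the thesis's own training and test sets. Context-dependent triphone models perform best, reaching a word recognition rate of 98.39% and a corresponding sentence recognition rate of 90.83%. The robot's onboard computer has limited processing power, but the robot can report its own state while it operates. The thesis exploits both facts with a mechanism that changes the language model dynamically, which shrinks the decoder's search space. In experiments on the finished speech recognition module, the best sentence recognition rate with the dynamic language model mechanism was 87.95%, which is 12.05% higher than the module achieved without it. For dialogue management, the thesis adopts a stacked state machine design suited to service robots and implements the framework in Python. It then describes how multimodal information is incorporated into the framework and how the framework's verification and confirmation mechanism works. Finally, it presents an application of the dialogue manager to the KeJia robot's cocktailparty task.
The thesis also studies in depth how unsupervised word clustering can be used in recurrent neural network (RNN) language models. RNN-based language models have been shown to achieve state-of-the-art results. Prior research has also shown that adding part-of-speech tags to the input layer of an RNN language model improves it significantly. However, part-of-speech tagging requires training on manually annotated data, which is costly in labor and resources, and the extra tagger makes the model more complex. To address these problems, the thesis feeds the output of Brown word clustering into the RNN language model's input layer in place of part-of-speech tags. On the Penn Treebank corpus, the RNN language model with Brown cluster information reduces perplexity by 8-9% relative to the original RNN language model.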
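The abstract says dialogue management uses a stacked state machine written in Python, with multimodal input and a verification and confirmation mechanism, but it gives no code. The following is a minimal sketch under assumed details, not the thesis's framework. States live on a stack, and only the top state handles each event. A low-confidence speech hypothesis pushes a confirmation sub-state, and popping a state hands its result back to the state below it. The class names (`Event`, `State`, `DialogueManager`, `ConfirmState`, `TakeOrderState`), the event schema, and the 0.7 confidence threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    """One observation; modality is e.g. "speech" or "vision" (hypothetical schema)."""
    modality: str
    content: str
    confidence: float = 1.0


class State:
    """Base state: subclasses override only the hooks they need."""
    def on_enter(self, dm) -> None: pass
    def handle(self, event: Event, dm) -> None: pass
    def on_child_result(self, result: Optional[str], dm) -> None: pass


class DialogueManager:
    """Keeps a stack of states; only the top state receives events."""

    def __init__(self, root: State):
        self.stack: List[State] = []
        self.outputs: List[str] = []       # utterances the robot would speak
        self.result: Optional[str] = None  # result returned by the root state
        self.push(root)

    def say(self, text: str) -> None:
        self.outputs.append(text)

    def push(self, state: State) -> None:
        self.stack.append(state)
        state.on_enter(self)

    def pop(self, result: Optional[str] = None) -> None:
        self.stack.pop()
        if self.stack:
            self.stack[-1].on_child_result(result, self)  # hand result to parent
        else:
            self.result = result

    def handle(self, event: Event) -> None:
        if self.stack:
            self.stack[-1].handle(event, self)

    @property
    def done(self) -> bool:
        return not self.stack


class ConfirmState(State):
    """Verification/confirmation sub-dialogue for a low-confidence hypothesis."""

    def __init__(self, hypothesis: str):
        self.hypothesis = hypothesis

    def on_enter(self, dm):
        dm.say(f"Did you say {self.hypothesis}?")

    def handle(self, event, dm):
        # Only spoken yes/no answers are accepted while confirming.
        if event.modality == "speech" and event.content in ("yes", "no"):
            dm.pop(self.hypothesis if event.content == "yes" else None)


class TakeOrderState(State):
    """Task state: wait for a person (vision), then take a drink order (speech)."""
    THRESHOLD = 0.7  # hypothetical confidence threshold

    def __init__(self):
        self.person_seen = False

    def on_enter(self, dm):
        dm.say("Waiting for a guest.")

    def handle(self, event, dm):
        if event.modality == "vision" and event.content == "person":
            self.person_seen = True
            dm.say("Hello, what would you like to drink?")
        elif event.modality == "speech" and self.person_seen:
            if event.confidence >= self.THRESHOLD:
                self.finish(event.content, dm)
            else:
                dm.push(ConfirmState(event.content))  # verify before acting

    def on_child_result(self, result, dm):
        if result is None:
            dm.say("Sorry, please repeat your order.")
        else:
            self.finish(result, dm)

    def finish(self, drink, dm):
        dm.say(f"OK, I will bring you {drink}.")
        dm.pop(drink)
```

Here is a typical run. A vision event for a person is followed by a low-confidence "orange juice", which triggers "Did you say orange juice?". A spoken "yes" then returns the order to the task state, and the task finishes. Keeping confirmation as a separate pushable state lets any task state reuse it, which is one plausible reason to prefer a stacked design over a single flat state machine.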
[Abstract]: As speech recognition technology has matured, applications have appeared in many fields. In service robotics, speech technology is used mainly in the robot's spoken dialogue system. Based on the specific application scenarios of the KeJia robot, this thesis discusses the design and implementation of a spoken dialogue system for service robots. It also studies a language-modeling technique for speech recognition that combines unsupervised word clustering with a recurrent neural network language model. The research on the service-robot spoken dialogue system covers two aspects: speech recognition and dialogue management. For speech recognition, the thesis first introduces the basic principles in detail. It then describes the collection of a corpus for the KeJia robot's applications and the complete procedure for training the module's acoustic models. Several acoustic models are tested and analyzed on the training and test sets provided in this thesis. The experiments show that context-dependent triphone models achieve the best recognition performance, with a best word recognition rate of 98.39% and a corresponding sentence recognition rate of 90.83%. The robot's onboard computer has limited computing power, and the robot can report its own state during operation. Given these two characteristics, the thesis designs a dynamic language model mechanism that compresses the search space during decoding. Tests of the final speech recognition module show a best sentence recognition rate of 87.95% with the dynamic language model mechanism, which is 12.05% higher than without it.
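The abstract does not say how the dynamic language model mechanism is implemented. The sketch below is only an assumption-laden toy showing how state-dependent language models can shrink the search space. Each robot state enables a small grammar, and decoding considers only the hypotheses that the active grammar allows. `GRAMMARS`, the state names, and the acoustic log-likelihoods are invented. `decode` stands in for a real decoder's search and is not any recognizer's API.

```python
from typing import Dict, Set

# Hypothetical per-state grammars: each robot state allows only a few sentences.
GRAMMARS: Dict[str, Set[str]] = {
    "take_order": {"i want a coke", "i want a beer", "i want orange juice"},
    "navigation": {"go to the kitchen", "go to the bedroom", "follow me"},
    "confirm": {"yes", "no"},
}


def full_grammar() -> Set[str]:
    """Union of all state grammars: roughly what a single static model must cover."""
    return set().union(*GRAMMARS.values())


def decode(acoustic_scores: Dict[str, float], allowed: Set[str]) -> str:
    """Toy decoder: return the best-scoring hypothesis inside the allowed search space.

    acoustic_scores maps candidate sentences to acoustic log-likelihoods; the
    numbers are invented and stand in for a real acoustic model.
    """
    candidates = {s: v for s, v in acoustic_scores.items() if s in allowed}
    if not candidates:
        raise ValueError("no hypothesis falls inside the active grammar")
    return max(candidates, key=candidates.get)


def decode_dynamic(acoustic_scores: Dict[str, float], robot_state: str) -> str:
    """Dynamic LM: the state reported by the robot selects which grammar to search."""
    return decode(acoustic_scores, GRAMMARS[robot_state])


if __name__ == "__main__":
    # Invented scores; assume the user actually said "i want a coke".
    scores = {"go to the kitchen": -10.2, "i want a coke": -10.5, "yes": -30.0}
    print(decode(scores, full_grammar()))        # go to the kitchen (static)
    print(decode_dynamic(scores, "take_order"))  # i want a coke (dynamic)
    print(len(full_grammar()), len(GRAMMARS["take_order"]))  # 8 3
```

With these made-up scores, the acoustically confusable "go to the kitchen" wins under the static model. The order-taking grammar excludes it and shrinks the hypothesis space from 8 sentences to 3. This illustrates why restricting the language model by robot state can raise sentence accuracy and lower decoding cost at the same time.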
For dialogue management, the thesis adopts a stacked state machine design suited to the characteristics of service robots and implements the dialogue management framework in Python. It then introduces how multimodal information is incorporated into the framework, together with the framework's verification and confirmation mechanism. Finally, it presents an application of the dialogue manager to the cocktailparty task. The thesis also studies the application of unsupervised word clustering to recurrent neural network language models. Language models based on recurrent neural networks have been shown to achieve state-of-the-art results. Research also shows that adding part-of-speech tags to the input layer of a recurrent neural network language model can improve it significantly. However, part-of-speech tagging requires training on manually annotated data, which costs considerable labor and resources, and the extra tagger increases the model's complexity. To address these problems, this thesis adds the output of Brown word clustering to the input layer of the recurrent neural network language model in place of part-of-speech tags. Experiments on the Penn Treebank corpus show that the recurrent neural network language model with Brown cluster information has 8-9% lower perplexity than the original recurrent neural network language model.
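The abstract gives no architecture details beyond adding Brown cluster information to the RNN language model's input layer. The sketch below assumes an Elman-style RNN with a sigmoid hidden layer and a softmax output. Its input is a one-hot word vector concatenated with a one-hot Brown cluster vector. Perplexity is computed as the exponential of the average negative log-probability of each next word. The `word2class` mapping is assumed to come from a separate Brown clustering run, and the class name `ClassAugmentedRNNLM` is hypothetical. The weights are random and untrained, so any perplexity the code produces says nothing about the thesis's results.

```python
import math
import random
from typing import Dict, List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class ClassAugmentedRNNLM:
    """Elman-style RNN LM whose input layer is [one-hot word ; one-hot Brown cluster].

    Weights are random and untrained: this only illustrates the input encoding,
    the forward pass, and how perplexity is computed.
    """

    def __init__(self, vocab: List[str], word2class: Dict[str, int],
                 n_classes: int, hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.idx = {w: i for i, w in enumerate(vocab)}
        self.word2class = word2class  # assumed output of a Brown clustering run
        self.n_classes = n_classes
        self.H = hidden

        def init(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.U = init(hidden, len(vocab) + n_classes)  # input  -> hidden
        self.W = init(hidden, hidden)                   # hidden -> hidden (recurrence)
        self.V = init(len(vocab), hidden)               # hidden -> output

    def input_vector(self, word: str) -> List[float]:
        x = [0.0] * (len(self.vocab) + self.n_classes)
        x[self.idx[word]] = 1.0                           # word identity
        x[len(self.vocab) + self.word2class[word]] = 1.0  # Brown cluster id
        return x

    def step(self, word: str, h: List[float]):
        x = self.input_vector(word)
        h_new = [sigmoid(dot(self.U[j], x) + dot(self.W[j], h)) for j in range(self.H)]
        probs = softmax([dot(row, h_new) for row in self.V])
        return h_new, probs

    def perplexity(self, sentence: List[str]) -> float:
        """PPL = exp(-(1/N) * sum_t log P(w_t | w_<t)) over the N predicted tokens."""
        h = [0.0] * self.H
        log_prob = 0.0
        for prev, nxt in zip(sentence, sentence[1:]):
            h, probs = self.step(prev, h)
            log_prob += math.log(probs[self.idx[nxt]])
        return math.exp(-log_prob / (len(sentence) - 1))
```

Multiplying `U` by the concatenated input simply adds one word column and one cluster column. Words in the same Brown cluster therefore share a column, which is how the cluster feature can pass information between related words without a supervised tagger. An 8-9% perplexity reduction, as reported in the abstract, would come from training such a model and comparing it against the same network without the cluster block.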
【Degree-granting institution】: University of Science and Technology of China
【Degree level】: Master's
【Year awarded】: 2014
【Classification numbers】: TP242; TN912.34
Article ID: 2139515
Article link: http://sikaile.net/kejilunwen/wltx/2139515.html