Research and Implementation of a Spoken Dialogue System for Service Robots
Topics: spoken dialogue system + semantic similarity; Source: Harbin Institute of Technology, master's thesis, 2017
[Abstract]: With the rapid development of artificial intelligence, service robots have gradually entered many areas of daily life and are playing an increasingly important role. At the same time, the growing maturity of speech recognition and natural language processing makes it feasible to apply spoken dialogue systems to service robots. Targeting the concrete application scenarios of service robots, this thesis studies a spoken dialogue system for service robots, focusing on three modules: a preprocessing module, a question answering module based on a frequently asked questions (FAQ) set, and a dialogue management module based on slot features. In the preprocessing module, the thesis first analyzes several word segmentation approaches (dictionary-based, understanding-based, and statistics-based) and compares their characteristics. It then segments the speech recognition output with a method that combines statistics and a dictionary, and evaluates that method experimentally. Stop words are then removed using a stop-word list. Finally, the thesis analyzes the weakness of keyword expansion based on traditional semantic resources, namely that their limited vocabulary coverage makes them unsuitable for spoken dialogue systems, and proposes using the Word2Vec word vector tool to expand the words with high TF-IDF values in the segmented word sequence. In the FAQ-based question answering module, the thesis first builds a data structure for efficient extraction of the candidate question set, and studies the extraction ratio of the candidate set experimentally. It then analyzes the vector-space TF-IDF similarity model and proposes a method that uses Word2Vec word vectors to compute the semantic similarity between the target question and each candidate question. Finally, the two similarity models are fused, with the weights determined experimentally, which improves the matching accuracy of the similarity model. In the slot-feature-based dialogue management module, the thesis first improves the vector-space TF-IDF topic extraction method by applying a sliding window over the dialogue document, and introduces simulated cooling to monitor topic heat. The thesis closes with a brief description of the implementation of the whole system and a web application written for demonstration.
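The fusion of the TF-IDF similarity and the word-vector similarity mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so every detail here is an assumption made for illustration: the linear weighting with a hypothetical parameter `alpha`, the averaging of word vectors into a sentence vector, the smoothed IDF formula, and the toy English tokens and two-dimensional vectors that stand in for segmented Chinese text and trained Word2Vec embeddings.

```python
import math
from collections import Counter


def build_idf(corpus):
    """Smoothed inverse document frequency over a list of token lists."""
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {word: math.log(n_docs / df) + 1.0 for word, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; words missing from the IDF table get weight 0."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: (c / total) * idf.get(word, 0.0) for word, c in counts.items()}


def cosine_sparse(a, b):
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(tokens, vectors):
    """Average the vectors of known words (an assumed sentence representation)."""
    known = [vectors[word] for word in tokens if word in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_dense(a, b):
    if a is None or b is None:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fused_similarity(query, candidate, idf, vectors, alpha=0.5):
    """Weighted sum of the two scores; alpha is a hypothetical fusion weight."""
    s_tfidf = cosine_sparse(tfidf_vector(query, idf), tfidf_vector(candidate, idf))
    s_vec = cosine_dense(mean_vector(query, vectors), mean_vector(candidate, vectors))
    return alpha * s_tfidf + (1.0 - alpha) * s_vec


def best_match(query, candidates, idf, vectors, alpha=0.5):
    """Index of the candidate question with the highest fused score."""
    scores = [fused_similarity(query, c, idf, vectors, alpha) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Toy FAQ questions, already tokenized (illustrative data only).
faq = [
    ["where", "is", "the", "restroom"],
    ["what", "time", "do", "you", "close"],
    ["how", "do", "i", "get", "to", "the", "elevator"],
]
# Hand-written 2-d vectors standing in for trained Word2Vec embeddings.
toy_vectors = {
    "restroom": [1.0, 0.0],
    "toilet": [0.9, 0.1],
    "elevator": [0.0, 1.0],
    "close": [-1.0, 0.0],
}
idf = build_idf(faq)
query = ["where", "is", "the", "toilet"]
best_index = best_match(query, faq, idf, toy_vectors)
print(best_index, faq[best_index])
```

In this toy setup, "toilet" never appears in the FAQ set, so the TF-IDF term alone cannot tell that it is related to "restroom"; the word-vector term supplies that signal. In practice the weight would be tuned on held-out question pairs, which matches the abstract's statement that the weights were determined experimentally.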
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1; TP242