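Research on a Chinese FAQ Question Answering System Based on Phrase-Syntax Chunking

Published: 2018-03-02 06:21

  Keywords: Chinese question answering system; restricted domain; question classification; chunk; edit distance; question similarity. Source: Kunming University of Science and Technology, 2013 master's thesis. Document type: degree thesis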


[Abstract]: Question answering (QA) is an important direction in natural language processing; it aims to let users ask questions directly in natural language and receive answers. Compared with traditional keyword-based search engines, automatic QA systems have significant advantages. In a restricted domain, an FAQ-based (frequently asked questions) QA system organizes the questions users often ask together with their answers, so it locates answers more accurately, quickly, and efficiently. Such systems have important application prospects in many areas of daily life and are a current research focus. This thesis applies natural language processing techniques to study the key QA technologies for a restricted domain, namely Chinese question classification, question chunk analysis, and question similarity computation, and on that basis implements a prototype FAQ QA system for the Yunnan tourism domain. The main contributions are as follows:

(1) Traditional probabilistic and statistical approaches to question classification train the classifier only on the frequency of feature words in questions and ignore the semantic relations between words. To address this, the thesis proposes a question classification method that combines semantic similarity with a hidden Markov sequence analysis model. The method first extracts the feature word sets of all question categories as the observation sequences of the different hidden Markov model classifiers. It then takes the formation and evolution of each category's feature word set as the state transition sequence. Finally, it uses a word-level semantic similarity measure to compute the observation probability distribution of feature words under the different category states, and builds a separate hidden Markov classification model for each question type. Classification experiments on tourism-domain questions show that the proposed method improves accuracy somewhat over existing methods.

(2) Existing chunk analysis methods rely mainly on the surface form of words and on statistical features, and do not consider the syntactic structure of different question types. To address this, the thesis proposes a Chinese question chunk analysis method based on phrase syntax trees. Given the question category, the method first combines the question's interrogative form with its lexical features to analyze the sentence pattern and summarize the structural forms of different questions. It then uses a phrase-structure parser to generate the question's phrase syntax tree. Finally, drawing on the characteristics of domain questions, it defines custom chunking rules to identify and label the chunks of domain questions. Experimental results show that the method performs fairly well.

(3) Existing Chinese sentence similarity methods do not make full use of lexical-semantic information and sentence structure information. To address this, the thesis proposes a domain question similarity method based on an improved edit distance. The method uses chunks instead of characters as the basic edit unit, assigns different weights to different words according to the characteristics of domain questions, measures the substitution cost between chunks by computing the similarity of the words inside them with HowNet, and assigns different insertion and deletion costs to different chunk types. Experimental results show that the method performs fairly well.

(4) Building on these results and taking the Yunnan tourism domain as an example, the thesis classifies, chunks, and annotates domain questions, and designs and implements a prototype Yunnan tourism FAQ QA system.
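The chunk-level edit distance described in contribution (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Chunk` type, the chunk labels, the cost values in `INDEL_COST`, and the exact-match `word_sim` function (a stand-in for a HowNet-based similarity) are all hypothetical, and the per-word weights mentioned in the abstract are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class Chunk:
    label: str               # chunk type, e.g. "NP" or "VP" (hypothetical tag set)
    words: Tuple[str, ...]   # words inside the chunk


# Hypothetical insertion/deletion costs per chunk type (illustrative values only).
INDEL_COST: Dict[str, float] = {"NP": 1.0, "VP": 0.8, "QP": 0.5}
DEFAULT_INDEL = 0.6

WordSim = Callable[[str, str], float]


def word_sim(a: str, b: str) -> float:
    """Placeholder for a HowNet-based word similarity: exact match only."""
    return 1.0 if a == b else 0.0


def indel_cost(c: Chunk) -> float:
    return INDEL_COST.get(c.label, DEFAULT_INDEL)


def chunk_sim(c1: Chunk, c2: Chunk, sim: WordSim = word_sim) -> float:
    """Symmetric average of each word's best match in the other chunk."""
    if not c1.words or not c2.words:
        return 0.0
    fwd = sum(max(sim(a, b) for b in c2.words) for a in c1.words) / len(c1.words)
    bwd = sum(max(sim(a, b) for a in c1.words) for b in c2.words) / len(c2.words)
    return (fwd + bwd) / 2


def sub_cost(c1: Chunk, c2: Chunk, sim: WordSim = word_sim) -> float:
    """Substitution gets cheaper as the chunks' words get more similar."""
    return (1.0 - chunk_sim(c1, c2, sim)) * max(indel_cost(c1), indel_cost(c2))


def chunk_edit_distance(q1: Sequence[Chunk], q2: Sequence[Chunk],
                        sim: WordSim = word_sim) -> float:
    """Weighted Levenshtein distance with chunks as the edit unit."""
    m, n = len(q1), len(q2)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + indel_cost(q1[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + indel_cost(q2[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost(q1[i - 1]),                # delete
                d[i][j - 1] + indel_cost(q2[j - 1]),                # insert
                d[i - 1][j - 1] + sub_cost(q1[i - 1], q2[j - 1], sim),  # substitute
            )
    return d[m][n]


def question_similarity(q1: Sequence[Chunk], q2: Sequence[Chunk],
                        sim: WordSim = word_sim) -> float:
    """Normalize the distance into [0, 1] by the cost of deleting q1 and inserting q2."""
    total = sum(indel_cost(c) for c in q1) + sum(indel_cost(c) for c in q2)
    if total == 0:
        return 1.0
    return 1.0 - chunk_edit_distance(q1, q2, sim) / total
```

In this sketch a user question and each stored FAQ question would first be chunked, and the FAQ entry with the highest `question_similarity` would be returned. Replacing `word_sim` with a real semantic similarity function is what would let near-synonymous chunks be substituted cheaply.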
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.1

