Research on Key Technologies of Chinese Question Answering Systems
Published: 2018-08-25 11:46
[Abstract]: A question answering system is a new generation of search engine that combines natural language processing with information retrieval. It has important application prospects, forms a significant branch of both fields, and has attracted the interest of many researchers. This thesis studies key technologies in building such a system, namely Chinese word segmentation, question classification, question keyword extraction, and candidate answer set construction, and reports exploratory results in the following areas:
(1) A dependency skeleton rule base is generated experimentally, and question focus words are extracted with conditional random fields. The question classification module combines the strengths of rule-based and statistical methods: a question of unknown category is classified by trying, in order, the interrogative word-to-category rules, the interrogative word + focus word-to-category rules, and the dependency skeleton rule base. Questions that the rule bases cannot resolve are assigned a category by a Bayesian model. The approach achieved 76% classification accuracy on a small-scale corpus. The experimental results indicate that the interrogative word and part-of-speech triple rules, together with the improved focus word extraction method, have a positive effect on question classification.
(2) A conditional random field model is used for keyword extraction. The model is trained on a question corpus with annotated keywords and then used to label the test question set. It achieved relatively high accuracy on a small-scale test corpus of questions.
(3) The formula for scoring candidate sentences is modified. Candidate sentence ranking takes the position similarity of synonymous keywords into account: candidate sentences are ranked by computing three sentence-structure measures between the user question and each candidate sentence, namely synonymous keyword similarity, synonymous keyword position similarity, and sentence length similarity. The experimental results show that this method works well for factoid question types such as person, location, number, and time.
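The cascade in item (1) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the rule entries, category names, the skeleton encoding, and the function names are all invented here, and the "Bayesian model" is assumed to be a multinomial naive Bayes classifier with add-one smoothing, which the abstract does not specify.

```python
import math
from collections import Counter, defaultdict

# Hypothetical rule tables; every entry is invented for illustration.
WH_RULES = {"誰": "PERSON", "哪裡": "LOCATION"}
WH_FOCUS_RULES = {("哪個", "城市"): "LOCATION", ("什麼", "時候"): "TIME"}
SKELETON_RULES = {("什麼", "SBV", "n"): "ENTITY"}  # assumed skeleton encoding


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over word tokens."""

    def fit(self, samples):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in samples:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)  # log prior
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify(tokens, wh_word, focus_word, skeleton, fallback):
    """Try each rule stage in order; use the statistical model only if all miss."""
    if wh_word in WH_RULES:
        return WH_RULES[wh_word], "wh-rule"
    if (wh_word, focus_word) in WH_FOCUS_RULES:
        return WH_FOCUS_RULES[(wh_word, focus_word)], "wh+focus-rule"
    if skeleton in SKELETON_RULES:
        return SKELETON_RULES[skeleton], "skeleton-rule"
    return fallback.predict(tokens), "bayes"


# Toy training data for the fallback model (invented).
nb = NaiveBayes().fit([
    (["多少", "人口"], "NUMBER"),
    (["多少", "錢"], "NUMBER"),
    (["為什麼", "天空", "藍"], "REASON"),
])

print(classify(["誰", "發明", "了", "電話"], "誰", None, None, nb))
print(classify(["多少", "公里"], "多少", "公里", None, nb))
```

Returning the name of the stage that fired alongside the category makes it easy to measure how many questions each rule base covers before the statistical fallback is needed.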
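The ranking idea in item (3) can be sketched in the same hedged way. The abstract names the three similarity measures but gives no formulas, so the definitions below, the linear combination, the weights, and the small synonym table are all assumptions made for illustration only.

```python
# Hypothetical synonym table mapping surface words to a shared concept id;
# a real system would draw on a thesaurus resource instead.
SYNONYMS = {"首都": "capital", "京城": "capital", "中國": "china", "我國": "china"}


def canon(word):
    """Map a word to its synonym-group id, or to itself if it has none."""
    return SYNONYMS.get(word, word)


def keyword_similarity(q_keywords, sentence):
    """Fraction of question keywords that have a synonym match in the sentence."""
    if not q_keywords:
        return 0.0
    concepts = {canon(w) for w in sentence}
    return sum(1 for k in q_keywords if canon(k) in concepts) / len(q_keywords)


def position_similarity(question, q_keywords, sentence):
    """1 minus the mean gap between relative positions of matched keywords."""
    first_pos = {}
    for i, w in enumerate(sentence):
        first_pos.setdefault(canon(w), i)
    gaps = []
    for k in q_keywords:
        if k in question and canon(k) in first_pos:
            q_rel = question.index(k) / len(question)
            s_rel = first_pos[canon(k)] / len(sentence)
            gaps.append(abs(q_rel - s_rel))
    return 1.0 - sum(gaps) / len(gaps) if gaps else 0.0


def length_similarity(question, sentence):
    """1 when the lengths are equal, approaching 0 as they diverge."""
    total = len(question) + len(sentence)
    return 1.0 - abs(len(question) - len(sentence)) / total if total else 0.0


def score(question, q_keywords, sentence, weights=(0.6, 0.2, 0.2)):
    """Assumed linear combination of the three measures."""
    w_kw, w_pos, w_len = weights
    return (w_kw * keyword_similarity(q_keywords, sentence)
            + w_pos * position_similarity(question, q_keywords, sentence)
            + w_len * length_similarity(question, sentence))


def rank(question, q_keywords, candidates):
    """Return (sentence, score) pairs sorted from best to worst."""
    scored = [(s, score(question, q_keywords, s)) for s in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


question = ["中國", "的", "首都", "是", "哪裡"]
keywords = ["中國", "首都"]
cand_a = ["我國", "的", "京城", "是", "北京"]
cand_b = ["北京", "的", "天氣", "很", "好"]
cand_c = ["北京", "是", "中國", "的", "政治", "中心"]

for sentence, value in rank(question, keywords, [cand_b, cand_c, cand_a]):
    print(round(value, 3), "".join(sentence))
```

In this toy run the candidate that matches both keywords through synonyms, at the same relative positions as in the question, ranks first, and the candidate with no keyword match ranks last.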
[Degree-granting institution]: Ningbo University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.1
Article ID: 2202788
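[Citing literature]
Related master's theses (first 2)
1 梁曉月 (Liang Xiaoyue); Research on Question Classification Techniques in Chinese Question Answering Systems [D]; University of Science and Technology Liaoning; 2015
2 虞勇勇 (Yu Yongyong); Application of Frequent Dependency Subtree Patterns in Question Classification [D]; Hefei University of Technology; 2014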
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2202788.html