
Research on Statistical Chinese Question Classification

Published: 2018-03-12 11:43

  Topic: question answering systems. Focus: Chinese question classification. Source: Kunming University of Science and Technology, master's thesis, 2012. Document type: degree thesis.


[Abstract]: A question answering system (Question Answer System) lets people ask questions in natural language and returns answers directly rather than a large set of web pages. Compared with traditional search engines, a question answering system expresses the user's needs better, adapts to the user's habits, and answers more accurately, quickly, and efficiently. Because it overcomes the shortcomings of traditional search engines, it is a hot topic in current research. Question classification is an important component of a question answering system: it provides the answer selection strategy for the answer extraction stage, so classification accuracy directly affects the performance of the whole system. This thesis studies feature selection and dimensionality reduction for question classification, as well as question attribute kernel functions. The main results are as follows:

1. Selecting features with the bag-of-words approach leads to an excessively high-dimensional feature space and sparse data. To address this, the thesis proposes a feature extraction method that combines word correlation with manifold learning. The method first selects the words with high document frequency (DF) in the training corpus as the attribute dimensions of the classification features. Second, it computes the feature values of each question in this feature space using lexical semantic similarity. Third, it applies a supervised locally linear embedding algorithm to perform nonlinear dimensionality reduction on the feature space, which yields the question classification feature vectors. Finally, it builds the question classification model with a support vector machine (SVM). Experimental results on more than 7,000 Chinese sentences in the tourism domain show that the proposed method effectively addresses the high dimensionality of the feature space and the data sparsity problem.

2. When questions are classified with the standard kernel functions of a support vector machine, the internal structure of the question is often ignored. To address this, the thesis proposes an attribute kernel function that combines question dependency relations with part of speech. The method first extracts features from the question, including words, parts of speech, head-word dependency relations, and interrogative-word dependency relations. Second, it computes the kernel value from the dependency relations of the words in the questions, their parts of speech, and the dependency paths the questions share. Finally, it uses the SMO algorithm to solve the optimization problem. Comparative Chinese question classification experiments with different kernel functions were carried out on Chinese questions in the tourism domain. The results show that the proposed kernel function makes effective use of the internal dependency structure of the question and improves both the training speed and the classification accuracy of the model.

3. Using the algorithms proposed in the thesis, two systems were designed and implemented: a question classification system combined with manifold learning, and a question classification system based on the question attribute kernel function.
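The first two steps of the feature extraction method described in the abstract (DF-based selection of attribute dimensions, then similarity-based feature values) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the function names, the toy English tokens, and in particular `char_overlap_similarity` are assumptions. The abstract does not say which lexical semantic similarity measure is used, so a character-overlap Jaccard score stands in for it here. The later steps (supervised locally linear embedding and SVM training) are omitted.

```python
from collections import Counter


def select_features_by_df(tokenized_questions, top_k):
    """Return the top_k words ranked by document frequency (DF),
    i.e. the number of questions that contain the word."""
    df = Counter()
    for tokens in tokenized_questions:
        df.update(set(tokens))  # count each word at most once per question
    # Sort by descending DF, then alphabetically so ties are deterministic.
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


def char_overlap_similarity(a, b):
    """Placeholder word similarity (an assumption): Jaccard overlap of the
    character sets. A real system would plug in a lexicon-based semantic
    similarity measure instead."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def question_vector(tokens, features, similarity=char_overlap_similarity):
    """Build one value per feature word: the highest similarity between that
    feature word and any token of the question. Unlike 0/1 bag-of-words
    values, a feature can be non-zero even if the word itself is absent."""
    return [
        max((similarity(feature, token) for token in tokens), default=0.0)
        for feature in features
    ]


# Toy, pre-tokenized training questions (hypothetical data).
questions = [
    ["where", "is", "the", "stone", "forest"],
    ["how", "much", "is", "the", "ticket"],
    ["where", "is", "the", "hotel"],
]

features = select_features_by_df(questions, top_k=3)
vec = question_vector(["what", "is", "this"], features)
print(features)
print(vec)
```

Taking the maximum similarity per feature word is one simple design choice; it gives a dense vector whose length equals the number of selected DF words, which is the kind of input a dimensionality reduction step would then compress.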
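To make the idea of an attribute kernel over question structure more concrete, here is a small sketch of a kernel that compares two questions token by token on word, part of speech, and dependency relation. The abstract does not give the thesis's kernel formula, so everything below is an assumption: the `Token` fields, the weights, and the sum-over-token-pairs form are illustrative only, and shared dependency paths are not modeled. The toy tags are hypothetical parser output.

```python
import math
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str  # part-of-speech tag
    dep: str  # dependency relation of the token to its head


def token_kernel(a: Token, b: Token, weights=(1.0, 0.5, 0.5)) -> float:
    """Weighted count of the attributes two tokens share. Each equality test
    is a valid kernel, and so is a non-negative weighted sum of them."""
    w_word, w_pos, w_dep = weights
    return (
        w_word * (a.word == b.word)
        + w_pos * (a.pos == b.pos)
        + w_dep * (a.dep == b.dep)
    )


def question_kernel(q1: List[Token], q2: List[Token]) -> float:
    """Sum the token kernel over all token pairs (a convolution-style kernel)."""
    return sum(token_kernel(a, b) for a in q1 for b in q2)


def normalized_kernel(q1: List[Token], q2: List[Token]) -> float:
    """Scale the kernel so that K(q, q) == 1, which removes the length bias."""
    denom = math.sqrt(question_kernel(q1, q1) * question_kernel(q2, q2))
    return question_kernel(q1, q2) / denom if denom else 0.0


# Hypothetical parsed questions (tags are illustrative only).
q_hotel = [Token("where", "r", "ADV"), Token("is", "v", "HED"), Token("hotel", "n", "SBV")]
q_museum = [Token("where", "r", "ADV"), Token("is", "v", "HED"), Token("museum", "n", "SBV")]
q_ticket = [Token("how", "r", "ADV"), Token("much", "a", "ATT"), Token("ticket", "n", "SBV")]

print(normalized_kernel(q_hotel, q_museum))
print(normalized_kernel(q_hotel, q_ticket))
```

A Gram matrix built from `normalized_kernel` could be handed to any SVM solver that accepts precomputed kernels, including SMO-based ones; the two "where" questions score higher against each other than against the "how much" question because they share more words and dependency attributes.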
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: H146.3



Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1601406.html

