Research on Chinese Question Classification Methods Based on Transfer Learning

Published: 2019-03-26 15:33

[Abstract]: Question answering (QA) systems are regarded as a new generation of search engine: they meet users' query needs better and retrieve the answers users want more precisely. Question classification is a key component of a QA system, and its results directly affect the accuracy of answer extraction. A question classification model is usually trained on a labeled corpus of a certain size. Building such models for different domains, however, requires labeling samples in every domain, which makes labeling expensive. Because different domains may be related to some degree, this thesis applies the idea of transfer learning and, in light of the characteristics of question classification in different domains, studies feature selection and classification model transfer across domains. The main contributions are as follows:

1. Based on the correlation between domains, question feature spaces for the different domains are constructed using the mutual information between cross-domain features. First, the high-frequency words in each domain's training corpus, together with the interrogative words and the subject, predicate, and object words of the questions, are selected as candidate feature words for that domain. Second, mutual information is used to measure the correlation between the source-domain feature words and the target-domain feature words; a threshold is defined, and the highly correlated words are kept as the feature words of each domain's feature space. Finally, the feature values of each domain's question feature space are obtained with a lexical semantic similarity method.

2. For transferring Chinese question classification across domains, a question classification transfer learning method based on feature mapping is proposed. The method first counts the feature words common to the source and target domains and uses word similarity computation to mine similar feature words across the domains. It then rewrites every source-domain question feature vector so that its feature words become co-occurring (shared) or similar feature words of the target domain. Next, an improved clustering algorithm maps the source-domain question instances onto the categories of the target domain. Finally, a support vector machine is used to train the classification model. A transfer experiment on Chinese question classification, with finance as the source domain and Yunnan tourism as the target domain, shows that the labeled source-domain samples greatly improve classification accuracy in the target domain.

3. A prototype system for Chinese question classification based on feature-mapping transfer learning is designed and implemented.
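The abstract does not say how mutual information is estimated or which threshold is used. As a minimal sketch of the selection step in contribution 1, the Python below assumes pointwise mutual information (PMI) estimated from question-level co-occurrence in a pooled corpus of both domains. The function names, the toy English tokens, and the threshold of 0.0 are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter
from itertools import product


def pmi_from_questions(questions):
    """Return a pmi(a, b) function estimated from question-level co-occurrence.

    `questions` is a list of token lists; a word counts once per question.
    """
    n = len(questions)
    word_count = Counter()
    pair_count = Counter()
    for tokens in questions:
        uniq = sorted(set(tokens))
        word_count.update(uniq)
        for i, a in enumerate(uniq):
            for b in uniq[i + 1:]:
                pair_count[(a, b)] += 1  # keys are stored in sorted order

    def pmi(a, b):
        key = (a, b) if a < b else (b, a)
        joint = pair_count.get(key, 0)
        if joint == 0:
            return float("-inf")  # the two words never co-occur
        # log( p(a, b) / (p(a) * p(b)) ) with probabilities as count / n
        return math.log((joint * n) / (word_count[a] * word_count[b]))

    return pmi


def select_related_words(source_words, target_words, pmi, threshold):
    """Keep the words of each domain that correlate with the other domain."""
    kept_source, kept_target = set(), set()
    for s, t in product(source_words, target_words):
        # Assumption of this sketch: a word shared by both domains is kept outright.
        if s == t or pmi(s, t) >= threshold:
            kept_source.add(s)
            kept_target.add(t)
    return kept_source, kept_target


# Toy pooled corpus (hypothetical, already tokenized).
questions = [
    ["what", "price", "fund"],
    ["what", "price", "ticket"],
    ["where", "hotel", "ticket"],
    ["how", "rate", "loan"],
]
pmi = pmi_from_questions(questions)
kept_source, kept_target = select_related_words(
    ["price", "fund", "rate", "loan"],  # candidate source-domain words
    ["ticket", "hotel", "price"],       # candidate target-domain words
    pmi,
    threshold=0.0,
)
print(sorted(kept_source), sorted(kept_target))
```

Real Chinese questions would first need word segmentation, and the threshold would have to be tuned on held-out data.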
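The feature-mapping step in contribution 2 can be illustrated in the same spirit. The sketch below assumes that a source-domain word is kept if it is also a target-domain feature word, is replaced by its most similar target-domain word if the similarity reaches a cutoff, and is dropped otherwise. The similarity table, the cutoff `min_sim`, and all names are hypothetical stand-ins, since the abstract does not give the thesis's actual similarity measure or mapping rule.

```python
# Hypothetical word-similarity scores; a real system would use a lexical
# semantic similarity resource instead of this hand-written table.
TOY_SIMILARITY = {
    ("fee", "price"): 0.9,
    ("bank", "hotel"): 0.6,
    ("fund", "ticket"): 0.2,
}


def toy_similarity(a, b):
    """Symmetric lookup in the toy table; identical words score 1.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get((a, b), TOY_SIMILARITY.get((b, a), 0.0))


def map_question(tokens, target_vocab, similarity, min_sim=0.5):
    """Rewrite a source-domain question using target-domain feature words."""
    candidates = sorted(target_vocab)  # sorted for deterministic tie-breaking
    mapped = []
    for word in tokens:
        if word in target_vocab:
            mapped.append(word)  # shared feature word: keep as-is
            continue
        best = max(candidates, key=lambda t: similarity(word, t), default=None)
        if best is not None and similarity(word, best) >= min_sim:
            mapped.append(best)  # replace with the most similar target word
        # otherwise the word has no counterpart in the target domain and is dropped
    return mapped


target_vocab = {"what", "price", "hotel", "ticket"}
mapped = map_question(["what", "fee", "bank", "stock"], target_vocab, toy_similarity)
print(mapped)
```

The mapped questions would then feed the clustering and support vector machine steps described in the abstract.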
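[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.1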


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2447687.html
