Research on a Chinese Question Classification Method Based on Transfer Learning
Published: 2019-03-26 15:33
[Abstract]: A question answering (QA) system is a new generation of search engine: it satisfies users' queries better and retrieves the answers they want more precisely. Question classification is a key component of a QA system, because its output directly affects the accuracy of answer extraction. A question classification model is usually trained on a labeled corpus of some size, so building a model for each new domain means labeling samples in every domain, which is costly. Because different domains may be related to one another, this thesis applies the idea of transfer learning and, taking into account the characteristics of question classification in each domain, studies feature selection and model transfer methods for cross-domain question classification. The main contributions are as follows:
1. Based on the correlation between domains, a question feature space is constructed for each domain from the mutual information between features of different domains. First, the high-frequency words in each domain's training corpus, together with the interrogative words and the subject, predicate, and object words of each question, are selected as that domain's candidate feature words. Second, mutual information is used to measure the correlation between source-domain and target-domain feature words, and only the words whose correlation exceeds a chosen threshold are kept in each domain's feature space. Finally, the feature values of each domain's question feature space are computed with a lexical semantic similarity method.
2. For porting Chinese question classification from one domain to another, a transfer learning method for question classification based on feature mapping is proposed. The method first counts the feature words shared by the source and target domains and uses word similarity to mine similar feature words across the two domains. It then rewrites each source-domain question feature vector so that its feature words become the corresponding shared (co-occurring) or similar feature words of the target domain. Next, an improved clustering algorithm maps the source-domain question instances onto the categories of the target domain. Finally, a support vector machine (SVM) is used to train the classification model. Transfer experiments on Chinese question classification were conducted with finance as the source domain and Yunnan tourism as the target domain; the results show that using the labeled source-domain samples substantially improves classification accuracy in the target domain.
3. A prototype Chinese question classification system based on feature-mapping transfer learning was designed and implemented.
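The mutual-information feature selection in contribution 1 can be sketched in Python as follows. The abstract does not specify the mutual-information variant, the co-occurrence unit, or the threshold, so this sketch assumes pointwise mutual information (PMI) estimated from question-level co-occurrence in a pooled set of questions, and keeps a word when its best PMI with any word of the other domain exceeds the threshold. The function names, the English placeholder tokens (standing in for segmented Chinese words), and the threshold value are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(questions):
    """PMI of every unordered word pair that co-occurs in at least one question.

    Each question is a list of already-segmented words. Probabilities are
    estimated as the fraction of questions that contain a word (or a pair).
    """
    n = len(questions)
    word_df = Counter()
    pair_df = Counter()
    for question in questions:
        words = set(question)
        word_df.update(words)
        # Sorting gives every unordered pair a single canonical key.
        pair_df.update(combinations(sorted(words), 2))
    # PMI(a, b) = log(P(a, b) / (P(a) * P(b))) = log(c(a, b) * n / (c(a) * c(b)))
    return {
        (a, b): math.log(c * n / (word_df[a] * word_df[b]))
        for (a, b), c in pair_df.items()
    }


def select_features(source_words, target_words, questions, threshold):
    """Keep each domain's words whose best PMI with a word of the other domain exceeds threshold."""
    pmi = pmi_table(questions)

    def best_pmi(word, others):
        # Pairs that never co-occur have no PMI entry and count as -inf.
        scores = [pmi.get(tuple(sorted((word, o))), -math.inf) for o in others if o != word]
        return max(scores, default=-math.inf)

    source = {w for w in source_words if best_pmi(w, target_words) > threshold}
    target = {w for w in target_words if best_pmi(w, source_words) > threshold}
    return source, target


# Hypothetical toy data: English placeholders for segmented Chinese questions
# (finance = source domain, tourism = target domain).
QUESTIONS = [
    ["loan", "rate", "how_much"],
    ["ticket", "price", "how_much"],
    ["stock", "price"],
    ["hotel", "where"],
    ["loan", "where"],
]
SOURCE_WORDS = ["loan", "rate", "stock", "how_much"]
TARGET_WORDS = ["ticket", "hotel", "price", "how_much"]

source, target = select_features(SOURCE_WORDS, TARGET_WORDS, QUESTIONS, threshold=0.5)
```

With this toy data, "loan" co-occurs only weakly with target-domain words (PMI = log 1.25 ≈ 0.22) and is dropped at a threshold of 0.5, while "hotel" never co-occurs with a source-domain word and is dropped at any threshold.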
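The last step of contribution 1 turns each question into a vector over the selected feature words using a lexical semantic similarity measure. The abstract does not name the measure, so the sketch below substitutes a simple character-overlap (Dice) similarity between Chinese words and takes, for each feature word, the highest similarity to any word of the question. Both `char_dice` and `question_features` are hypothetical names, and a proper semantic similarity measure would be plugged in through the `similarity` parameter.

```python
def char_dice(a, b):
    """Stand-in lexical similarity: Dice coefficient over the character sets of two words."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def question_features(question_words, feature_words, similarity=char_dice):
    """One value per feature word: its highest similarity to any word of the question.

    A feature word that occurs verbatim in the question scores 1.0 under char_dice.
    """
    return [
        max((similarity(w, f) for w in question_words), default=0.0)
        for f in feature_words
    ]


# Hypothetical example: question 门票 / 多少 / 钱 ("ticket / how much / money")
# against feature words 票价 / 多少 / 酒店 ("ticket price / how much / hotel").
vector = question_features(["门票", "多少", "钱"], ["票价", "多少", "酒店"])
# -> [0.5, 1.0, 0.0]
```

Here 门票 and 票价 share one of their two characters each, giving 0.5, while 多少 matches exactly and 酒店 matches nothing.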
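The feature-mapping step of contribution 2 rewrites each source-domain question vector in the target domain's vocabulary. Below is a minimal sketch, assuming bag-of-words vectors stored as dicts, a character-overlap (Dice) similarity as a stand-in for the unnamed word-similarity measure, a hypothetical cut-off `min_sim`, and that a replaced word's weight is scaled by its similarity (the abstract does not say how weights are carried over). All names are hypothetical.

```python
def char_dice(a, b):
    """Stand-in word similarity: Dice coefficient over the character sets of two words."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def map_to_target(source_vec, target_vocab, similarity=char_dice, min_sim=0.5):
    """Re-express a source-domain {word: weight} vector in the target vocabulary.

    Shared words are kept unchanged; any other word is replaced by its most
    similar target word if that similarity is at least min_sim, with its weight
    scaled by the similarity. Words with no similar enough target word are dropped.
    """
    mapped = {}
    for word, weight in source_vec.items():
        if word in target_vocab:
            best, sim = word, 1.0
        else:
            # sorted() makes tie-breaking between equally similar words deterministic.
            best, sim = max(
                ((t, similarity(word, t)) for t in sorted(target_vocab)),
                key=lambda pair: pair[1],
                default=(None, 0.0),
            )
        if best is not None and sim >= min_sim:
            mapped[best] = mapped.get(best, 0.0) + weight * sim
    return mapped


# Hypothetical example: finance question words mapped into a tourism vocabulary.
mapped = map_to_target({"股价": 1.0, "多少": 1.0, "银行": 1.0}, {"票价", "多少", "酒店"})
# -> {'票价': 0.5, '多少': 1.0}
```

In this example 多少 ("how much") is shared and kept, 股价 ("stock price") is mapped to 票价 ("ticket price") with weight 0.5, and 银行 ("bank") is dropped because no target word is similar enough.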
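The abstract describes the next step of contribution 2 only as an "improved clustering algorithm" that maps source-domain questions onto target-domain categories, without further detail. As a hedged stand-in (not the thesis's algorithm), the sketch below runs a k-means-style loop with cosine similarity in which each category's centroid starts from a few labeled target-domain questions that stay fixed in that category; the existence of such labeled target seeds is an assumption, and all names and category labels are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for vec in vectors:
        for k, w in vec.items():
            total[k] = total.get(k, 0.0) + w / len(vectors)
    return total


def cluster_into_target_categories(source_vectors, target_seeds, iterations=5):
    """Assign each mapped source question to a target category.

    target_seeds maps each category to a non-empty list of labeled target-domain
    vectors; these seeds stay fixed in their category while source questions are
    reassigned on every iteration.
    """
    labels = sorted(target_seeds)
    centroids = {c: centroid(target_seeds[c]) for c in labels}
    assignment = []
    for _ in range(iterations):
        # Assignment step: each source question joins its most similar centroid.
        assignment = [max(labels, key=lambda c: cosine(vec, centroids[c])) for vec in source_vectors]
        # Update step: recompute each centroid from its seeds plus its assigned source questions.
        centroids = {
            c: centroid(target_seeds[c] + [v for v, a in zip(source_vectors, assignment) if a == c])
            for c in labels
        }
    return assignment


# Hypothetical seeds for two target-domain categories.
SEEDS = {"price": [{"票价": 1.0, "多少": 1.0}], "location": [{"酒店": 1.0, "哪里": 1.0}]}
assigned = cluster_into_target_categories([{"票价": 0.5, "多少": 1.0}, {"哪里": 1.0}], SEEDS)
# -> ['price', 'location']
```

The resulting labels give the source questions target-domain categories, so they can serve as training data for the SVM step named in the abstract.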
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year awarded]: 2012
[Classification number]: TP391.1
Document ID: 2447687