Research on Chinese Question Classification Methods Based on Transfer Learning

Published: 2019-03-26 15:33
[Abstract]: A question answering system is a new generation of search engine: it serves users' queries better and retrieves the answers they want more precisely. Question classification is a key component of a question answering system, and its output directly affects the accuracy of answer extraction. A question classification model is usually trained on a labeled corpus of some size, so building models for several domains means labeling samples in every domain, which is expensive. Because different domains may be related to some degree, this thesis applies the idea of transfer learning and, taking into account the characteristics of question classification in different domains, studies feature selection and methods for transferring question classification models across domains. The main contributions are as follows:

1. Based on the correlation between domains, question feature spaces for the different domains are constructed using the mutual information of inter-domain features. First, for each domain, the high-frequency words in the training corpus, together with the interrogative words and the subject, predicate, and object words of the questions, are selected as candidate feature words. Second, mutual information is used to measure the correlation between source-domain feature words and target-domain feature words; a threshold is defined, and the highly correlated words are kept as the feature words of each domain's feature space. Finally, the feature values of each domain's question feature space are obtained with a lexical semantic similarity method.

2. For porting Chinese question classification across domains, a transfer learning method based on feature mapping is proposed. The method first counts the feature words common to the source and target domains and uses word similarity computation to find similar feature words across the domains. It then rewrites every source-domain question feature vector so that its feature words become co-occurring or similar feature words of the target domain. Next, an improved clustering algorithm maps the source-domain question instances onto the categories of the target domain. Finally, a support vector machine is used to train the classification model. In a transfer experiment on Chinese question classification with finance as the source domain and Yunnan tourism as the target domain, the labeled source-domain samples greatly improved classification accuracy in the target domain.

3. A prototype Chinese question classification system based on feature-mapping transfer learning is designed and implemented.
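The feature-mapping step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the word similarity measure or any threshold, so the character-overlap similarity `char_jaccard`, the `threshold` value, and the function names `build_mapping` and `map_question` are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Dict, List, Set


def char_jaccard(a: str, b: str) -> float:
    """Toy similarity: Jaccard overlap of character sets.

    Assumption: a stand-in for the (unspecified) word similarity measure.
    """
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_mapping(
    source_vocab: Set[str],
    target_vocab: Set[str],
    similarity: Callable[[str, str], float] = char_jaccard,
    threshold: float = 0.5,  # hypothetical cut-off, not from the source
) -> Dict[str, str]:
    """Map each source-domain feature word to a target-domain feature word.

    Words shared by both domains map to themselves; any other word maps to
    its most similar target word if the score reaches the threshold.
    """
    mapping: Dict[str, str] = {}
    candidates = sorted(target_vocab)  # sorted for deterministic tie-breaking
    for word in source_vocab:
        if word in target_vocab:
            mapping[word] = word
            continue
        best = max(candidates, key=lambda t: similarity(word, t), default=None)
        if best is not None and similarity(word, best) >= threshold:
            mapping[word] = best
    return mapping


def map_question(features: List[str], mapping: Dict[str, str]) -> List[str]:
    """Rewrite a source question's feature words; drop words with no mapping."""
    return [mapping[w] for w in features if w in mapping]


if __name__ == "__main__":
    source_vocab = {"price", "prices", "bank", "loan"}
    target_vocab = {"price", "ticket", "banks"}
    mapping = build_mapping(source_vocab, target_vocab)
    print(map_question(["bank", "loan", "prices"], mapping))
```

In this sketch, source words with no sufficiently similar target word are simply dropped. The subsequent steps named in the abstract (clustering the mapped source instances into target categories and training a support vector machine) are not shown.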
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP391.1


Article ID: 2447687


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2447687.html

