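Research on Transfer Learning Methods for Answer Ranking in Community Question Answering Systems
Keywords: community question answering system; user features; learning to rank; transfer learning; ranking model. Source: Kunming University of Science and Technology, 2017 master's thesis. Document type: degree thesis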
[Abstract]: As Internet technology continues to develop, acquiring knowledge and solving problems has become increasingly convenient. Traditional search engine companies such as Yahoo and Google give a growing number of Internet users a more convenient way to obtain information: users enter keywords in a search box and quickly get the information they want. However, as the Internet becomes more widespread and its content richer, users also expect it to be easier to find the best answer. Personalized services based on community question answering effectively make up for the shortcomings of traditional search engine technology and are therefore receiving growing attention from Internet companies.

A community question answering system is an emerging knowledge-sharing model: as users submit questions and answers, the community accumulates a large number of question-answer pairs. When a user submits a new question, ranking the candidate answers so that the user receives an accurate answer list is a key component of such a system. Traditional ranking algorithms mainly build the ranking model with supervised learning, which requires a large amount of manually labeled data for training. Researchers in China and abroad have proposed many supervised learning-to-rank methods that have been applied successfully in practice. Ranking SVM is a typical representative: a large amount of labeled data is fed to the learner, which then automatically trains a ranking model. Supervised learning-to-rank methods usually need labeled data of considerable scale to ensure the reliability of the trained model, but in practice labeled data is often insufficient, and reliability drops accordingly. A ranking model trained on one domain often performs poorly in a new domain. In addition, data on the Internet changes quickly, so previously labeled data gradually becomes unsuitable for training current models.

To address the shortage of labeled data in practical applications, this thesis improves traditional learning-to-rank methods with the idea of transfer learning. It uses a transfer learning ranking algorithm based on feature selection, which assumes that the source domain and the target domain share a low-dimensional feature representation, and takes users' multiple interests as the features shared by the two domains, so that knowledge is transferred to the target domain. An analysis of community question answering systems shows that they contain many labels derived from user behavior. Combined with feature-based transfer learning, these user features are incorporated into the feature space, and the feature-based transfer learning ranking algorithm is optimized by selecting user tags and user behavior tags that carry concrete value in the community. Take the answerer's areas of expertise as an example: an answerer may be good at several areas (such as tennis and badminton). In the feature vector this feature is represented as a Boolean, 1 for an area the answerer is good at and 0 otherwise. The feature is therefore 1 in both the badminton and the tennis categories, so it can serve as a common feature of these two different categories, which improves the learning-to-rank method. Experiments confirm that the transfer learning answer ranking algorithm that incorporates user features effectively improves answer ranking.
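The Boolean expertise feature described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the category list, the `Answer` class, the `relevance` score, and all function names are hypothetical and introduced only for illustration. The sketch shows how one answerer's expertise flag takes the value 1 in both the tennis and the badminton categories, and how feature vectors can be turned into the pairwise difference vectors that a Ranking SVM style learner is typically trained on.

```python
from dataclasses import dataclass

# Hypothetical category vocabulary; the thesis does not list its categories.
CATEGORIES = ["tennis", "badminton", "cooking"]


@dataclass(frozen=True)
class Answer:
    text: str
    answerer_expertise: frozenset  # categories the answerer is tagged as good at
    relevance: float               # hypothetical content-based score in [0, 1]


def expertise_flag(answer, category):
    """Boolean user feature: 1 if the answerer is good at `category`, else 0."""
    return 1 if category in answer.answerer_expertise else 0


def features(answer, category):
    """Feature vector [content relevance, expertise flag] for one question category."""
    return [answer.relevance, expertise_flag(answer, category)]


def pairwise_differences(answers, labels, category):
    """Build (x_i - x_j, +1) pairs for every answer i labeled higher than answer j."""
    pairs = []
    for i, better in enumerate(answers):
        for j, worse in enumerate(answers):
            if labels[i] > labels[j]:
                x_i = features(better, category)
                x_j = features(worse, category)
                pairs.append(([p - q for p, q in zip(x_i, x_j)], 1))
    return pairs


if __name__ == "__main__":
    expert = Answer("Use a continental grip.", frozenset({"tennis", "badminton"}), 0.6)
    novice = Answer("Just hit it hard.", frozenset(), 0.6)
    for category in CATEGORIES:
        print(category, features(expert, category))
    print(pairwise_differences([expert, novice], [1, 0], "tennis"))
```

Because the expertise flag has the same meaning in every category, a weight learned for it in a well-labeled source category (say, tennis) remains meaningful in a sparsely labeled target category (say, badminton), which is the intuition behind treating it as a shared feature.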
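[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.3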
Article ID: 1545928
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1545928.html