Research on Personalized Information Retrieval Based on Machine Learning
Topic: information retrieval + personalization. Source: master's thesis, Jilin University, 2017 (author: Jin Zhongwei).
[Abstract]: In recent years, the rapid development of the Internet has caused the volume of information resources to grow explosively, and people have become steadily more dependent on the network. With today's fast pace of life, obtaining the desired information quickly and accurately from a cluttered network has become essential. Search engines are the main entry point through which ordinary users find network resources, and their importance keeps growing. As more users rely on search engines, the quality of the search experience has a serious effect on daily life, and the factor that affects it most is how relevant the retrieved results are to the user's needs. Current search engines are still far from returning resources that fully match those needs.

The key technology that determines the relevance of returned results is the search engine's retrieval model. Early research on retrieval models focused on ranking documents by the keywords the user entered. Studies have found two problems with this approach: first, users may not be clear about what they are looking for; second, the keywords they enter usually do not fully express their needs. In response, researchers have proposed applying machine learning to the retrieval model, but this approach is still at the research stage. The purpose of this thesis is to study how to apply machine learning to the retrieval model so as to improve retrieval accuracy and shorten query time.

Applying machine learning to information retrieval is known as learning to rank. Common learning-to-rank methods fall into three classes: pointwise, pairwise, and listwise. The listwise approach is considered the most effective and the most promising for research, and the most effective listwise method at present is LambdaMART, proposed by Christopher J.C. Burges. LambdaMART uses iterative (boosted) decision trees as its framework and takes as the direction of each iteration the negative gradient derived from RankNet and LambdaRank, a gradient that has a concrete physical meaning. Its greatest advantage is that it can incorporate the evaluation metrics used in information retrieval, which makes it more effective in practice.

This thesis proposes using personalized user information to improve the accuracy of retrieval results, that is, personalized information retrieval, as a way to compensate for the inability of traditional search engines to capture the user's search intent accurately. To bring personalized information into result ranking, the thesis modifies the LambdaMART algorithm to incorporate the user's gender, age, occupation, address, and web browsing history; it then predicts the user's search intent from the search keywords and merges the prediction into the ranking. Specifically, personalized user information is added to feature selection when the decision trees are trained. For the case where LambdaMART has no initial model, the thesis proposes optimizing the learning rate of each iteration to achieve fast convergence, which addresses the original algorithm's inability to train without an initial model.

The thesis first compares RankNet, GBDT, and LambdaMART experimentally. The MAP and NDCG results indicate that LambdaMART, as a listwise algorithm, has a clear advantage in information retrieval. It then adds personalized information to LambdaMART to obtain the proposed personalized retrieval model and compares it with the original LambdaMART, RankNet, and GBDT. According to MAP and NDCG, retrieval accuracy improves substantially once personalized information is added, especially in domains with a strong topical focus.

Beyond proposing the algorithm, the thesis describes its procedure in detail, validates it experimentally, and reports result data from a practical deployment. The results show that the personalized retrieval model improves considerably on the original algorithm in both retrieval accuracy and user satisfaction. Personalized retrieval is a future direction of information retrieval, and the proposed algorithm and the system's design and implementation offer a useful reference for later work.
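The abstract describes LambdaMART as taking its per-iteration direction from gradients derived from RankNet and LambdaRank, weighted by an IR evaluation metric. The following is a minimal sketch of that idea for a single query, not the thesis's implementation: it computes NDCG and LambdaRank-style per-document "lambdas", where each pairwise RankNet term is scaled by the NDCG change that swapping the two documents would cause. The function names, the gain formula `2**rel - 1`, the `sigma` parameter, and the sign convention (a positive lambda means the document should move up) are assumptions chosen for illustration.

```python
import math
from typing import List


def dcg(labels: List[float]) -> float:
    """Discounted cumulative gain of relevance labels given in ranked order."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(labels))


def ndcg(labels: List[float]) -> float:
    """DCG normalized by the DCG of the ideal (descending-label) ordering."""
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0


def lambda_gradients(scores: List[float], labels: List[float],
                     sigma: float = 1.0) -> List[float]:
    """LambdaRank-style lambdas for the documents of one query.

    Assumed convention: a positive value pushes the document's score up.
    """
    n = len(scores)
    # Rank position (0 = top) of each document under the current scores.
    order = sorted(range(n), key=lambda i: -scores[i])
    rank = [0] * n
    for pos, doc in enumerate(order):
        rank[doc] = pos

    lambdas = [0.0] * n
    ideal = dcg(sorted(labels, reverse=True))
    if ideal == 0:
        return lambdas  # no relevant documents, nothing to learn from

    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # only pairs where i is more relevant than j
            gain_diff = (2 ** labels[i] - 1) - (2 ** labels[j] - 1)
            disc_diff = 1 / math.log2(rank[i] + 2) - 1 / math.log2(rank[j] + 2)
            # |NDCG change| if documents i and j swapped rank positions.
            delta_ndcg = abs(gain_diff * disc_diff) / ideal
            # RankNet pairwise term: large when j is scored above i.
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            lam = sigma * rho * delta_ndcg
            lambdas[i] += lam
            lambdas[j] -= lam
    return lambdas
```

For example, `lambda_gradients([0.1, 0.9, 0.5], [2, 0, 1])` gives the most relevant document (currently scored lowest) a positive lambda and the irrelevant document (currently scored highest) a negative one. In a boosted-tree ranker, each new tree would be fit to values like these; personalized user features of the kind the abstract lists would enter through the feature vectors that produce `scores`, which this sketch does not model.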
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3; TP181
Article ID: 1928012
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1928012.html