Research on a Personalized Search Algorithm Based on a User Interest Model

Published: 2019-04-27 10:33

[Abstract]: As the volume of information on the Internet grew rapidly, search engines were developed to help people find information relevant to them, a major milestone in the development of resource retrieval. As user expectations have risen, however, the weaknesses of traditional search engines, such as low retrieval precision and many duplicate pages, have become increasingly apparent, to the point that they no longer meet users' needs. Personalization and intelligence have therefore become the trend in search engine development. This thesis studies search engine personalization in some depth. The main contributions are as follows. First, after studying existing user interest models, a new algorithm for constructing a user interest model is proposed. Singular value decomposition (SVD) and k-means clustering are applied repeatedly at different granularities to cluster the user's browsing history and the words it contains at different levels, producing document clusters and word clusters from which two weighted interest trees are built: a document-class tree and a word-class tree. The weight of each node in a tree represents the user's degree of interest in that class of documents or words. Experimental results show that the proposed user interest model considerably improves the accuracy of page interest classification. Second, an improvement is proposed to address the shortcomings of the vector space model. SVD is used to reduce its dimensionality, and the resulting document-word-class matrix handles the vector space model's high dimensionality, sparsity, and synonymy and polysemy problems well. Experimental results show that the improved vector space model classifies pages considerably more accurately than the traditional vector space model. Finally, a new ranking algorithm is proposed to address the shortcomings of existing search engine ranking algorithms. Building on the proposed user interest model, a naive Bayes classifier performs document classification and word classification on the documents retrieved by a traditional search engine; each document is scored according to the classification results, and the documents are then sorted in descending order of score. Experimental results show that, under the same conditions, the proposed personalized ranking algorithm achieves higher precision than the probabilistic-model-based personalized search algorithm and better meets users' personalized needs.
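The abstract does not give the details of the interest-model construction, so the following is only a minimal single-level sketch of the general idea: reduce a document-term matrix with a truncated SVD, then run k-means on the reduced document and term coordinates. The toy browsing history, the function names, the farthest-point initialization, and the node weight (taken here as the share of items in each cluster) are all assumptions made for illustration, not the thesis's actual method. A multi-granularity tree could be approximated by applying the same steps again inside each cluster.

```python
import numpy as np

def term_document_matrix(docs):
    """Build a raw count matrix: one row per document, one column per term."""
    vocab = sorted({w for d in docs for w in d.split()})
    col = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.split():
            X[i, col[w]] += 1.0
    return X, vocab

def svd_reduce(X, k):
    """Truncated SVD: k-dimensional coordinates for documents and for terms."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k].T * s[:k]

def kmeans(points, k, iters=50):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(nearest))])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Assign every point to its closest center, then recompute the centers.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

def node_weights(labels, k):
    """Assumed weighting: the share of items that falls into each cluster."""
    return np.bincount(labels, minlength=k) / len(labels)

# Hypothetical browsing history: three programming pages, three sports pages.
history = [
    "python code compiler",
    "python compiler bug",
    "code bug python",
    "soccer goal match",
    "match goal league",
    "soccer league goal",
]

X, vocab = term_document_matrix(history)
doc_coords, term_coords = svd_reduce(X, k=2)
doc_labels = kmeans(doc_coords, k=2)    # document clusters
term_labels = kmeans(term_coords, k=2)  # word clusters
doc_weights = node_weights(doc_labels, 2)

print("document clusters:", doc_labels, "weights:", doc_weights)
print("word clusters:", dict(zip(vocab, term_labels.tolist())))
```

Because the two topics in this toy history share no vocabulary, the reduced coordinates separate cleanly; real browsing data would need term weighting (for example TF-IDF) and a principled choice of the rank and the number of clusters.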
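The ranking step described in the abstract (classify retrieved documents with naive Bayes, score them, sort by score) might look roughly like the sketch below. The abstract does not state the scoring formula, so the one used here, the sum over interest classes of interest weight times posterior probability, is an assumption, as are the class names, the weights, the uniform class priors, and the sample search results.

```python
import math
from collections import Counter

def train_naive_bayes(classes):
    """classes maps a class name to a list of training documents (strings)."""
    vocab = {w for docs in classes.values() for d in docs for w in d.split()}
    model = {}
    for name, docs in classes.items():
        counts = Counter(w for d in docs for w in d.split())
        total = sum(counts.values())
        # Laplace smoothing keeps unseen words from zeroing out a class.
        model[name] = {
            w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab
        }
    return model

def posteriors(model, text):
    """Return P(class | text) under uniform priors; unknown words are skipped."""
    log_scores = {
        name: sum(logp[w] for w in text.split() if w in logp)
        for name, logp in model.items()
    }
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp = {name: math.exp(v - top) for name, v in log_scores.items()}
    norm = sum(exp.values())
    return {name: v / norm for name, v in exp.items()}

def rerank(model, interest, results):
    """Assumed score: sum over classes of interest weight * posterior."""
    scored = []
    for text in results:
        post = posteriors(model, text)
        scored.append((sum(interest[c] * p for c, p in post.items()), text))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical interest classes and node weights taken from an interest model.
classes = {
    "programming": ["python code compiler", "python compiler bug", "code bug python"],
    "sports": ["soccer goal match", "match goal league", "soccer league goal"],
}
interest = {"programming": 0.8, "sports": 0.2}

# Hypothetical results returned by a conventional search engine.
results = ["python league table", "python bug tracker", "soccer match report"]

model = train_naive_bayes(classes)
for score, text in rerank(model, interest, results):
    print(f"{score:.3f}  {text}")
```

With these made-up weights, results that the classifier assigns mostly to the higher-weighted class rise to the top, which is the intended effect of personalizing the order of an otherwise generic result list.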
[Degree-granting institution]: Taiyuan University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP391.3


Article ID: 2466907

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2466907.html
