
Research on Personalized Search Ranking Based on a User Interest Model

Published: 2018-03-06 20:46

  Topic: user interest model  Focus: personalization factor  Source: Zhejiang Sci-Tech University, 2015 master's thesis  Document type: degree thesis


[Abstract]: With the arrival of the information age, the volume of data on the Internet is growing exponentially. On the one hand, the crawl coverage of search engines falls far behind the rate at which information grows; on the other hand, Internet users are increasing in both number and sophistication, which places higher demands on search engines. How a search engine can offer a better user experience and rank results more precisely against individual needs is a current research focus and development direction for personalized search engines.

This thesis starts from an analysis of overall search engine architecture, proposes the concept of a personalization factor, analyzes how a user interest model is built and updated, and finally implements a prototype personalized search engine based on the user interest model. The main work is as follows:

1. Survey of current approaches to building personalized search engines. The approaches covered are query refinement, page weight assignment, meta-search result merging, and personalized web crawling. On this basis, the thesis chooses to combine query refinement with page weighting.

2. Construction of the user interest model. Starting from the concept of an interest page, the thesis proposes a formula for deciding whether a page is an interest page, and proposes decoupling the interest model from the user interest model. The interest model is generated from ODP and forms a tree structure with interest levels; the user interest model is a vector of keywords and weights. A mapping between the two is used to convert from one to the other in practice. The work concentrates on building the user interest model: feature words are extracted from interest pages, the decision formula is applied to obtain the user's interest feature words, and the weight of each interest feature word is recalculated according to where the word appears. The update strategy for the user interest model is expressed as changes in the weights: different forgetting factors are applied to long-term interests, to short-term interests, and according to the hierarchy level of each interest word.

3. Introduction of a personalization factor into the Lucene scoring formula. The thesis analyzes Lucene's scoring mechanism and, taking advantage of its open source code and extensibility, adds the weights of the user interest model to the ranking algorithm so that the ranked results reflect the user's interest preferences.

4. Implementation and evaluation of a prototype personalized search engine. The prototype is built with Nutch and Solr, the open source framework that wraps Lucene, and the Solr service is called from program code. Because Solr's built-in tokenizer does not support Chinese, the third-party IKAnalyzer plug-in is used for word segmentation. Finally, several groups of keywords are queried and the results are compared and analyzed, showing that the personalization factor used in this thesis is feasible in practice.
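The abstract does not reproduce the thesis's formulas, so the following Python sketch is only an illustration of the two ideas described in items 2 and 3: decaying keyword weights with a forgetting factor, and folding interest weights into a base relevance score. The exponential decay form, the decay rates, the `alpha` parameter, the multiplicative combination, and all function names are assumptions made for this example; they are not taken from the thesis or from the Lucene API.

```python
import math

# Hypothetical decay rates per day; the thesis's actual values are not given.
SHORT_TERM_DECAY = 0.10
LONG_TERM_DECAY = 0.01


def decay_weight(weight, days_idle, long_term):
    """Shrink an interest weight with an exponential forgetting factor (assumed form)."""
    rate = LONG_TERM_DECAY if long_term else SHORT_TERM_DECAY
    return weight * math.exp(-rate * days_idle)


def personalization_factor(doc_terms, interest, alpha=0.5):
    """Return 1 + alpha * (sum of interest weights of terms present in the document)."""
    overlap = sum(w for term, w in interest.items() if term in doc_terms)
    return 1.0 + alpha * overlap


def rerank(results, interest, alpha=0.5):
    """Re-sort (doc_id, base_score, terms) tuples by base_score * personalization factor."""
    scored = [
        (doc_id, base * personalization_factor(terms, interest, alpha))
        for doc_id, base, terms in results
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # User interest model as a keyword -> weight vector.
    interest = {"lucene": 0.8, "football": 0.4}
    # A short-term interest left untouched for 30 days fades quickly.
    interest["football"] = decay_weight(interest["football"], days_idle=30, long_term=False)
    results = [
        ("doc-a", 1.0, {"football", "news"}),
        ("doc-b", 0.9, {"lucene", "solr"}),
    ]
    for doc_id, score in rerank(results, interest):
        print(doc_id, round(score, 3))
```

In this sketch a document with a slightly lower base score can overtake another when it matches a strong, recent interest. In a real Lucene or Solr deployment the factor would more likely be applied inside a custom similarity or as a query-time boost instead of a post-hoc re-sort.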
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year degree conferred]: 2015
[Classification number]: TP391.3



