

Research and Application of Information Retrieval Algorithms in Smart Tourism

Published: 2018-06-07 01:11

  Topics: search engine + interest model; Source: Zhejiang Sci-Tech University, master's thesis, 2017


[Abstract]: As living standards rise, travel has become one of the most common leisure activities, and with information technology now widely available, users planning a trip usually turn first to a search platform to look up travel information. The volume of travel information on the Internet, however, keeps growing and is increasingly tangled, so users care more and more about the relevance of what a search platform returns. After entering a query, a user expects the most relevant and most reliable travel information to appear at the top of the results. How to present the most relevant and most reliable sources as search results, so that users genuinely benefit from smart tourism, is one of the pressing problems a search platform must solve, and ranking algorithms have therefore become a key research direction for search engines. This thesis studies information retrieval algorithms for smart tourism in three parts:

(1) Analysis of the traditional PageRank algorithm. After analyzing the principle and shortcomings of traditional PageRank and reviewing earlier improvements, the thesis proposes SM-PageRank, an algorithm based on the similarity of linked pages. It introduces the similarity between a page and the pages it links to into the PageRank computation, so that weight is distributed among linked pages more reasonably.

(2) Secondary ranking of the results based on a user interest model. The basic principle is as follows: a user interest model is first built for each user. When the user searches, the search engine returns the first-pass ranked result set, and the similarity between each page in that set and the user interest model is computed. The computed similarity is then used to recalculate each page's score, and the pages are sorted in descending order of the new scores before the final ranking is shown to the user. Because secondary ranking rests on the user interest model, the thesis analyzes in more depth how user interests are acquired and how the model is built and updated, so that the model can better re-rank the first-pass result set.

(3) An experimental smart tourism retrieval platform built with Nutch and Solr. Nutch first crawls the experimental data sources, and the SM-PageRank algorithm and the traditional PageRank algorithm are each applied in Nutch. Solr uses the IKAnalyzer tool for Chinese word segmentation, and search queries are issued through the application services that Solr provides. The experimental results show that the optimized SM-PageRank algorithm ranks results more accurately than traditional PageRank, and that secondary ranking further improves the accuracy of the search results, bringing the final ranking closer to users' needs.
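The two ranking stages summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the thesis's formulas, so everything concrete here is an assumption made for illustration: pages are represented as sparse term-weight dictionaries, similarity is cosine similarity, a page divides its rank among its out-links in proportion to that similarity (with an equal split as fallback), and the second pass multiplies each first-pass score by `1 + alpha * similarity`. The function names, the `alpha` parameter, and the damping value 0.85 are hypothetical and are not taken from the thesis, Nutch, or Solr.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_weighted_pagerank(links, vectors, damping=0.85, iterations=50):
    """PageRank variant in which a page splits its rank among its
    out-links in proportion to content similarity (assumed scheme).

    links:   dict page -> list of pages it links to; every linked
             page must also appear as a key
    vectors: dict page -> sparse term-weight dict
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    # Precompute normalised transition weights for every out-link.
    weights = {}
    for p in pages:
        targets = set(links[p])
        sims = {q: cosine(vectors[p], vectors[q]) for q in targets}
        total = sum(sims.values())
        if total > 0.0:
            weights[p] = {q: s / total for q, s in sims.items()}
        else:
            # No measurable similarity: fall back to the classic equal split.
            weights[p] = {q: 1.0 / len(targets) for q in targets}

    for _ in range(iterations):
        # Rank held by pages without out-links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not weights[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        for p in pages:
            for q, w in weights[p].items():
                new_rank[q] += damping * rank[p] * w
        rank = new_rank
    return rank


def rerank(first_pass, vectors, interest, alpha=0.5):
    """Second pass: boost each first-pass score by the page's similarity
    to the user-interest vector, then sort in descending order."""
    rescored = {
        p: score * (1.0 + alpha * cosine(vectors[p], interest))
        for p, score in first_pass.items()
    }
    return sorted(rescored, key=rescored.get, reverse=True)
```

A multiplicative boost is used in `rerank` so that the result does not depend on the absolute scale of the first-pass scores; an additive blend would need both quantities normalised to a common range first.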
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP391.3


Article ID: 1988995


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1988995.html

