A Personalized Search Engine System Based on Lucene
[Abstract]: The rapid development of the Internet has brought about an explosion of knowledge, and users faced with massive amounts of data must rely on retrieval tools to find the information they need. Search applications such as Google and Baidu emerged to meet this demand and have changed the Internet. Traditional retrieval technology is mature in both theory and practice, and the open source community has produced third-party API libraries such as Xapian and Lucene, as well as complete search solutions built on them. This paper analyzes the principles, components and workflow of a search engine in detail and introduces the theoretical basis of each module. It focuses on improving the well-known API library Lucene, and analyzes Lucene's module structure, file formats, indexing process and result ranking.

Mainstream search solutions currently do not support JavaScript, a compromise made in favor of crawl volume and speed. The fast JavaScript interpretation engines that have appeared in recent years make it possible to address this limitation. This paper introduces a script interpretation engine into the crawler module so that the crawler can understand JavaScript. Following the idea of operator overloading in C++, the operators involved in URL assignment in JavaScript are overridden as set operations, which allows the URLs contained in scripts to be extracted. Comparative experiments are carried out on an intranet and on the external network, and the reasons for the successes and failures are summarized.

Link analysis is an important measure of web page quality. This paper introduces the PageRank algorithm into Lucene's original scoring formula to score pages more accurately and thereby improve the quality of search results. Building on the original power iteration, a simpler calculation method is proposed that speeds up the computation. Lucene is well designed and exposes a large number of interfaces in each functional module for user customization. These interfaces are used for the engineering implementation, and the modified scoring is compared experimentally with the original scoring formula.

Finally, this paper explores search engine personalization in practice. Traditional search engines are generally based on keyword matching; they make little use of information about the individual user and lack personalized functions. This paper describes how user information is collected and how a user model is built and used. Guided by this theory and taking engineering difficulty into account, a simple personalized search module is designed and implemented. The experimental results show that the personalized module designed in this paper is effective.
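To make the link-analysis step concrete, the following is a minimal sketch of standard power-iteration PageRank, followed by one possible way to blend the result with a text relevance score. The abstract does not give the thesis's actual scoring formula or its simplified calculation method, so `combined_score`, its `weight` parameter and the example graph are illustrative assumptions, not the author's implementation.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Standard power-iteration PageRank over {page: [outlinked pages]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        # Stop once the total change between iterations is negligible.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


def combined_score(text_score: float, page_rank: float, n_pages: int,
                   weight: float = 0.5) -> float:
    """Hypothetical blend (an assumption, not the thesis's formula):
    scale the text score by a boost proportional to the normalized rank."""
    return text_score * (1.0 + weight * page_rank * n_pages)


# Small illustrative link graph (assumed data).
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
```

Multiplying `page_rank` by `n_pages` keeps the boost near 1 for an average page regardless of collection size; an additive or logarithmic blend would be an equally plausible choice.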
[Degree-granting institution]: China Ship Research and Development Academy
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP391.3