Research on a Distributed Search Engine Based on Solr
Keywords for this article: research on a distributed search engine based on Solr. Compiled and published by 筆耕文化傳播.
User sanshengyuanting recently collected and compiled this document, the master's thesis "Research on a Distributed Search Engine Based on Solr". We hope it helps with your work and study. The document is introduced below.

Master's Thesis
Research on a Distributed Search Engine Based on Solr

Classification number: (blank)
Student ID: M201076071
University code: 10487
Confidentiality level: (blank)
Candidate: Zhang Xinsheng (張新生)
Major: Software Engineering
Supervisor: Assoc. Prof. Xue Zhidong (薛志東)
Defense date: May 17, 2012

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree for the Master of Engineering
The research of Distributed Search Engine Based on Solr
Candidate: Zhang Xinsheng
Major: Software Engineering
Supervisor: Assoc. Prof. Xue Zhidong
Huazhong University of Science and Technology
Wuhan 430074, P. R. China
May, 2012

Huazhong University of Science and Technology, Master's Thesis

摘要 (Abstract)

With the rapid growth of small and medium-sized enterprises and the widespread adoption of computer-based information systems, the volume of enterprise information is growing exponentially. For an enterprise user, finding exactly the needed information in such a massive store is like fishing for a needle in the ocean. Search engine technology is a workable answer to this problem, because it gives users a relatively simple information retrieval service.

To handle massive enterprise data and improve search accuracy, this work introduces distributed computing and Solr full-text retrieval into the search engine system. The distributed search engine is designed around two problems: processing massive data and handling highly concurrent requests. The main research work is applying distributed computing to a traditional search engine. For massive data, the strategy is distributed index building and distributed search, with the index files stored in a distributed file system. The overall process framework of the system is then examined in depth, yielding a structure and workflow that cope effectively with massive data processing.

For highly concurrent requests, the thesis gives software and hardware load-balancing strategies and strategies for optimizing each distributed node. Through load balancing and the optimization of each active distributed node, performance reaches a level at which highly concurrent requests are handled quickly. In addition, given Solr's index handling mechanism, a master-slave replication cluster deployment is adopted so that the system is better suited to massive data and concurrent requests.

Finally, a small distributed search engine system with two active nodes was built in a laboratory environment, where each active node is a cluster of two computers. A massive index was built on it, and stress testing of the engine produced experimental data. Analysis of these results verified the reliability, scalability, and stability of the system architecture.

Keywords: distributed computing; massive information; high concurrency; Solr; search engine

Abstract

With the rapid development of small and medium-sized enterprises, as well as the increasing popularity of computer information technology, the amount of enterprise information has grown exponentially. Business users want to find the accurate information they need in a huge mass of information, but doing so is as unrealistic as fishing for a needle in the ocean. Search engine technology is an effective way to solve this problem, because it provides users with a relatively simple information retrieval service. To better deal with huge amounts of data and with search accuracy, the search engine system uses distributed computing and the Solr full-text retrieval technology.

The search engine uses a distributed processing architecture for massive data processing and highly concurrent requests. In the proposed distributed search engine, the main research work is applying distributed computing to traditional search engines. Massive data processing relies on a distributed indexing and distributed search strategy, with a distributed file system to store the index files. An in-depth discussion of the overall process framework then yields a structure and process that deal effectively with massive data processing. For highly concurrent requests, software and hardware load balancing and optimization strategies for each distributed node are given. Load balancing and the optimization of each active distributed node bring performance to the point where highly concurrent requests are handled quickly. Master-slave replication cluster deployment is adopted for Solr's index handling mechanism, to better adapt to huge amounts of data and the processing of concurrent requests.

Finally, a small distributed search engine system with two active nodes was built in my laboratory environment, where each active node is a cluster of two computers. Establish its mass ind
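The distributed search and software load-balancing ideas summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes Solr's classic distributed search, in which a query lists the index shards to fan out to in the `shards` request parameter, and it picks one replica per shard in round-robin order. The host names, the two-by-two topology, and the function name `build_distributed_query` are hypothetical placeholders chosen to mirror the two active nodes of two computers each.

```python
from itertools import cycle
from urllib.parse import urlencode

# Hypothetical topology (not from the thesis): two shards, i.e. the two
# "active nodes", each served by two replicated machines.
SHARD_REPLICAS = [
    ["node1a:8983/solr", "node1b:8983/solr"],
    ["node2a:8983/solr", "node2b:8983/solr"],
]

# One round-robin iterator per shard: a very simple software
# load-balancing policy, assumed here for illustration only.
_rotations = [cycle(replicas) for replicas in SHARD_REPLICAS]


def build_distributed_query(query, rows=10):
    """Return a Solr select URL that fans out to one replica of every shard."""
    chosen = [next(rotation) for rotation in _rotations]
    params = urlencode({"q": query, "rows": rows, "shards": ",".join(chosen)})
    # The first chosen replica receives the request and merges shard results.
    return "http://" + chosen[0] + "/select?" + params


if __name__ == "__main__":
    # Successive calls alternate between the replicas of each shard.
    print(build_distributed_query("title:solr"))
    print(build_distributed_query("title:solr"))
```

Successive calls rotate through the replicas, so read traffic is spread across the machines of each node while every query still covers the whole index. A hardware load balancer in front of the replicas would play the same role without client-side rotation.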
Article ID: 58038
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/58038.html