Research on a Cloud Computing Model Based on Search Box / Resource Pool

Published: 2018-04-02 00:06

  Topic: search box / resource pool. Entry point: distributed searcher. Source: Nanjing University of Aeronautics and Astronautics, master's thesis, 2012


[Abstract]: This thesis takes the research and development of a supply-chain cloud pre-research project for an e-commerce platform as its application background and studies a cloud computing model based on search box / resource pool. After analyzing current centralized search engine systems and summarizing their strengths and weaknesses, it establishes a cloud computing framework based on search box / resource pool to address their shortcomings and, guided by object-oriented methodology, concentrates on the analysis, design, and implementation of a cloud search engine. The main work improves the functional modules of a traditional search engine: the searcher, indexer, and query component are each analyzed in detail and redesigned using distributed cloud computing techniques. Following the Map/Reduce programming model, data computation tasks are encapsulated in the Map function and data merging tasks in the Reduce function. The improved search engine system can be deployed in a Hadoop distributed environment built from inexpensive machines and, according to the thesis, significantly improves precision, recall, and response speed.

First, the thesis presents the cloud computing framework based on search box / resource pool and gives a preliminary discussion of the definition and characteristics of cloud computing, the search box, the search engine, and the resource pool. It then discusses key cloud computing technologies and typical cloud computing platforms, parallel workflows and parallel programming models, and resource pools and distributed file systems. It analyzes how distributed file systems and the Map/Reduce programming model are applied in data processing, and covers the inverted index mechanism used by search engines, the principles of Chinese word segmentation, and the scope of application of related software tools. Parallelizing a traditional search engine is also given a preliminary discussion.

Next, based on the business functions of each search engine subsystem, the thesis sets out the design approach for the distributed search engine system. It presents the functional design of the three core components (searcher, indexer, and query component), gives the execution workflow of each, describes them formally in the BPEL business process description language, and uses the Map/Reduce model to improve the three components so that they support distributed processing.

Then, following the functional design and the internal execution process of the distributed searcher, indexer, and query component, and drawing on the BPEL process descriptions, the thesis gives class relationship diagrams, adapts the Nutch search engine components from single-machine mode to distributed cloud operation, implements the system in code, and discusses the difficulties encountered during implementation and how they were resolved.

Finally, the development environment is set up and a cloud search engine for e-commerce is implemented. A comparison and analysis of the data crawled in experiments with a typical distributed searcher component is given, which the thesis presents as verification of the performance advantages and technical feasibility of a search engine under the cloud computing model.

The cloud computing model based on search box / resource pool proposed in this thesis has been applied in the actual development of the cloud search engine and has some theoretical significance and practical engineering value.
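
The abstract describes placing data computation in a Map function and data merging in a Reduce function, and it also mentions the inverted index used by search engines. The following is a minimal single-process sketch of how those two ideas can fit together; it is not the thesis's implementation and does not use Hadoop or Nutch. The function names (`map_document`, `shuffle`, `reduce_postings`, `build_inverted_index`) and the sample documents are hypothetical, and whitespace splitting stands in for a real Chinese word segmenter.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_document(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Map step (computation): emit one (term, doc_id) pair per distinct term."""
    # Assumption: whitespace tokenization replaces a real word segmenter.
    for term in set(text.lower().split()):
        yield term, doc_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group intermediate pairs by term, as a framework would between the steps."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_postings(term: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    """Reduce step (merging): combine all doc ids for a term into a posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    """Run map, shuffle, and reduce in sequence over an in-memory corpus."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_document(doc_id, text)
    )
    grouped = shuffle(pairs)
    return dict(reduce_postings(term, ids) for term, ids in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample corpus for illustration only.
    docs = {
        "d1": "cloud search engine",
        "d2": "distributed search index",
        "d3": "cloud resource pool",
    }
    index = build_inverted_index(docs)
    print(index["search"])  # ['d1', 'd2']
    print(index["cloud"])   # ['d1', 'd3']
```

In a real distributed deployment the grouping step would be handled by the framework across machines; here it is written out explicitly so the division of work between the Map and Reduce functions is visible.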
[Degree-granting institution]: Nanjing University of Aeronautics and Astronautics
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP391.3



