Research and Design of a Distributed Vertical Search Engine Based on Hadoop

Published: 2018-08-07 14:59
[Abstract]: With the development of the Internet, network technology has matured, the number of websites keeps growing, and the volume of online information is enormous. As network technology advances and network resources grow ever faster, the number of people using online information also keeps rising. Against this background, traditional general-purpose search engines suffer from limited coverage, large and cluttered result sets, long update cycles, and query ambiguity, among other problems.

At the same time, as information becomes more diverse, the retrieval needs of different users differ widely, and traditional general-purpose search engines can no longer serve these needs in a targeted way. Moreover, most commercially successful search engines use a centralized architecture, which places high performance demands on individual servers, is prone to failure, and scales poorly. To address these shortcomings, distributed vertical search has emerged: it offers good performance and fault tolerance, easy scaling, fine-grained and accurate classification, comprehensive and in-depth data, and timely updates.

"Distributed" means that multiple servers form a cluster and work in coordination with one another. "Vertical search" means specialized search for a particular industry; it is characterized as "specialized, precise, and deep", has a distinct industry focus, and is a subdivision and extension of general-purpose search engines. This thesis builds a distributed cluster with Hadoop, analyzes the source code of the open-source search components Nutch and Solr, and studies the theory and key technologies of search engines. Building on existing research, it improves topic-relevance determination and web-page retrieval ranking, and it uses domain-ontology knowledge to construct a steel-industry ontology that expands users' query conditions, so that information can be located and retrieved more precisely. Finally, by modifying the source code of the open-source search components, a prototype distributed vertical search engine is designed and implemented on Hadoop. Its search results are compared with those of Baidu's commercial search engine, and analysis and evaluation of the experimental results show that the system has a clear topical orientation and achieves higher precision than a general-purpose search engine.
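The abstract states that a steel-domain ontology is used to expand user queries but does not describe the mechanism. The following is a minimal sketch of ontology-based query expansion under stated assumptions: the ontology entries, the per-relation weights, and the names `Concept`, `ONTOLOGY`, `WEIGHTS`, `expand_query`, and `to_lucene_query` are all hypothetical and are not taken from the thesis. The expanded terms are rendered in the Lucene/Solr standard query syntax (quoted phrases with a `^` boost, joined by `OR`).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """One node of a (hypothetical) domain ontology and its labelled neighbours."""
    synonyms: tuple[str, ...] = ()
    narrower: tuple[str, ...] = ()   # more specific concepts (hyponyms)
    related: tuple[str, ...] = ()    # associatively related concepts


# Hypothetical mini steel-domain ontology, for illustration only;
# it is NOT the ontology built in the thesis.
ONTOLOGY: dict[str, Concept] = {
    "rebar": Concept(
        synonyms=("reinforcing bar",),
        narrower=("deformed bar",),
        related=("concrete",),
    ),
    "steel plate": Concept(
        synonyms=("steel sheet",),
        narrower=("hot-rolled plate", "cold-rolled plate"),
        related=("rolling mill",),
    ),
}

# Assumed boost per relation type: semantically closer relations weigh more.
WEIGHTS = {"original": 1.0, "synonym": 0.9, "narrower": 0.7, "related": 0.4}


def expand_query(terms: list[str]) -> dict[str, float]:
    """Map each query term to itself plus its ontology neighbours, with weights."""
    expanded: dict[str, float] = {}

    def add(term: str, weight: float) -> None:
        # If a term is reached via several relations, keep its highest weight.
        if weight > expanded.get(term, 0.0):
            expanded[term] = weight

    for raw in terms:
        term = raw.strip().lower()
        add(term, WEIGHTS["original"])
        concept = ONTOLOGY.get(term)
        if concept is None:
            continue  # term not in the ontology: keep it unexpanded
        for s in concept.synonyms:
            add(s, WEIGHTS["synonym"])
        for n in concept.narrower:
            add(n, WEIGHTS["narrower"])
        for r in concept.related:
            add(r, WEIGHTS["related"])
    return expanded


def to_lucene_query(expanded: dict[str, float]) -> str:
    """Render weighted terms as boosted phrases joined by OR (Lucene/Solr syntax)."""
    # Highest weight first; ties broken alphabetically for a stable output.
    ordered = sorted(expanded.items(), key=lambda kv: (-kv[1], kv[0]))
    return " OR ".join(f'"{term}"^{weight}' for term, weight in ordered)
```

For example, `print(to_lucene_query(expand_query(["Rebar"])))` prints `"rebar"^1.0 OR "reinforcing bar"^0.9 OR "deformed bar"^0.7 OR "concrete"^0.4`. Decreasing the boost with semantic distance is one way to let expansion raise recall without letting loosely related terms dominate the ranking; the specific weights here are assumptions that would need tuning against evaluated results.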
[Degree-granting institution]: Hebei University of Technology
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP391.3
