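Development of a Campus Search Engine and Its Traffic Measurement

Published: 2018-06-08 02:16

  Topics: search engine + Lucene; Source: Beijing University of Posts and Telecommunications, master's thesis, 2012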


[Abstract]: A search engine is often a user's first stop on the Internet, helping them pick out the information they actually care about from a massive number of web pages. Although search engine technology is relatively mature, the core technology remains in the hands of a few large companies with a dominant market position. These companies provide retrieval over the whole Internet, whereas many companies and institutions also want a search tool for their own intranet: such a tool is more targeted, gives better search results, and helps prevent information leakage. The intranet of the author's university holds a great deal of information, but there is currently no search-engine-like tool on campus to organize it, which is inconvenient for campus users.

To make it easier for teachers and students to find on-campus network resources, this thesis develops a campus search engine that indexes campus web pages and returns good results for their queries. The engine is built on the frameworks of the open source projects Lucene and Nutch. Based on the characteristics of campus web pages and the particular requirements of the campus, the thesis proposes and implements a new update algorithm for the web page data set, a deduplication algorithm, a ranking algorithm, and others, and re-customizes many modules. The result is a campus search engine called "Changyou". Test results show that "Changyou" provides a fairly satisfactory service to users. The implementation of its ranking algorithm and other components is extensible and can be improved step by step as requirements change.

Because search workloads are computationally heavy and a single machine is too slow, the campus search engine is deployed on the Hadoop distributed platform. As more and more companies and organizations run their business on Hadoop, research on Hadoop has attracted wide attention. However, there is almost no traffic measurement work on data centers running Hadoop, and this lack hinders research on both Hadoop and data centers. Using the Hadoop cluster that runs "Changyou", this thesis measures the traffic characteristics of a data center running Hadoop. Based on the inherent characteristics of data center networks, it proposes a targeted measurement method and develops a software tool named HADE dedicated to processing and analyzing the network data. Finally, the thesis presents the measured traffic characteristics and analyzes them, providing a useful basis for researchers working on Hadoop and data centers.
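The abstract names three steps of the search pipeline (indexing, deduplication, ranking) without describing the algorithms behind them. The following is only a minimal sketch of what such steps can look like. It is not the thesis's method and does not use Lucene or Nutch. The class `TinyIndex`, the exact-duplicate check by content hash, the length-normalized TF-IDF score, and the example URLs are all assumptions made for illustration.

```python
import hashlib
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a stand-in for a real analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Hypothetical toy index: inverted index + exact dedup + TF-IDF ranking."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {url: term frequency}
        self.doc_len = {}                  # url -> number of tokens
        self.seen_hashes = set()           # digests of already indexed content

    def add(self, url, text):
        """Index a page; return False if identical content was indexed before."""
        tokens = tokenize(text)
        digest = hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()
        if digest in self.seen_hashes:
            return False  # assumed dedup rule: same normalized token sequence
        self.seen_hashes.add(digest)
        self.doc_len[url] = len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][url] = tf
        return True

    def search(self, query):
        """Return URLs ordered by a simple length-normalized TF-IDF score."""
        scores = defaultdict(float)
        n_docs = len(self.doc_len)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            idf = math.log(1 + n_docs / len(docs))
            for url, tf in docs.items():
                scores[url] += (tf / self.doc_len[url]) * idf
        # Ties are broken by URL so the ordering is deterministic.
        return sorted(scores, key=lambda u: (-scores[u], u))


if __name__ == "__main__":
    idx = TinyIndex()
    idx.add("http://a.example/lib", "Library opening hours and library rules")
    idx.add("http://a.example/news", "Campus news: library closed on Friday")
    # Same words with different case and punctuation: rejected as a duplicate.
    idx.add("http://a.example/lib-mirror", "library opening hours and LIBRARY rules!")
    print(idx.search("library hours"))
```

A production system would replace each piece: a language-aware analyzer (for example, Chinese word segmentation) instead of `tokenize`, near-duplicate detection instead of an exact hash, and a ranking function tuned to campus pages.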
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.3



