Design and Implementation of an IPv6 Information Collection System
Published: 2018-04-17 04:11
Topics: IPv6 resources + information collection. Source: master's thesis, South China University of Technology, 2012
[Abstract]: With the rapid growth of the Internet, network resources have become increasingly abundant, which poses great challenges to general-purpose information collection systems and search engines. Users expect ever better and more specialized information services, and general search engines cannot meet their needs in specialized domains. Topic-focused information collection emerged in response. IPv4 addresses are now exhausted and the Internet is moving to IPv6. The number of IPv6 addresses in China has also grown rapidly over the past year, and demand for IPv6 resources keeps rising. An IPv6 topic-focused information collection system is therefore needed to crawl IPv6 resources faster.

This thesis aims to design and implement an efficient, robust, configurable, and accurate IPv6 topic-focused information collection system that supplies search engines with reliable IPv6 resources and so meets the demand for them. It first surveys the development of information collection systems in China and abroad. It then introduces the relevant search engine theory, mainly the development of search engines, the basic principles of information collection, and the algorithms used for topic-focused crawling and web page analysis. The system is designed and implemented with a distributed architecture and MVC layering. DNS caching, robots caching, and site information caching are added to improve performance. The thesis also proposes three collection strategies to guide the crawl for IPv6 resources: education-network sites first, large sites first, and score propagation based on the site link structure. RMI is used for communication between distributed nodes: the master node sends execution commands to the slave nodes, and each slave node reports its status to the master by sending heartbeat messages.

The system is tested for DNS cache effectiveness, collection performance, and the effectiveness of the IPv6 collection strategies. After IPv6 resources are collected, site statistics are compiled to obtain and analyze the topology and resource distribution of IPv6 sites.
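The abstract names three collection strategies but does not give their formulas. The following is only a minimal sketch of how such priorities could be combined in a crawl frontier. The weights, the large-site threshold, the damping factor, and the names `Frontier`, `base_score`, and `site_of` are all illustrative assumptions and are not taken from the thesis.

```python
import heapq
from urllib.parse import urlsplit

# All constants below are assumed values for illustration only.
EDU_BONUS = 2.0          # extra priority for education-network sites
LARGE_SITE_BONUS = 1.0   # extra priority for sites with many known pages
LARGE_SITE_PAGES = 1000  # assumed threshold for a "large" site
DAMPING = 0.5            # share of the parent's score passed to its links


def site_of(url):
    """Return the host name of a URL, or an empty string if it has none."""
    return urlsplit(url).hostname or ""


def base_score(site, known_pages):
    """Score a site by the education-first and large-site-first rules."""
    score = 0.0
    if site.endswith(".edu.cn") or site.endswith(".edu"):
        score += EDU_BONUS
    if known_pages.get(site, 0) >= LARGE_SITE_PAGES:
        score += LARGE_SITE_BONUS
    return score


class Frontier:
    """Priority queue of URLs; higher scores are crawled first."""

    def __init__(self, known_pages):
        self.known_pages = known_pages  # site -> number of pages seen so far
        self.heap = []
        self.seen = set()
        self.counter = 0  # tie-breaker that keeps insertion order stable

    def push(self, url, parent_score=0.0):
        """Queue a URL once, adding a damped share of the linking page's score."""
        if url in self.seen:
            return
        self.seen.add(url)
        score = base_score(site_of(url), self.known_pages) + DAMPING * parent_score
        # heapq is a min-heap, so the score is negated to pop the largest first.
        heapq.heappush(self.heap, (-score, self.counter, url))
        self.counter += 1

    def pop(self):
        """Return the (url, score) pair with the highest score."""
        neg_score, _, url = heapq.heappop(self.heap)
        return url, -neg_score
```

In this sketch a link found on a high-scoring page inherits part of that score through `parent_score`, which is one simple way to read "score propagation based on the site link structure"; the thesis may define the propagation differently.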
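The DNS cache mentioned in the abstract can be pictured as a small time-limited lookup table placed in front of the resolver. The sketch below is an assumption about the general idea, not the thesis's implementation: the class name `DnsCache`, the fixed 300-second lifetime, and the hit/miss counters are hypothetical. The default resolver uses the standard `socket.getaddrinfo` call restricted to IPv6 addresses.

```python
import socket
import time


def resolve_aaaa(host):
    """Resolve a host name to its IPv6 addresses using the system resolver."""
    infos = socket.getaddrinfo(host, None, socket.AF_INET6, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return [info[4][0] for info in infos]


class DnsCache:
    """Cache resolver answers for a fixed lifetime (assumed, not per-record TTL)."""

    def __init__(self, resolver=resolve_aaaa, ttl=300.0, clock=time.monotonic):
        self.resolver = resolver
        self.ttl = ttl
        self.clock = clock
        self.entries = {}  # host -> (expiry time, list of addresses)
        self.hits = 0
        self.misses = 0

    def lookup(self, host):
        """Return cached addresses if still fresh, otherwise resolve and store."""
        now = self.clock()
        entry = self.entries.get(host)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        addresses = self.resolver(host)
        self.entries[host] = (now + self.ttl, addresses)
        return addresses
```

The `hits` and `misses` counters give one way to measure cache effectiveness, in the spirit of the DNS cache test the abstract lists. The resolver and clock are parameters so the cache can be exercised without network access.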
[Degree-granting institution]: South China University of Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP393.092
Article ID: 1761998
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1761998.html