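Research and Implementation of Web Page Deduplication in Search Engine Systems.pdf (full text)
Keywords: Research and Implementation of Web Page Deduplication in Search Engine Systems. Compiled and published by 筆耕文化傳播.
South-Central University for Nationalities (中南民族大學)
Master's Thesis
Research and Implementation of Web Page Deduplication in Search Engine Systems
Name: Fan Xiaoyuan (范小源)
Degree applied for: Master's
Major: Computer Application Technology
Supervisor: Lu Jiguang (陸際光)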
2007-05-20
Abstract
The rapid spread and development of the Internet confronts people with a sea of
information, and extracting what is truly important from it has become essential.
The search engine, mainly in the form of a full-text search system, is a tool
that provides this function. However, search engine results contain a large
number of duplicated web pages, which come mainly from content being reproduced
across websites. These repeated pages not only occupy network bandwidth but also
waste storage resources. Users do not want to see a pile of search results with
identical or near-identical content, and truly useful results are often drowned
in this redundant information and cannot be easily found. Removing duplicate web
pages effectively improves search accuracy and saves users time and energy, while
the search system itself saves considerable storage and works more efficiently.

This paper studies the problem of removing duplicated web pages for search
engines. Effective methods for removing duplicated web pages are still few, and
most are implemented on the server side; that is, duplicated pages are
eliminated while web pages are being collected. The methods in common use today
are based on identical URLs, on clustering, on feature codes, and on signatures.
In the clustering-based method, a text is represented as a vector in a vector
space model, and various methods are then used to cluster or classify the
vectors. In this method, calculating the angle between vectors has high
computational complexity, which will take up more
proce
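
The clustering-based approach described above can be illustrated with a minimal
sketch. It is not the thesis's implementation: the whitespace tokenizer, the
function names (term_vector, cosine_similarity, near_duplicate_pairs) and the 0.9
threshold are all assumptions chosen for illustration. It represents each page as
a term-frequency vector, measures the angle between two vectors by its cosine,
and compares every pair of pages, which shows where the quadratic cost comes
from.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a sparse term-frequency vector (assumed tokenizer: whitespace split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page is treated as similar to nothing
    return dot / (norm_a * norm_b)


def near_duplicate_pairs(pages, threshold=0.9):
    """Return index pairs whose cosine similarity meets the (assumed) threshold.

    Every pair is compared, so n pages need n * (n - 1) / 2 comparisons.
    """
    vectors = [term_vector(p) for p in pages]
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs


pages = [
    "search engine removes duplicate web pages",
    "search engine removes duplicate web pages",
    "lucene indexes documents in java",
]
duplicates = near_duplicate_pairs(pages)
```

With the three sample pages, only the first two are reported as near-duplicates.
Because each comparison touches the shared terms of two pages and the number of
comparisons grows with the square of the collection size, the cost rises quickly
on a large crawl, which is consistent with the complexity concern raised in the
abstract.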
Article ID: 159480
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/159480.html