Design and Implementation of a Massive Web Page Retrieval System for Petroleum Enterprises
[Abstract]: As oil field enterprises continue to integrate exploration and development, demand has grown for paperless office work and for the unified organization and management of the electronic documents and web pages used in production, management, and scientific research data analysis. The number of documents multiplies every year, and the volume of saved documents has become very large. Existing search engines cannot provide a good specialized index for these electronic documents. A customized enterprise document retrieval system lets users find and retrieve document information quickly, addressing the low efficiency and inaccuracy of earlier information retrieval. According to a recent Internet survey, there are hundreds of millions of websites on the Internet so far. Google, the world's largest search engine, indexes more than 8 billion pages. The web page extraction system of a search engine, also known as the crawler, is one of its main application modules, and the speed of the crawler and the quality of the pages it fetches are key measures of search efficiency. To meet the needs of enterprise data collection, the crawler should also reduce the unnecessary data duplication caused by collecting the same information repeatedly. Based on the specific business requirements of oil field enterprises and the shortcomings of current massive web page retrieval within enterprises, this thesis proposes a new information retrieval method and introduces the idea of multiple Fields into the structure of the retrieval system. In addition, given the hardware conditions of the enterprise LAN, the system adopts a Lucene-based lexical analysis method, parses the page data of each web page, and efficiently extracts its plain text content. The integrity and performance of the system are then analyzed.
Finally, the system is tested, and the test results show that it meets the requirements of massive web page search and has some advantages in reliability, practicability, stability, speed, and security.
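The abstract names three steps: extracting plain text from a web page, storing each page as a record with multiple fields, and avoiding repeated collection of the same content. The following is a minimal sketch of those steps, not the thesis's implementation. It uses only the Python standard library rather than Lucene, and the field names (`url`, `title`, `body`), the class names, and the choice of a SHA-256 digest of the body text for duplicate detection are all assumptions made for illustration.

```python
import hashlib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the title and visible text, skipping script/style content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.body_parts = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        target = self.title_parts if self._in_title else self.body_parts
        target.append(text)


def to_document(url, html):
    """Build a multi-field record (hypothetical fields: url, title, body)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": " ".join(parser.title_parts),
        "body": " ".join(parser.body_parts),
    }


class DedupStore:
    """Keeps one document per distinct body text (assumed dedup criterion)."""

    def __init__(self):
        self._seen = set()
        self.documents = []

    def add(self, doc):
        digest = hashlib.sha256(doc["body"].encode("utf-8")).hexdigest()
        if digest in self._seen:
            return False  # same content was already collected
        self._seen.add(digest)
        self.documents.append(doc)
        return True
```

In this sketch, a crawler would call `to_document` on each fetched page and pass the result to `DedupStore.add`, which returns `False` when a page with identical body text was already stored. Keeping the title and body in separate fields mirrors the multiple-Field idea, since each field can later be indexed or weighted separately.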
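[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3; TP393.092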
Article ID: 2238993
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2238993.html