Design and Implementation of a Massive Web Page Retrieval System for Petroleum Enterprises

Published: 2018-09-12 12:22

[Abstract]: As integrated exploration in oilfield enterprises continues to expand, paperless office work has driven growing demand for electronic documents and web pages used in production, operations, research data analysis, and unified organization and management. The number of documents multiplies every year, and the stored volume has become very large. Existing search engines do not provide a specialized index for these electronic documents. A customized enterprise-level document retrieval system makes it possible to find and retrieve document information quickly, addressing the low efficiency and poor accuracy of earlier information retrieval. According to recent Internet surveys, the Internet now carries website information numbering in the hundreds of millions. Google, the world's largest search engine, has indexed more than 8 billion web pages. A search engine's web page fetching system (also called a crawler) is one of its main modules, and the crawler's speed and the quality of the pages it fetches largely determine search efficiency. The crawler must meet the enterprise's data collection needs while reducing the unnecessary duplicate data produced when the same information is collected repeatedly. To address the shortcomings of current enterprise massive web page retrieval, this thesis proposes a new information retrieval method based on the specific business requirements of oilfield enterprises and introduces the multi-Field idea into the structured organization of the retrieval system. In addition, given the hardware conditions of the enterprise LAN, the system uses a Lucene-based lexical analysis method to analyze page data and efficiently extract the main text content of web pages. Finally, the system's integrity is verified and its performance analyzed. Test results show that the system meets the requirements of enterprise massive web page retrieval and has certain advantages in reliability, practicality, stability, speed, and security.
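The abstract names two ideas without showing them: a multi-Field index and the avoidance of duplicate collection. The following is a minimal Python sketch of both, not the thesis's implementation and not the Lucene API. The class name `MultiFieldIndex`, the field names `title` and `body`, and the use of a SHA-1 fingerprint of the body text for duplicate detection are all assumptions made for illustration.

```python
import hashlib
import re
from collections import defaultdict


class MultiFieldIndex:
    """Toy inverted index that keeps a separate posting map per field (hypothetical)."""

    def __init__(self):
        # field name -> term -> set of document ids
        self.postings = defaultdict(lambda: defaultdict(set))
        # fingerprints of page bodies already indexed (assumed dedup strategy)
        self.seen_hashes = set()

    @staticmethod
    def tokenize(text):
        # Simple lowercase word split; a real system would use a proper analyzer.
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id, fields):
        """Index a page given as {field: text}; return False if its body is a duplicate."""
        body = fields.get("body", "")
        fingerprint = hashlib.sha1(body.encode("utf-8")).hexdigest()
        if fingerprint in self.seen_hashes:
            return False
        self.seen_hashes.add(fingerprint)
        for field, text in fields.items():
            for term in self.tokenize(text):
                self.postings[field][term].add(doc_id)
        return True

    def search(self, field, term):
        """Return sorted ids of documents whose given field contains the term."""
        return sorted(self.postings[field].get(term.lower(), set()))


index = MultiFieldIndex()
added_1 = index.add(1, {"title": "Drilling report", "body": "Weekly drilling data for block A"})
added_2 = index.add(2, {"title": "Seismic survey", "body": "Drilling plans follow the survey"})
# Same body as document 1, so it is skipped as a duplicate.
added_3 = index.add(3, {"title": "Drilling report copy", "body": "Weekly drilling data for block A"})

title_hits = index.search("title", "drilling")
body_hits = index.search("body", "drilling")
```

Keeping one posting map per field lets a query target the title or the body separately, which is the general point of a multi-field layout. Exact-hash fingerprinting only catches identical bodies; near-duplicate detection would need a different technique.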
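[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3;TP393.092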


Article ID: 2238993


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2238993.html
