Design and Implementation of a WEB Information Integration Platform
[Abstract]: With the rapid development of Internet technology and the rapid growth of network information resources, the network has become an important source of data. Search engines are an important technical means of retrieving information from these resources. However, traditional search engines rely on keyword retrieval and have limitations: search results contain many irrelevant web pages, and reprinting produces many pages with identical content. It is therefore necessary to integrate network information resources, extracting the specific information users care about from the mass of network resources and reorganizing it into a unified form. The main research work of this paper is to integrate WEB resource information so that Internet users can quickly and accurately find the information they need. First, this paper studies the theory and technology of WEB information integration, including two methods of information integration, three modules and four key technologies. The knowledge involved in each module is summarized, including the ontology concept, web crawlers, information extraction and the Resource Description Framework (RDF). Second, this paper designs and implements an ontology-guided prototype of a WEB information integration platform. The system is composed of four modules: data acquisition, information extraction, storage model and foreground presentation. This paper proposes a web crawler based on ontology and search engines, an ontology-based page analysis and filtering algorithm, and information extraction rules based on ontology and DOM tree paths, together with design schemes such as an RDF-based data storage model and a B/S-based presentation of results. Through the platform, the user specifies the domain information to be integrated. The system then retrieves and integrates the related domain resources on the Internet and displays the results to the user in a unified, structured and vivid way. The system does not require a separate wrapper for each data source; instead it targets the entire Internet and can integrate a variety of heterogeneous resources. Finally, the paper tests the WEB information integration platform, including crawler efficiency and crawling tests and data extraction rate tests. The tests show that the system can integrate some heterogeneous data sources on the Internet, although some shortcomings remain.
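The abstract does not give the thesis's actual extraction rules or storage schema, so the following is only a minimal sketch of the general idea of extraction rules keyed on DOM tree paths feeding an RDF-style store. The rule table `RULES`, the property names, the namespace `http://example.org/onto#` and the helper names `extract` and `to_ntriples` are all assumptions made for illustration, not the author's design.

```python
from html.parser import HTMLParser

# Hypothetical rules: each ontology property is mapped to the DOM tree
# path where its value is expected. These paths are illustrative only.
RULES = {
    "title": "html/body/div/h1",
    "price": "html/body/div/span",
}

# Elements that never have a closing tag, so they must not enter the path.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class PathExtractor(HTMLParser):
    """Collects text found at the DOM paths named in the rule table."""

    def __init__(self, rules):
        super().__init__()
        self.path_to_prop = {path: prop for prop, path in rules.items()}
        self.stack = []   # currently open tags, outermost first
        self.found = {}   # property name -> first text seen at its path

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        prop = self.path_to_prop.get("/".join(self.stack))
        text = data.strip()
        if prop and text:
            self.found.setdefault(prop, text)


def extract(html, rules):
    """Apply the path rules to one page and return the extracted record."""
    parser = PathExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.found


def to_ntriples(subject_uri, record, ns="http://example.org/onto#"):
    """Serialize a record as N-Triples lines (one triple per property)."""
    lines = []
    for prop, value in sorted(record.items()):
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject_uri}> <{ns}{prop}> "{literal}" .')
    return lines


page = "<html><body><div><h1>Widget</h1><span>9.99</span></div></body></html>"
record = extract(page, RULES)
triples = to_ntriples("http://example.org/item/1", record)
```

Keeping the rules in a table separate from the parser mirrors the point made in the abstract that no per-source wrapper is needed: under this sketch, supporting another kind of page would mean adding rule entries rather than writing new parsing code.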
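[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP391.1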