Research on Web Information Search Technology Based on Meta-Search
Published: 2019-01-23 08:57
[Abstract]: With the spread and growth of the Internet, the volume of online information increases daily. This information includes not only text but also images, audio, and video. How to quickly and accurately filter and organize the information users need from network resources has become a research hotspot in the field of information retrieval.

Data mining, also known in artificial intelligence as knowledge discovery, analyzes existing data to find regularities in massive data sets and presents the regularities it finds. Web information search technology is an extension of data mining to the Internet.

The earliest search engines were built by manual cataloging, with Yahoo as the representative example. In this approach, information on the Internet is collected, filtered, and classified by hand, and the organized results are then added to the site. However, because manual maintenance is costly and users' knowledge backgrounds differ, this approach cannot meet users' varied needs. With the development of data mining, automated search engines emerged. Such an engine uses web robot (crawler) programs to associate the data on the Internet and crawl it, producing an index of the information. It also gives users a retrieval platform on which they can search by keyword.

Search engines can be divided into full-text search engines, directory search engines, meta-search engines, and others. A meta-search engine is a further extension of the web search engine: from a single user interface, the user can choose to run a keyword query against multiple search engines. The distinguishing feature of meta-search is that it can call other search engines independently, fusing information across engines and meeting users' need to integrate information quickly. Compared with traditional search engines, meta-search engines can obtain more accurate and comprehensive information.

This thesis systematically describes the principles and research status of Web information extraction technology and introduces its key steps. It focuses on the search engine workflow and key technologies, and studies meta-search in depth.

The main work of this thesis is as follows:
(1) The research background of Web information extraction technology, together with its classification and steps, is described.
(2) The Web information extraction model, the HTML language, and the DOM document object are introduced.
(3) The Struts, Spring, and Hibernate frameworks that make up the SSH framework are introduced, and the structural information of websites is analyzed.
(4) The background, classification, and key technologies of search engines are summarized, and a meta-search engine is designed and implemented using AJAX, HTML Parser, and related technologies.
(5) The results returned by the search engine are analyzed and compared.
(6) The search engine program is tested.

Building on existing search engine technology, this research offers some reference for implementing better meta-search and for developing better Web information retrieval tools.
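The cross-engine fusion that the abstract describes can be illustrated with a minimal Java sketch. It is not the thesis's implementation, which the abstract says is built with AJAX and HTML Parser and does not show. The class name `MetaSearchSketch`, the stub engines, the example URLs, and the reciprocal-rank scoring rule are all assumptions made for illustration: each engine is modeled as a function from a query to a ranked list of URLs, and a URL scores higher when it ranks well in more than one engine.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of meta-search result fusion; not taken from the thesis.
final class MetaSearchSketch {

    /** Merges ranked URL lists from several engines into one ranked, de-duplicated list. */
    static List<String> fuse(List<List<String>> rankedLists) {
        // LinkedHashMap remembers first-seen order, which is used to break ties.
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranked : rankedLists) {
            for (int i = 0; i < ranked.size(); i++) {
                // Assumed scoring rule: rank 1 adds 1.0, rank 2 adds 0.5, rank 3 adds 1/3, ...
                scores.merge(ranked.get(i), 1.0 / (i + 1), Double::sum);
            }
        }
        List<String> merged = new ArrayList<>(scores.keySet());
        // Sort by descending score; the sort is stable, so equal scores keep first-seen order.
        merged.sort(Comparator.comparingDouble((String url) -> -scores.get(url)));
        return merged;
    }

    /** Sends the same query to every engine and fuses the ranked lists that come back. */
    static List<String> search(String query, List<Function<String, List<String>>> engines) {
        List<List<String>> perEngine = new ArrayList<>();
        for (Function<String, List<String>> engine : engines) {
            perEngine.add(engine.apply(query));
        }
        return fuse(perEngine);
    }

    public static void main(String[] args) {
        // Stub engines with fixed results stand in for real HTTP calls and HTML parsing.
        Function<String, List<String>> engineA =
                q -> List.of("http://a.example/1", "http://b.example/2", "http://c.example/3");
        Function<String, List<String>> engineB =
                q -> List.of("http://b.example/2", "http://d.example/4");
        System.out.println(search("meta search", List.of(engineA, engineB)));
    }
}
```

In a real system each stub would be replaced by a component that queries one underlying engine and extracts result links from the returned page. The fusion step stays the same regardless of how the per-engine lists are obtained.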
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP311.52
Article No.: 2413650
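[Citing documents]
Related master's theses (first 2)
1 薛奰舒; Application of Data Mining to Vibration Fault Diagnosis of Rotating Equipment [D]; Jilin University; 2013
2 馬迪; Research on a Fuzzy Ontology Learning Method Based on FFCA [D]; Dalian Maritime University; 2013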
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2413650.html