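Research on an Ontology-Based Vertical Search Engine for Food Safety
Published: 2018-07-12 11:58
Topic: vertical search engine + ontology. Reference: Zhejiang University of Technology, master's thesis, 2013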
[Abstract]: A recent series of food safety incidents has drawn public attention to food safety. Although the state maintains a unified platform for publishing food safety information, in its current operation the platform falls short of basic expectations: comprehensive coverage, timely updates, and interaction with the public. A public information platform is therefore needed to promptly collect the food safety information scattered across the Internet and serve the public.
With the rapid development of search engine technology and changing industry needs, many domain-specific vertical search engines have appeared at home and abroad, but few of them can search food information. Practice has shown that ontologies and related natural language processing techniques can improve search precision, so many vertical search engines have begun to adopt ontology technology. With the enactment of food safety regulations, food safety standards have become fairly detailed, which paves the way for building a food safety ontology and for applying ontology technology to improve the system's food safety detection capability. The focused crawler is the key module through which a search engine acquires information, so the focus of this thesis is how to improve focused-crawler performance in the food safety domain to achieve good precision and recall.
To that end, the thesis first analyzes the differing characteristics of official websites and general forum sites and proposes a different page search method for each. After analyzing common focused-crawler algorithms, it proposes a combined search algorithm to address the poor retrieval results caused by the large volume and fine-grained categories of food safety information. The strategy uses the ontology for more efficient link analysis and relevance computation. The thesis adopts an optimized Fish-Search algorithm, uses the ontology to filter and expand keywords, and performs relevance analysis in steps: it first computes similarity with the vector space model, then combines the result with the ontology to obtain a more accurate document similarity, and finally classifies the documents. The k-nearest-neighbor and Bayes algorithms serve as classifiers, so topic filtering proceeds from coarse to fine. Finally, crawling and retrieval experiments show that the ontology-based retrieval method clearly improves both the crawling efficiency of the web spider and the precision of food safety information search.
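The stepwise relevance analysis described in the abstract (vector space model first, then ontology-based refinement) can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual formulas, so everything here is an assumption made for illustration: the tiny `ONTOLOGY` mapping, the `related_weight` parameter, and the function names `expand_query`, `cosine`, and `relevance` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical mini-ontology: concept -> related terms (not taken from the thesis).
ONTOLOGY = {
    "additive": ["preservative", "colorant"],
    "contamination": ["melamine", "pesticide"],
}

def expand_query(terms, ontology, related_weight=0.5):
    """Build a weighted query vector: original terms at 1.0, ontology neighbors lower."""
    weights = {}
    for term in terms:
        weights[term] = 1.0
    for term in terms:
        for related in ontology.get(term, []):
            # Never lower the weight of a term that is already in the query.
            weights[related] = max(weights.get(related, 0.0), related_weight)
    return weights

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def relevance(doc_tokens, query_terms, ontology=None):
    """Score a tokenized page against a topic query, optionally ontology-expanded."""
    doc_vec = Counter(doc_tokens)  # raw term frequencies as the document vector
    query_vec = expand_query(query_terms, ontology or {})
    return cosine(query_vec, doc_vec)

page = ["melamine", "found", "in", "milk", "powder"]
plain = relevance(page, ["contamination"])               # no shared terms
expanded = relevance(page, ["contamination"], ONTOLOGY)  # matches via "melamine"
print(plain, expanded)
```

In this toy case the plain vector space score is zero because the page never uses the word "contamination", while the expanded query matches through the related term "melamine". A focused crawler could use such a score to decide which pages and outgoing links to follow, though how the thesis combines the two scores is not stated in the abstract.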
[Degree-granting institution]: Zhejiang University of Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TS201.6; TP391.3
Article ID: 2117077
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2117077.html