Research on Ontology-Based Topic Relevance Algorithms
[Abstract]: A professional search engine provides valuable information and services for a specific field, group, or need, and is one of the future directions of web information search. Web resources are vast and grow rapidly, so the first problem a professional search engine must solve is how to obtain, efficiently and accurately, the web information of a specific field or topic: the target web resources, including web pages and links. The core of this problem is computing the topic relevance of those resources, which means evaluating the topic relevance of a web page and predicting the topic relevance of a link. Existing topic relevance algorithms mostly work at the character (string) level and handle concepts and semantics poorly. Their relevance judgments are therefore inaccurate, and the precision of topic information acquisition is low. Because ontologies have strong semantic expressiveness, this thesis introduces ontology tools: the topic is represented by an ontology, and web pages are conceptualized against it. After comparing and analyzing classical topic relevance algorithms, the thesis selects the algorithms with higher accuracy and efficiency, namely a web page topic relevance evaluation algorithm and a link topic relevance prediction algorithm. It then designs and implements an ontology-based topic crawler system with higher efficiency and semantic processing capability. Finally, experiments verify the effectiveness of the algorithms.
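The abstract does not give the exact scoring formula used to conceptualize a page and evaluate its topic relevance. As a minimal sketch, assuming a toy apple ontology stored as a child-to-parent map, a hand-written term-to-concept lexicon, and a hypothetical weight `decay` applied once per level a concept sits below the topic root, the idea could look like this (`ONTOLOGY_PARENT`, `TERM_TO_CONCEPT`, `depth_from_topic`, and `page_relevance` are all illustrative names, not the thesis's code):

```python
from collections import Counter

# Hypothetical toy "apple" ontology (child concept -> parent concept).
# The thesis builds its own small apple ontology; these concepts are illustrative.
ONTOLOGY_PARENT = {
    "apple": None,  # topic root
    "apple variety": "apple",
    "fuji": "apple variety",
    "gala": "apple variety",
    "apple disease": "apple",
    "apple scab": "apple disease",
    "fire blight": "apple disease",
    "orchard": "apple",
}

# Hypothetical lexicon used to conceptualize a page: surface term -> ontology concept.
TERM_TO_CONCEPT = {
    "apple": "apple", "apples": "apple", "fuji": "fuji", "gala": "gala",
    "variety": "apple variety", "disease": "apple disease",
    "scab": "apple scab", "blight": "fire blight",
    "orchard": "orchard", "orchards": "orchard",
}


def depth_from_topic(concept: str) -> int:
    """Count the parent links between a concept and the topic root."""
    depth = 0
    while ONTOLOGY_PARENT[concept] is not None:
        concept = ONTOLOGY_PARENT[concept]
        depth += 1
    return depth


def page_relevance(tokens: list[str], decay: float = 0.5) -> float:
    """Map tokens to ontology concepts and score the page.

    Each concept occurrence contributes decay ** depth, so concepts closer to
    the topic root weigh more; the sum is normalised by the page length.
    """
    if not tokens:
        return 0.0
    concepts = Counter(TERM_TO_CONCEPT[t] for t in tokens if t in TERM_TO_CONCEPT)
    score = sum(n * decay ** depth_from_topic(c) for c, n in concepts.items())
    return score / len(tokens)


print(page_relevance(["fuji", "apples", "grow", "in", "orchards"]))  # 0.35
```

Unlike a character-level match, "fuji" and "orchards" still count toward the apple topic here because the ontology links them to it; a page such as `["stock", "market"]` maps to no concept and scores 0.0. The distance-decay weighting is one common choice, assumed here for illustration.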
Building on a summary and review of the relevant literature, and targeting the low accuracy and efficiency of topic information acquisition, this thesis selects suitable topic relevance algorithms by comparing their harvest rate and time efficiency. To improve accuracy, it compares the KNN classification algorithm, the concept space vector model (CSVM) algorithm, and an ontology-based topic relevance evaluation algorithm, and selects the ontology-based algorithm, which maps the concepts in a web page onto the ontology to compute the page's topic relevance. To improve efficiency, it compares the topic-sensitive PageRank algorithm, an algorithm based on link text content, and an ontology-based link topic relevance prediction algorithm, and again selects the ontology-based algorithm. This algorithm combines Q-learning with a naive Bayes classifier to predict the long-term value of each link, and chooses the next link to crawl by comparing these values; the Q-learner receives its feedback from the page topic relevance computed by the ontology-based web page evaluation algorithm. Using the selected algorithms, the thesis designs an ontology-based topic crawler system, builds a small apple ontology, and uses the apple topic as an example to describe the system's workflow in detail. Finally, the system is implemented and compared experimentally with a crawler guided by the breadth-first algorithm and a crawler guided by the Best-First algorithm.
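The abstract names the ingredients of the link prediction algorithm (Q-learning, a naive Bayes classifier, long-term link value, feedback from page relevance) but not how they are wired together. The sketch below assumes one plausible wiring: Q-values are computed on an already-crawled training graph with page relevance as the reward, discretized into a hypothetical number of bins, and a multinomial naive Bayes model over link-context words predicts the bin; the expected bin value is the link's predicted long-term value. The Q step is written as value iteration over the known graph, a batch stand-in for online Q-learning updates. The discount `gamma`, the bin settings, the toy data, and all function and class names are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def q_values(graph, reward, gamma=0.5, iters=20):
    """Q-value iteration over an already-crawled training graph.

    graph:  page -> list of (link_id, target_page)
    reward: page -> topic relevance of that page (e.g. an ontology-based score)
    Q(link) = reward(target) + gamma * max Q(out-links of target)
    """
    targets = {lid: tgt for outs in graph.values() for lid, tgt in outs}
    q = {lid: 0.0 for lid in targets}
    for _ in range(iters):
        q = {
            lid: reward.get(tgt, 0.0)
            + gamma * max((q[nxt] for nxt, _ in graph.get(tgt, [])), default=0.0)
            for lid, tgt in targets.items()
        }
    return q


class LinkValueNB:
    """Multinomial naive Bayes over link-context words; classes are Q-value bins."""

    def __init__(self, n_bins: int = 3, max_q: float = 2.0):
        self.n_bins, self.max_q = n_bins, max_q

    def _bin(self, q: float) -> int:
        return min(int(q / self.max_q * self.n_bins), self.n_bins - 1)

    def fit(self, texts, qs):
        self.prior = Counter()  # number of training links per bin
        self.word_counts = defaultdict(Counter)
        bin_sum = defaultdict(float)
        self.vocab = set()
        for words, q in zip(texts, qs):
            b = self._bin(q)
            self.prior[b] += 1
            bin_sum[b] += q
            self.word_counts[b].update(words)
            self.vocab.update(words)
        # Representative value of a bin = mean Q of its training links.
        self.bin_value = {b: bin_sum[b] / n for b, n in self.prior.items()}
        return self

    def predict_value(self, words) -> float:
        """Expected long-term value: sum over bins of P(bin | words) * bin value."""
        total, v = sum(self.prior.values()), len(self.vocab)
        log_post = {}
        for b, n in self.prior.items():
            denom = sum(self.word_counts[b].values()) + v  # Laplace smoothing
            log_post[b] = math.log(n / total) + sum(
                math.log((self.word_counts[b][w] + 1) / denom) for w in words)
        m = max(log_post.values())
        z = sum(math.exp(lp - m) for lp in log_post.values())
        return sum(math.exp(lp - m) / z * self.bin_value[b] for b, lp in log_post.items())


if __name__ == "__main__":
    # Toy training graph with made-up relevance rewards (not thesis data).
    graph = {
        "seed": [("l1", "p_apple"), ("l2", "p_news")],
        "p_apple": [("l3", "p_scab")],
        "p_news": [("l4", "p_sport")],
    }
    reward = {"p_apple": 0.8, "p_scab": 0.9, "p_news": 0.1, "p_sport": 0.0}
    link_words = {"l1": ["apple", "orchard"], "l2": ["daily", "news"],
                  "l3": ["apple", "scab"], "l4": ["sport", "scores"]}
    q = q_values(graph, reward)
    model = LinkValueNB().fit([link_words[l] for l in q], list(q.values()))
    # Crawl next the frontier link with the highest predicted long-term value.
    frontier = {"a": ["apple", "variety"], "b": ["stock", "news"]}
    print(max(frontier, key=lambda k: model.predict_value(frontier[k])))  # a
```

Binning the Q-values turns value regression into classification, which is what lets a naive Bayes classifier estimate the long-term value of a link it has never seen from its surrounding words alone.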
The experimental results show that the topic crawler guided by the ontology-based topic relevance algorithm achieves a higher harvest rate and has greater potential for crawling topic-relevant web resources.
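Harvest rate, the metric behind this comparison, is commonly defined for focused crawlers as the fraction of fetched pages that are topic-relevant. A minimal sketch, assuming a page counts as relevant when its relevance score reaches a hypothetical threshold of 0.3 (the function name `harvest_rate_curve` is also illustrative), computes the cumulative harvest-rate curve over a crawl:

```python
def harvest_rate_curve(scores, threshold=0.3):
    """Cumulative harvest rate after each fetched page.

    scores: relevance score of each page, in the order the crawler fetched them.
    Returns (relevant pages so far) / (pages fetched so far) after every fetch.
    """
    curve, relevant = [], 0
    for fetched, score in enumerate(scores, start=1):
        if score >= threshold:
            relevant += 1
        curve.append(relevant / fetched)
    return curve


print(harvest_rate_curve([0.5, 0.1, 0.4, 0.0]))
```

For the made-up scores above the curve is 1.0, 0.5, about 0.67, and 0.5. Computing one curve per crawler (breadth-first, Best-First, ontology-guided) with the same scoring function and threshold keeps the comparison fair.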
[Degree-granting institution]: Chinese Academy of Agricultural Sciences
[Degree level]: Master's
[Year of degree]: 2013
[Classification number]: TP391.3