Research on the Web Page Ranking Model in Topic Crawlers Based on Ontology Concept Similarity
[Abstract]: Compared with general-purpose search engines, a topic-specific search engine focused on a particular domain can offer users more precise information collection and better retrieval services. The topic crawler is the core module of such a search engine, so improving its topic relevance is essential. However, because web resources are vast and grow in a highly dynamic way, crawlers still collect a large amount of irrelevant page content, which lowers collection efficiency. To address this problem, this thesis analyzes and summarizes the strengths and weaknesses of current web page ranking algorithms and, drawing on the characteristics of the salt lake domain, studies relevance analysis techniques in topic crawler design, focusing mainly on web page ranking. Exploiting the ability of ontologies to express semantics, it proposes a new web page ranking algorithm based on ontology concept similarity to improve the accuracy of topic relevance computation. The method first selects suitable web pages as the initial seed set, then obtains an ontology concept set by constructing a salt lake domain ontology, classifies the concepts, and assigns them weights. A concept similarity measure is then used to compute the similarity between every concept in a web page and the concepts in the ontology concept set. Pages are ranked by their overall score, and high-scoring pages are stored by the topic crawler for subsequent page collection. Experimental results show that the algorithm both reduces irrelevant results and improves retrieval accuracy.
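The ranking procedure described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the toy salt lake ontology, the concept categories and their weights, the 0.5 threshold, and every function name are hypothetical, concept extraction from pages is assumed to be done already, and Wu-Palmer similarity is swapped in as a stand-in because the abstract does not give the concept similarity formula. Here a page's score is the weighted average, over the ontology concepts, of the best similarity that any concept in the page achieves.

```python
# Minimal sketch of ontology-concept-similarity page ranking.
# ASSUMPTIONS (hypothetical, not from the thesis): the toy ontology, the
# concept categories and weights, the threshold, all function names, and
# Wu-Palmer similarity as a stand-in concept similarity measure.

# Toy salt lake ontology as an is-a tree: child -> parent.
PARENT = {
    "salt_lake_resource": "salt_lake",
    "mineral": "salt_lake_resource",
    "lithium": "mineral",
    "potassium": "mineral",
    "boron": "mineral",
    "brine": "salt_lake_resource",
    "salt_lake_environment": "salt_lake",
    "evaporation": "salt_lake_environment",
}
ONTOLOGY_CONCEPTS = set(PARENT) | set(PARENT.values())

# Concepts are classified into categories, and each category gets a weight.
CONCEPT_CATEGORY = {
    "lithium": "core", "potassium": "core", "boron": "core",
    "brine": "related", "evaporation": "background",
}
CATEGORY_WEIGHT = {"core": 1.0, "related": 0.8, "background": 0.5}
CONCEPT_WEIGHTS = {c: CATEGORY_WEIGHT[k] for c, k in CONCEPT_CATEGORY.items()}

# Hypothetical pages, each reduced to its extracted concepts.
PAGES = {
    "page_a": ["lithium", "brine"],
    "page_b": ["evaporation"],
    "page_c": ["football"],
}


def ancestors(concept):
    """Path from a concept up to the root, starting with the concept itself."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Depth of an ontology concept; the root has depth 1."""
    return len(ancestors(concept))


def wu_palmer(c1, c2):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(c1) + depth(c2))."""
    common = set(ancestors(c2))
    # In a tree, the first shared ancestor on c1's upward path is the deepest (LCS).
    lcs = next((a for a in ancestors(c1) if a in common), None)
    if lcs is None:
        return 0.0
    return 2 * depth(lcs) / (depth(c1) + depth(c2))


def page_score(page_concepts, weights):
    """Weighted average, over ontology concepts, of the best match in the page."""
    known = [p for p in page_concepts if p in ONTOLOGY_CONCEPTS]
    total = sum(weights.values())
    score = 0.0
    for concept, weight in weights.items():
        best = max((wu_palmer(p, concept) for p in known), default=0.0)
        score += weight * best
    return score / total


def rank_pages(pages, weights, threshold=0.5):
    """Sort pages by score (highest first) and keep those at or above threshold."""
    scored = [(url, page_score(cs, weights)) for url, cs in pages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(url, s) for url, s in scored if s >= threshold]


if __name__ == "__main__":
    for url, s in rank_pages(PAGES, CONCEPT_WEIGHTS):
        print(url, round(s, 3))  # prints: page_a 0.806
```

Swapping in the thesis's own similarity formula for `wu_palmer`, or weights derived from the actual salt lake ontology for `CONCEPT_WEIGHTS`, changes only the scoring inputs; the rank-and-threshold step stays the same.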
[Degree-granting Institution]: Beijing Information Science and Technology University
[Degree Level]: Master's
[Year Awarded]: 2013
[Classification Number]: TP391.1