
Design and Implementation of a Chinese Financial Blog Search Engine Based on Domain Ontology

Published: 2018-11-02 16:14
[Abstract]: With the rapid growth of blogs, the number of blog pages has increased exponentially, and finding the pages that interest a user within this huge volume has become especially important. Specialized search engines for blog pages (blog search engines) emerged in response. This thesis studies an ontology-based search engine for financial blogs.

The study found that existing blog search engines fall short in three areas. First, in computing blog page similarity they cannot support document-level queries, because they lack an effective method for measuring how similar two blog pages are. Second, their results often fail to match the user's query intent, because the similarity they compute is not semantic similarity or its value is inaccurate. Third, placing the most content-relevant results first depends on the algorithm used to rank the retrieval results.

This thesis examines these shortcomings in depth and addresses them in two areas:

1. Blog page similarity. Building on existing methods for computing blog page similarity, the thesis proposes an ontology-based similarity method for financial blog pages (the CSFBO method). The method represents each blog page by its financial keywords, which turns page-to-page similarity into similarity between financial keywords, so the quality of keyword extraction is critical. Starting from the traditional TF*IDF algorithm, it assigns different weights to different parts of a blog page according to the characteristics of blog pages. This improves financial keyword extraction and, in turn, the accuracy of the similarity computation.

2. Ranking of blog search results. The thesis analyzes the BlogRank and B2Rank algorithms. Taking into account the characteristics of financial blogs, the factors that influence their ranking, and the weaknesses of existing ranking algorithms, it proposes a ranking algorithm for blog search results in the financial domain (the SFBS algorithm).

The thesis builds a financial domain ontology, applies the improved algorithms above, and implements a financial blog search engine based on this domain ontology. A large amount of web data was collected for testing, and the implemented system verifies the effectiveness of the improved algorithms, which show considerable practical value in real applications.
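The abstract describes CSFBO's keyword extraction only as TF*IDF in which different parts of a blog page receive different weights. The following is a minimal Python sketch of that idea. It assumes pages are already word-segmented into the hypothetical sections `title`, `tags`, and `body`. The weights in `SECTION_WEIGHTS` are illustrative assumptions, not values from the thesis.

```python
import math
from collections import Counter

# Hypothetical section weights (assumption): the thesis only states that
# different parts of a blog page receive different weights.
SECTION_WEIGHTS = {"title": 3.0, "tags": 2.0, "body": 1.0}


def weighted_tf(page):
    """Term frequency in which each occurrence counts with its section's weight."""
    tf = Counter()
    for section, tokens in page.items():
        weight = SECTION_WEIGHTS.get(section, 1.0)
        for token in tokens:
            tf[token] += weight
    return tf


def idf(corpus):
    """Inverse document frequency log(N / df) over a list of pages."""
    df = Counter()
    for page in corpus:
        # Count each term at most once per page.
        df.update({token for tokens in page.values() for token in tokens})
    return {term: math.log(len(corpus) / count) for term, count in df.items()}


def top_keywords(page, idf_table, k=3):
    """The k terms with the highest section-weighted TF*IDF score."""
    tf = weighted_tf(page)
    scores = {term: freq * idf_table.get(term, 0.0) for term, freq in tf.items()}
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


# Toy, pre-segmented pages; English tokens stand in for Chinese words.
corpus = [
    {"title": ["stock", "market"], "tags": ["stock"],
     "body": ["market", "rally", "today", "stock"]},
    {"title": ["bond", "yield"], "tags": ["bond"],
     "body": ["yield", "today", "rise"]},
    {"title": ["travel"], "tags": [], "body": ["today", "beach"]},
]
table = idf(corpus)
print(top_keywords(corpus[0], table))  # ['stock', 'market', 'rally']
```

Because `today` occurs in every page, its IDF is log(3/3) = 0. It therefore cannot be chosen as a keyword, however often it appears.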
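The abstract says CSFBO reduces page-to-page similarity to similarity between financial keywords backed by the domain ontology, but it does not give the keyword-level measure. The sketch below makes two assumptions. It scores keyword pairs with the Wu & Palmer path-based measure over a hypothetical toy concept hierarchy (`PARENT`). It then combines those scores with a symmetric best-match average. Both choices are illustrative stand-ins, not the thesis's formulas.

```python
# Hypothetical toy financial ontology (assumption): concept -> parent concept.
PARENT = {
    "finance": None,
    "securities": "finance",
    "stock": "securities",
    "bond": "securities",
    "banking": "finance",
    "deposit": "banking",
    "loan": "banking",
}


def path_to_root(concept):
    """Concepts from `concept` up to the root, inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth in the hierarchy, counting the root as 1."""
    return len(path_to_root(concept))


def wu_palmer(c1, c2):
    """Wu & Palmer similarity: 2 * depth(LCS) / (depth(c1) + depth(c2))."""
    ancestors = set(path_to_root(c2))
    # The first ancestor of c1 that is also an ancestor of c2 is the lowest
    # common subsumer; the single root guarantees one exists.
    lcs = next(c for c in path_to_root(c1) if c in ancestors)
    return 2 * depth(lcs) / (depth(c1) + depth(c2))


def page_similarity(keys_a, keys_b):
    """Symmetric mean of each keyword's best match among the other page's keywords."""
    def best_match_mean(xs, ys):
        return sum(max(wu_palmer(x, y) for y in ys) for x in xs) / len(xs)
    return (best_match_mean(keys_a, keys_b) + best_match_mean(keys_b, keys_a)) / 2


print(page_similarity(["stock"], ["bond"]))  # 2/3: they meet at "securities"
print(page_similarity(["stock"], ["loan"]))  # 1/3: they meet only at the root
```

Keywords that meet lower in the hierarchy score higher. As a result, two pages about different securities come out as more similar than a securities page and a banking page, which a purely lexical match would not capture.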
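The abstract names the factors behind SFBS only in general terms. Purely as a hypothetical illustration of how a ranking can blend query relevance with link analysis, the sketch below computes plain PageRank over a blog link graph and mixes it linearly with a relevance score. The damping factor 0.85, the weight `alpha`, and the blending rule are all assumptions. The code is not BlogRank, B2Rank, or SFBS.

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank over a dict: blog -> list of blogs it links to."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling blog: spread its rank evenly over all blogs.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank


def rank_results(relevance, links, alpha=0.7):
    """Order blogs by alpha * relevance + (1 - alpha) * max-normalized PageRank."""
    pr = pagerank(links)
    top = max(pr.values())
    score = {blog: alpha * rel + (1 - alpha) * pr.get(blog, 0.0) / top
             for blog, rel in relevance.items()}
    # Highest score first; ties broken by name for determinism.
    return sorted(score, key=lambda blog: (-score[blog], blog))


links = {"a": ["c"], "b": ["c"], "c": ["a"]}
print(rank_results({"a": 0.5, "b": 0.5, "c": 0.5}, links))  # ['c', 'a', 'b']
```

When relevance is tied, the link score decides the order, so `c` wins with two inbound links. A strongly relevant page can still outrank a well-linked one because `alpha` keeps relevance dominant.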
[Degree-granting institution]: Jiangxi University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP391.3

Article no.: 2306298

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2306298.html

