Research and Application of Similarity-Based Blog Recommendation Technology
Posted: 2018-11-11 15:02
【Abstract】: Web 2.0 applications have made blogs spread at an unprecedented rate, and blogs now hold an enormous amount of information. Blog systems of this size make it very difficult for users to find the blogs or blog posts that interest them most. They also make it hard for bloggers to attract more visits to their own blogs. Blog search engines have partly addressed this problem. However, for technical reasons and because of the demands users place on them, they still do not fully meet users' needs.

This thesis studies commonly used recommendation algorithms and analyzes bloggers' social information and their blog-post content. Building on existing techniques, it designs a similarity-based blog recommendation algorithm. The algorithm measures the similarity between two blogs from two sides: the similarity of their blog posts and the similarity of their bloggers' social information. Before presenting the algorithm, the thesis introduces the concepts of blog-post similarity and blogger social-information similarity and explains the advantages of a similarity-based approach. It constructs formulas for computing social-information similarity and blog-post similarity, then combines the two into an overall similarity. The weights of the two components are set by linear combination, and their values are chosen with reference to the cited literature. For the experiments, the open-source crawler Heritrix was used to collect blogs from Sina.com as experimental data. The crawled data were then processed, and the relevant information was stored in a database.

The improved algorithm is evaluated against two criteria. The first compares its accuracy with that of a text-based algorithm, a method suited to automatic evaluation by computer. The second relies on human judges, who decide whether each recommended blog is similar to the target blog. Comparison and analysis of the experimental results demonstrate the effectiveness of the improved algorithm and provide technical support for blog recommendation.
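The abstract does not reproduce the thesis's formulas. The following Python sketch therefore only illustrates the general shape of the method, under assumptions that are not taken from the thesis:

- **Post similarity:** cosine similarity of term-frequency vectors over already-segmented words.
- **Social similarity:** Jaccard overlap of hypothetical social-feature sets, such as friends, tags, or region.
- **Overall score:** a linear combination of the two, using a hypothetical weight `alpha`. The default of 0.6 is arbitrary and is not the thesis's value.

`Blog`, `recommend`, and `precision_at_k` are illustrative names. Precision@k against a set of manually judged blogs stands in for the accuracy evaluation the abstract describes.

```python
# Illustrative sketch of similarity-based blog recommendation.
# All features, names, and the weight alpha are assumptions, not the thesis's method.
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Blog:
    blog_id: str
    post_tokens: list[str]  # hypothetical: pre-segmented words from the blog's posts
    social: set[str] = field(default_factory=set)  # hypothetical social features (friends, tags, region)


def cosine_similarity(a: list[str], b: list[str]) -> float:
    """Cosine similarity of the term-frequency vectors of two token lists."""
    ta, tb = Counter(a), Counter(b)
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def jaccard_similarity(a: set[str], b: set[str]) -> float:
    """Share of features the two sets have in common; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def blog_similarity(x: Blog, y: Blog, alpha: float = 0.6) -> float:
    """Assumed linear combination: alpha * post similarity + (1 - alpha) * social similarity."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (alpha * cosine_similarity(x.post_tokens, y.post_tokens)
            + (1 - alpha) * jaccard_similarity(x.social, y.social))


def recommend(target: Blog, candidates: list[Blog], k: int = 3, alpha: float = 0.6) -> list[str]:
    """Ids of the k candidates most similar to the target (the target itself is skipped)."""
    scored = [(blog_similarity(target, c, alpha), c.blog_id)
              for c in candidates if c.blog_id != target.blog_id]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first, ties broken by id
    return [blog_id for _, blog_id in scored[:k]]


def precision_at_k(recommended: list[str], relevant: set[str]) -> float:
    """Fraction of recommended blogs that a (manual) judge marked as similar."""
    if not recommended:
        return 0.0
    return sum(r in relevant for r in recommended) / len(recommended)


if __name__ == "__main__":
    target = Blog("t", ["python", "crawler", "heritrix"], {"u1", "u2", "beijing"})
    pool = [
        Blog("a", ["python", "crawler", "data"], {"u1", "beijing"}),
        Blog("b", ["travel", "food"], {"u9"}),
        Blog("c", ["heritrix", "crawler"], {"u2"}),
    ]
    combined = recommend(target, pool, k=2)              # text + social similarity
    text_only = recommend(target, pool, k=2, alpha=1.0)  # text-only baseline
    judged_similar = {"a", "c"}  # stand-in for human relevance judgments
    print(combined, precision_at_k(combined, judged_similar))
    print(text_only, precision_at_k(text_only, judged_similar))
```

Setting `alpha=1.0` drops the social term and reduces the score to text similarity alone. This is one simple way to obtain a text-only baseline for the kind of comparison the abstract mentions. Chinese post text would first need word segmentation, which the sketch assumes has already been done.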
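【Degree-Granting Institution】: Inner Mongolia University of Science and Technology
【Degree Level】: Master's
【Year Conferred】: 2012
【Classification Number】: TP391.3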
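【References】
Related journal articles (first 10)
1 Yang Dan; Cao Jun. A social tagging recommendation system based on Web 2.0 [J]. Journal of Chongqing Institute of Technology (Natural Science Edition), 2008, (07).
2 Tang Yuanyang; Huang Erjia. Knowledge mining technology and the organization of online educational resources [J]. e-Education Research, 2003, (06).
3 Chen Chunming; Xu Yifeng. An improved similarity computation method for collaborative filtering algorithms [J]. Journal of Guilin University of Electronic Technology, 2009, (03).
4 Han Jiawei; Meng Xiaofeng; Wang Jing; Li Sheng'en. Research on Web mining [J]. Journal of Computer Research and Development, 2001, (04).
5 Li Xiaoming; Zhu Jiaji; Yan Hongfei. A model for collecting and processing topic-specific information on the Internet and its application [J]. Journal of Computer Research and Development, 2003, (12).
6 Li Feng; Li Junhuai; Wang Ruilin; Zhang…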
Article No.: 2325208
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2325208.html