Weibo-Based User Interest Analysis and Personalized Information Recommendation
Published: 2018-11-07 14:29
[Abstract]: Over the past decade or so, the amount of information on the Internet has grown rapidly, and people have moved from an era of information scarcity into an era of information overload. The way people obtain information has changed accordingly: from manual browsing, to search engines, and now to recommender systems. To recommend useful information effectively, the most important step is to capture user interests effectively. Social networks such as Weibo provide a large new data source for analyzing user interests, and this has become an active research topic in recent years. This thesis analyzes and explores how to use Weibo data to analyze user interests and to make personalized recommendations. Compared with existing work, it differs in three main respects. First, because each Weibo post is short, we do not apply a topic model directly to Weibo data; instead we build the topic model from an external knowledge base and use it to enrich the posts semantically, which also avoids the difficulty of choosing the number of topics on Weibo data. Second, we argue that not all posts relate to a user's interests; such posts, which we call noise posts, degrade model performance. We therefore analyze features for identifying noise posts from several angles and build a combined classifier to filter them out. Finally, we argue that user interests change over time and propose a time-weighted topic distribution to describe them. In the experiments, we compare our algorithm with non-negative matrix factorization and with an algorithm that applies a topic model directly to Weibo data. The results show that our algorithm discovers users' real-time interests more effectively, and that it can still analyze user interests effectively when a user has few posts or many noise posts.
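The abstract does not say how the time weighting is computed. As a rough illustration only, the sketch below assumes each post already has a topic distribution (for example, inferred from a topic model built on an external knowledge base) and combines them with an exponential decay on post age. The function name `time_weighted_interest`, the exponential form, and the `half_life_days` parameter are all hypothetical choices, not details taken from the thesis.

```python
from typing import List, Sequence


def time_weighted_interest(
    post_topics: Sequence[Sequence[float]],
    post_ages_days: Sequence[float],
    half_life_days: float = 30.0,
) -> List[float]:
    """Combine per-post topic distributions into one user-level distribution.

    Each post is weighted by 0.5 ** (age / half_life_days), so recent posts
    count more. The exponential form and the default half-life are assumptions
    made for illustration; the thesis abstract does not specify them.
    """
    if len(post_topics) != len(post_ages_days):
        raise ValueError("post_topics and post_ages_days must have equal length")
    if not post_topics:
        raise ValueError("at least one post is required")

    num_topics = len(post_topics[0])
    totals = [0.0] * num_topics
    weight_sum = 0.0
    for topics, age in zip(post_topics, post_ages_days):
        weight = 0.5 ** (age / half_life_days)  # decays with post age
        weight_sum += weight
        for k, prob in enumerate(topics):
            totals[k] += weight * prob
    # Normalize by the total weight so the result is again a distribution.
    return [t / weight_sum for t in totals]


# Example: an old post dominated by topic 0 and a fresh post dominated by topic 1.
interest = time_weighted_interest(
    post_topics=[[0.9, 0.1], [0.1, 0.9]],
    post_ages_days=[60.0, 0.0],
    half_life_days=30.0,
)
```

With these inputs the 60-day-old post receives weight 0.25 and the fresh post weight 1.0, so the combined distribution leans toward topic 1. Noise filtering, if used, would simply drop posts before this step.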
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP393.092;TP391.3
Article ID: 2316647
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2316647.html