Research on a User Interest Model System Based on Web Page Interest Degree
Keywords: user interest model; interest degree; vector space model; time segmentation; time decay. Source: Fudan University, master's thesis, 2012. Type: degree thesis
[Abstract]: In the Web 2.0 era, new forms of Internet application such as blogs, SNS, microblogs, light blogs, and Q&A sites keep emerging, and the volume of information on the Internet has grown explosively. By contrast, the content a user is interested in at any given time is relatively limited, and it is often drowned in the sea of information. Search engines are currently the most common way to help users find information; they filter information mainly by matching the keywords a user enters against text, combined with various optimization algorithms. Since the great success of Amazon's product recommendation service, the research focus of information filtering has gradually expanded to the intelligent push of information. How to mine the content a user is interested in from massive data, and thereby provide intelligent personalized recommendation, has become a hot research topic in both academia and the IT industry.

A user interest model is one way to achieve intelligent content recommendation. It is a mathematical representation of a user's different points of interest, built by analyzing the content the user visits and the user's browsing behavior and extracting content features together with the user's degree of interest in that content (Interest Rate, IR). Once the model is built, existing content is compared against it, and the content that best matches the user's interests is recommended. For content feature extraction, this thesis uses the vector space model (VSM) to represent articles. For interest evaluation, it proposes a user-behavior evaluation algorithm that incorporates time measurements, so that the extracted interests are closer to the user's real interests. For model updating, many researchers working on VSM-based user interest models overlook interest drift: they do not distinguish between a user's interests at different times, so changes in interest cannot be detected quickly and the model cannot accurately reflect the user's latest interests. These models also lack an update mechanism: every update requires recomputing statistics over all of the user's browsing records, which is computationally heavy and expensive in data storage. Both problems hinder long-term practical use. To address them, this thesis optimizes earlier user interest models by introducing a time segmentation mechanism and a time decay mechanism for interests, to improve overall system performance.

Based on this theoretical work, the thesis builds an interest model system and collects 2000 articles on two topics from the Sina portal, the World Expo and Manchester United, to form the article content library. While running, the system continuously collects users' browsing operations, analyzes their browsing behavior, updates the user interest model, and finally pushes content of interest to users according to the model. Observation and experiments show that the system reflects changes in user interest well and has good performance stability, demonstrating the correctness and effectiveness of the proposed interest model system.
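The abstract names the building blocks (VSM article vectors, an interest rate per visit, time segmentation, time decay, and matching content against the profile) but gives no formulas. The Python sketch below is one possible reading, not the thesis's algorithm. The class name `InterestProfile`, the fixed-length segments, the exponential half-life decay, and cosine similarity as the matching measure are all assumptions made for illustration, and the interest rate is taken as an already-computed number in [0, 1].

```python
import math
from collections import defaultdict

# Illustrative sketch only: every name and formula here is an assumption,
# not taken from the thesis.


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class InterestProfile:
    """User interest kept as one VSM vector per time segment."""

    def __init__(self, segment_days=7, half_life_days=14.0):
        self.segment_days = segment_days      # assumed segment length
        self.half_life_days = half_life_days  # assumed decay half-life
        # segment index -> accumulated term weights
        self.segments = defaultdict(lambda: defaultdict(float))

    def record_visit(self, day, article_vec, interest_rate):
        """Add one visit; only the segment containing `day` is touched."""
        seg = day // self.segment_days
        for term, w in article_vec.items():
            self.segments[seg][term] += interest_rate * w

    def vector(self, today):
        """Merge all segments, down-weighting older ones exponentially."""
        current = today // self.segment_days
        merged = defaultdict(float)
        for seg, vec in self.segments.items():
            age_days = max(0, current - seg) * self.segment_days
            decay = 0.5 ** (age_days / self.half_life_days)
            for term, w in vec.items():
                merged[term] += decay * w
        return dict(merged)

    def recommend(self, articles, today, k=1):
        """Return the names of the k articles closest to the profile."""
        profile = self.vector(today)
        ranked = sorted(
            articles,
            key=lambda name: cosine(profile, articles[name]),
            reverse=True,
        )
        return ranked[:k]


# Usage: an old interest in the Expo is outweighed by a recent one in football.
profile = InterestProfile(segment_days=7, half_life_days=7.0)
profile.record_visit(day=0, article_vec={"expo": 1.0, "pavilion": 0.5}, interest_rate=0.9)
profile.record_visit(day=28, article_vec={"united": 1.0, "match": 0.5}, interest_rate=0.6)
articles = {"expo_news": {"expo": 1.0}, "united_news": {"united": 1.0}}
print(profile.recommend(articles, today=28))  # prints ['united_news']
```

Under these assumptions, recording a visit updates only the current segment, so the full browsing history does not have to be re-scanned on every update, and the decay factor lets a recent, weaker interest outrank an older, stronger one.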
[Degree-granting institution]: Fudan University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.1
Article No.: 1554813
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1554813.html