Research and Implementation of an Ontology-Based User Interest Mining System

Published: 2018-05-16 05:40

Topic: interest mining + ontology; Source: South China University of Technology, master's thesis, 2013

[Abstract]: As business models evolve, companies want an operating strategy that fits their own business while delivering personalized services to different users, and the key to such personalization is a model of each user's interests. Traditional data mining techniques capture a user's interest features from the associations between users and items, recommend products the user may like, and so prompt purchases. Non-e-commerce businesses such as search engine vendors and network service providers have no direct purchase records, so common data mining techniques struggle to build an effective user interest model for them. These businesses do, however, often hold another valuable resource: users' browsing records. The user interest mining system discussed in this thesis works on the URL data in users' browsing records. Building on an interest ontology, it proposes a novel user interest modeling process and uses real user data as the experimental subject to demonstrate the system's feasibility and practicality. The main research work is as follows:

1. A complete and effective training method for interest ontology concepts. Interest keywords are taken from a pre-built interest reference ontology and used to construct search URLs for specific search engines; the system crawls the results returned by the search engines as the training document set for each ontology concept. XPath-based web information extraction is combined with an improved web page body-text extraction algorithm based on a line-block length function to distill the core content of the training documents. Finally, Lucene is used to build an inverted index over the document set for fast retrieval, so that the TF-IDF feature vector of every concept in the interest ontology can be computed efficiently and accurately.

2. An interest modeling method that incorporates user browsing behavior. A user interest model is essentially a scored instance of the interest reference ontology. The thesis proposes a spreading activation algorithm that integrates the user's browsing patterns to initialize and update the user's interest scores. Because the method takes the relationships between ontology concepts into account, it captures the interests a user shows explicitly and can also, to some extent, uncover latent interests. The method also overcomes the cold-start problem that general interest mining algorithms face.
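To make the two steps in the abstract concrete, here is a minimal Python sketch of how a visited page might be matched to an ontology concept by TF-IDF and how the resulting score might then spread to related concepts. It is not the thesis's implementation: the abstract gives no formulas, so the textbook TF-IDF weighting, the cosine match, the `decay` and `threshold` parameters, the dwell-time weight, the single-seed simplification, and the toy ontology are all assumptions made for illustration. The thesis itself uses Lucene for indexing, which this sketch replaces with plain dictionaries.

```python
import math
from collections import Counter, deque


def idf_table(concept_docs):
    """Inverse document frequency over the concept training documents."""
    n = len(concept_docs)
    df = Counter()
    for tokens in concept_docs.values():
        df.update(set(tokens))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(tokens, idf):
    """TF-IDF vector for one token list; terms unknown to idf are ignored."""
    if not tokens:
        return {}
    tf = Counter(t for t in tokens if t in idf)
    return {t: (c / len(tokens)) * idf[t] for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def best_concept(page_tokens, concept_vectors, idf):
    """Return the concept most similar to the page, with its similarity."""
    page_vec = tfidf(page_tokens, idf)
    sims = {c: cosine(page_vec, v) for c, v in concept_vectors.items()}
    concept = max(sims, key=sims.get)
    return concept, sims[concept]


def spread_activation(scores, edges, seeds, decay=0.5, threshold=0.05):
    """Add seed energy to scores and propagate it along weighted edges.

    Each concept fires at most once, and energy is never sent back to a
    concept that has already fired (assumed rules, to keep the sketch finite).
    """
    scores = dict(scores)
    queue = deque(seeds.items())
    fired = set()
    while queue:
        node, energy = queue.popleft()
        scores[node] = scores.get(node, 0.0) + energy
        if node in fired or energy * decay < threshold:
            continue
        fired.add(node)
        for neighbor, weight in edges.get(node, {}).items():
            if neighbor not in fired:
                queue.append((neighbor, energy * decay * weight))
    return scores


# Hypothetical training documents, one token list per ontology concept.
docs = {
    "football": "goal striker league match goal".split(),
    "basketball": "dunk league match hoop".split(),
    "sports": "league match team athlete".split(),
    "cooking": "recipe oven flavor recipe".split(),
}
# Hypothetical ontology relations: child -> parent at 1.0, parent -> child at 0.5.
edges = {
    "football": {"sports": 1.0},
    "basketball": {"sports": 1.0},
    "sports": {"football": 0.5, "basketball": 0.5},
}

idf = idf_table(docs)
vectors = {c: tfidf(t, idf) for c, t in docs.items()}
page = "striker scores goal in league match".split()
concept, sim = best_concept(page, vectors, idf)
dwell_weight = min(30 / 60, 1.0)  # assumed browsing signal: 30 s on the page
profile = spread_activation({}, edges, {concept: sim * dwell_weight})
for name, score in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

In this toy run the page seeds only its best-matching concept, and the parent and sibling concepts receive progressively smaller scores, which is how a user with no recorded interest in a concept can still end up with a nonzero score for it. Passing a non-empty `scores` dictionary to `spread_activation` updates an existing profile instead of initializing a new one.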
[Degree-granting institution]: South China University of Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP311.13;TP391.3

Article ID: 1895676


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1895676.html
