Research on User Interest Point Mining Based on a Knowledge Base and Text Classification Algorithms

Published: 2018-01-21 07:57

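Keywords: knowledge base; keyword classification; URL classification; user interest projection. Source: Tianjin Normal University, master's thesis, 2013. Thesis type: degree thesis.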

[Abstract]: In recent years, with the rapid development of the Internet, people can search the Web for the information they need, and search engines have become an important retrieval tool. However, search engines do not tailor their results to individual users: every user receives the same results, so users' interests are ignored and their real need for personalized results goes unmet. Given the huge volume of information on the Web, mining users' points of interest in order to provide personalized services has become an important research topic. User interest point mining extracts a user's points of interest from his or her browsing history, and its results directly reflect the accuracy and effectiveness of personalized services. This thesis studies that problem. It first analyzes and compares existing user interest point mining algorithms in detail and, to address their limitations, proposes mining users' points of interest with a knowledge base combined with text classification algorithms. The work is carried out on an English-language corpus. First, a Wikipedia-based knowledge base is built with Lucene; next, the keywords and the URLs entered by the user are classified; finally, the user's interests are projected. For keyword classification, the thesis proposes a method that combines co-occurring words with WordNet expansion. For URL classification, it proposes a block-based method for extracting a web page's main text and a DFSD-based feature extraction method. For user interest projection, it proposes a context-based projection method that maps the user's candidate interest points to a single point of interest, thereby identifying the user's real interests. Comparative experiments demonstrate the efficiency and accuracy of the algorithms.
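The abstract does not describe how its block-based main-text extraction works. As a rough illustration only, the following sketch implements a generic line-block text-density heuristic, a common way to separate article text from navigation and footer text on a web page; it is not necessarily the method used in the thesis. The function names `html_to_lines` and `extract_main_text` and the parameters `block_size` and `threshold` are hypothetical choices made for this sketch.

```python
import re
from html import unescape


def html_to_lines(html: str) -> list[str]:
    """Drop <script>/<style> elements and all tags, keeping the original line breaks."""
    html = re.sub(r"(?is)<(script|style)\b[^>]*>.*?</\1\s*>", "", html)
    text = re.sub(r"<[^>]+>", "", html)
    return [unescape(line).strip() for line in text.split("\n")]


def extract_main_text(html: str, block_size: int = 3, threshold: int = 80) -> str:
    """Return the longest dense run of text lines (hypothetical line-block heuristic).

    block_size and threshold are illustrative defaults, not values from the thesis.
    """
    lines = html_to_lines(html)
    n = len(lines)
    # Block density: number of characters in the block_size lines starting at line i.
    density = [sum(len(s) for s in lines[i:i + block_size]) for i in range(n)]

    best_start, best_end, best_score = 0, 0, 0
    i = 0
    while i < n:
        if density[i] > threshold:
            # A region starts at a dense block and extends until a block has no text.
            j = i
            while j < n and density[j] > 0:
                j += 1
            score = sum(len(s) for s in lines[i:j])
            if score > best_score:
                best_start, best_end, best_score = i, j, score
            i = j
        else:
            i += 1
    # Keep only the non-empty lines of the best region.
    return "\n".join(s for s in lines[best_start:best_end] if s)
```

Short navigation or copyright lines rarely push a block over the threshold, so they fall outside the selected region; a page with no sufficiently dense block yields an empty string. On real pages, `block_size` and `threshold` would need tuning.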
[Degree-granting institution]: Tianjin Normal University
[Degree level]: Master's
[Year awarded]: 2013
[Classification number (CLC)]: TP391.1; TP311.13


Article ID: 1450916


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1450916.html

