
Research on Mining User Interest Points Based on a Knowledge Base and Text Classification Algorithms

Published: 2018-01-21 07:57

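  Keywords: knowledge base; keyword classification; URL classification; user interest projection. Source: Tianjin Normal University, master's thesis, 2013. Document type: degree thesis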


[Abstract]: With the rapid growth of the Internet, people increasingly rely on it to find the information they need, and search engines have become an important retrieval tool. Because search results are not tailored to individual users, however, different users receive the same information; their interests are ignored and their real need for personalization goes unmet. Given the vast amount of information online, mining users' interest points in order to provide personalized services has become an important research topic. User interest point mining extracts a user's interest points from his or her browsing history, and its results directly reflect the accuracy and effectiveness of personalized services. This thesis studies that problem. It analyzes and compares existing user interest point mining algorithms in detail and, to address their limitations, proposes mining user interest points with a knowledge base combined with text classification algorithms. The work is carried out on English-language corpora. First, a Wikipedia-based knowledge base is built with Lucene; next, the keywords and the URLs entered by the user are classified; finally, user interests are projected. For keyword classification, the thesis proposes a method that combines co-occurring words with WordNet expansion. For URL classification, it proposes a block-based method for extracting the main text of a web page and a DFSD-based feature extraction method. For user interest projection, it proposes a context-based projection method that maps a user's candidate interest points to a single interest point, thereby identifying the user's real interest. Finally, comparative experiments demonstrate the efficiency and accuracy of the algorithms.
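
The abstract names the pipeline stages but gives no algorithmic detail, so the following is only a minimal sketch of how keyword expansion, classification against a knowledge base, and context-based projection could fit together. Everything in it is an assumption made for illustration: the in-memory `KNOWLEDGE_BASE` dictionary stands in for the Lucene index over Wikipedia, the `EXPANSIONS` table stands in for WordNet synonyms and co-occurring words, and the overlap-count scoring and voting rule are not taken from the thesis.

```python
from collections import Counter

# Hypothetical toy knowledge base: category -> descriptive terms.
# (Stand-in for a Lucene index built from Wikipedia.)
KNOWLEDGE_BASE = {
    "Sports": {"football", "soccer", "league", "goal", "match", "player"},
    "Technology": {"software", "computer", "python", "code", "programming"},
    "Animals": {"snake", "python", "reptile", "zoo", "species"},
}

# Hypothetical expansion table.
# (Stand-in for WordNet synonyms plus co-occurring words.)
EXPANSIONS = {
    "soccer": {"football", "match"},
    "compiler": {"code", "software", "programming"},
}


def expand(keyword):
    """Return the keyword together with its expansion terms."""
    kw = keyword.lower()
    return {kw} | EXPANSIONS.get(kw, set())


def candidate_categories(keyword):
    """Score each category by how many expanded terms it contains."""
    terms = expand(keyword)
    scores = {}
    for category, vocabulary in KNOWLEDGE_BASE.items():
        overlap = len(terms & vocabulary)
        if overlap:
            scores[category] = overlap
    return scores


def project_interest(keyword, context_keywords):
    """Map the candidate categories of `keyword` to a single category.

    Candidates are ranked first by support from the context keywords
    (for example, other recent queries), then by their own score, and
    finally by name so that ties resolve deterministically.
    """
    candidates = candidate_categories(keyword)
    if not candidates:
        return None
    context_votes = Counter()
    for ctx in context_keywords:
        context_votes.update(candidate_categories(ctx))
    return min(
        candidates,
        key=lambda c: (-context_votes[c], -candidates[c], c),
    )


if __name__ == "__main__":
    # "python" is ambiguous on its own; the context decides.
    print(project_interest("python", ["compiler", "software"]))
    print(project_interest("python", ["snake", "zoo"]))
```

In this sketch the ambiguous keyword "python" yields two candidate categories, and the surrounding keywords decide which one is kept. A real system would replace the dictionaries with index lookups and would need a more careful scoring function than raw term overlap.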
[Degree-granting institution]: Tianjin Normal University
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP391.1; TP311.13



Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1450916.html

