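Research on Mining User Interest Points Based on a Knowledge Base and Text Classification Algorithms
Published: 2018-01-21 07:57
Keywords: knowledge base; keyword classification; URL classification; user interest projection. Source: Tianjin Normal University, 2013 master's thesis. Document type: degree thesis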
[Abstract]: With the rapid growth of the Internet in recent years, people can retrieve the information they need online, and search engines have become an important retrieval tool. However, because search results are not tailored to individual users, different users receive the same information. This ignores users' interests and fails to meet their real personalization needs. Given the massive amount of information on the web, mining users' interest points in order to provide personalized services has become an important research topic.

User interest point mining extracts a user's interest points from the user's browsing history, and its results directly reflect the accuracy and effectiveness of personalized services. This thesis studies user interest point mining.

The thesis analyzes and compares existing user interest point mining algorithms in detail and, to address their limitations, proposes mining user interest points with a knowledge base and text classification algorithms. The work is carried out on an English corpus. First, a Wikipedia-based knowledge base is built with Lucene. Next, the keywords and URLs entered by the user are classified. Finally, user interest projection is performed. For keyword classification, a method combining co-occurring words with WordNet expansion is proposed. For URL classification, a block-based method for extracting the main text of a web page and a DFSD-based feature extraction method are proposed. For user interest projection, a context-based projection method is proposed that maps a user's candidate interest points to a single interest point, thereby identifying the user's real interest. Comparative experiments demonstrate the efficiency and accuracy of the algorithms.
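The keyword-classification and projection steps described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the thesis builds its knowledge base from Wikipedia with Lucene and expands keywords with co-occurring words and WordNet, whereas the sketch below substitutes two tiny hand-written dictionaries (`KNOWLEDGE_BASE`, `CO_OCCURRENCE`) and a simple overlap count. All names, categories, and scores here are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy knowledge base: category -> descriptive terms.
# (Stands in for the Wikipedia/Lucene knowledge base; an assumption.)
KNOWLEDGE_BASE = {
    "Sports": {"football", "match", "league", "player", "goal"},
    "Technology": {"software", "computer", "programming", "java", "code"},
    "Food": {"coffee", "recipe", "cooking", "java", "bean"},
}

# Hypothetical co-occurrence table: keyword -> words often seen with it.
# (Stands in for co-occurrence and WordNet expansion; an assumption.)
CO_OCCURRENCE = {
    "java": ["programming", "code", "coffee"],
    "goal": ["football", "match"],
}


def expand(keyword):
    """Return the keyword together with its co-occurring words."""
    return {keyword, *CO_OCCURRENCE.get(keyword, [])}


def candidate_categories(keyword):
    """Score each category by how many expanded terms it contains."""
    terms = expand(keyword)
    scores = Counter()
    for category, vocabulary in KNOWLEDGE_BASE.items():
        overlap = len(terms & vocabulary)
        if overlap:
            scores[category] = overlap
    return scores


def project_interest(keyword, context_keywords):
    """Pick one category for the keyword, using context to break ambiguity.

    Context keywords only reinforce categories that are already
    candidates for the keyword itself.
    """
    scores = candidate_categories(keyword)
    for other in context_keywords:
        for category, score in candidate_categories(other).items():
            if category in scores:
                scores[category] += score
    if not scores:
        return None
    # Sort names first so that ties are resolved alphabetically.
    return max(sorted(scores), key=lambda c: scores[c])


if __name__ == "__main__":
    print(project_interest("java", ["recipe", "cooking"]))
    print(project_interest("java", ["software"]))
```

In this toy setup the ambiguous keyword "java" is a candidate for both "Technology" and "Food", and the surrounding keywords decide which single interest point it is mapped to, which is the general idea behind a context-based projection.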
[Degree-granting institution]: Tianjin Normal University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP391.1; TP311.13