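A Data-Mining-Based System for Analyzing User Behavior on Community Websites
Published: 2018-02-10 18:34
Keywords: data mining; behavior analysis; high-dimensional data indexing. Source: Nanjing University of Posts and Telecommunications, master's thesis, 2012. Document type: degree thesis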
[Abstract]: As information technology steadily improves daily life, community network services such as Renren, Kaixin, and Tencent Pengyou have emerged, offering people new functions such as integrating knowledge, asking for help with problems, following news, and keeping in touch with friends. Analyzing user behavior in order to provide services tailored to different users can greatly enhance the user experience. This thesis aims to build a personalized intelligent recommendation engine for community websites. By analyzing the characteristics of community website users and uncovering their interests, it seeks to improve the user experience of community websites and to provide a prototype reference for transforming websites at the portal stage or the search engine stage into the intelligent recommendation stage.
Drawing on the domestic and international literature on data mining and behavior analysis, the thesis first designs the overall architecture and main business processes of a data-mining-based user behavior analysis system. Then, following the basic steps for building a data mining system, it presents a detailed design of the system in terms of feature collection, feature preprocessing, the correlation feature data mining algorithm, and efficient indexing of feature data, and it describes the system's time scheduling mechanism.
To analyze the behavior of massive numbers of users efficiently, the thesis builds on existing research. An improved regular-expression multi-pattern matching algorithm implements a high-performance data preprocessing module. User behavior analysis is modeled as a ranking problem and mined with a Ranking algorithm. Finally, the mined data features are mapped into a high-dimensional space, and the LSH (locality-sensitive hashing) algorithm is used to build a fuzzy search for high-performance matching and nearest-neighbor queries.
Experimental simulation shows that diversified word segmentation engines combined with a fairly comprehensive lexicon can segment users' input text quickly and with high accuracy. The optimized regular-expression multi-pattern matching algorithm reduces memory consumption to some extent and yields a usable, efficient engine for matching user interests. Tests across different dimensions and data scales show that the improved LSH algorithm can store and index the interest features of massive numbers of users: index-building and query times grow linearly as the number of feature dimensions increases, and retrieval and matching time does not increase noticeably as the number of users grows. The system can therefore basically meet the behavior analysis needs of community websites and offers a feasible solution for analyzing the behavior of their users.
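The abstract describes mapping mined interest features into a high-dimensional space and using LSH for fuzzy matching and neighbor queries, but it does not specify the thesis's improved LSH variant. As a rough illustration only, the following Python sketch shows the standard random-hyperplane form of LSH for cosine similarity. The class `HyperplaneLSH`, the helper `cosine`, the parameters `num_bits` and `num_tables`, and the sample user vectors are all hypothetical and are not taken from the thesis.

```python
import random
from collections import defaultdict


def cosine(a, b):
    """Exact cosine similarity, used only to re-rank LSH candidates."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0


class HyperplaneLSH:
    """Toy random-hyperplane LSH index (illustrative, not the thesis's algorithm)."""

    def __init__(self, dim, num_bits=8, num_tables=4, seed=0):
        rng = random.Random(seed)
        # Each hash table gets its own set of random hyperplanes.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.vectors = {}

    def _signature(self, planes, vec):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(
            1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
            for plane in planes
        )

    def add(self, key, vec):
        self.vectors[key] = vec
        for planes, table in zip(self.planes, self.tables):
            table[self._signature(planes, vec)].append(key)

    def query(self, vec, top_k=3):
        # Candidates are items sharing a bucket with the query in any table.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._signature(planes, vec), []))
        # Re-rank the (usually small) candidate set by exact similarity.
        ranked = sorted(candidates, key=lambda k: -cosine(vec, self.vectors[k]))
        return ranked[:top_k]


# Hypothetical user-interest vectors.
index = HyperplaneLSH(dim=3)
index.add("alice", [1.0, 0.0, 0.5])
index.add("bob", [0.0, 1.0, 0.2])
index.add("carol", [-1.0, 0.1, 0.0])
nearest = index.query([2.0, 0.0, 1.0])  # same direction as "alice"
```

In this sketch a query only compares against vectors that land in the same bucket in at least one table, which is one common way to keep lookup time from growing in step with the number of stored users. More bits per signature make buckets smaller, while more tables raise the chance that true neighbors are found.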
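[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP311.13; TP393.092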
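[Citing literature]
Related master's theses (1 entry)
1 Xu Xiongwei; Research on Ontology-Based Context-Aware User Behavior Reasoning for "Sciencepaper Online" [D]; Wuhan University of Technology; 2013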
Article ID: 1501166
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1501166.html