Research on User Browsing Behavior Based on Path and Page Mining
[Abstract]: When users interact with Internet products, and especially when they browse the web, they generate large volumes of behavioral data. Many Internet companies now want to mine and analyze the data produced during browsing in depth, so that they can understand users' behavior, psychology, and preferences, and then improve their products and the user experience. An effective way to study the browsing behavior of Internet users is to collect the web logs generated during browsing and to analyze that behavior through web log mining, an approach many scholars have applied successfully. Building on this earlier work, this thesis uses the currently popular Hadoop platform and data warehouse technology to turn web-log-based user behavior analysis into a systematic, engineered project that Internet enterprises can run in daily production, providing effective support for product development, operations, and management. Based on path and page mining, the thesis studies users' page browsing behavior in four parts:
(1) It introduces the Hadoop data processing platform and the Hive data warehouse. Through distributed storage and computing, the platform supports fast and effective analysis of massive data. Building on the characteristics of the Hive data warehouse, the thesis proposes a research framework for user browsing behavior based on the data warehouse.
(2) It constructs a basic data layer and a topic layer in the data warehouse, with user browsing behavior as the main topic of the topic layer.
(3) Drawing on association rule algorithms and common path mining algorithms, it proposes Hive-CFAP, a data-warehouse-based algorithm for mining frequent access paths.
(4) Based on the user browsing behavior topic and the Hive-CFAP algorithm, it examines the relationships among frequent access paths, page views, and page distance, and it analyzes and applies clustering of users with similar browsing behavior.
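The abstract does not describe how Hive-CFAP works, so the following is only a minimal single-machine sketch in Python of the two ideas named in items (3) and (4): counting frequent contiguous access paths across sessions, and grouping users who browse similar pages. It is not the Hive-CFAP algorithm and does not run on Hadoop or Hive. The (user_id, timestamp, page) log layout, the 30-minute session timeout, the support threshold, the Jaccard similarity measure, and the single-link grouping are all assumptions chosen for illustration, and the function names `sessionize`, `frequent_paths`, `jaccard`, and `cluster_users` are hypothetical.

```python
from collections import Counter, defaultdict

# Assumption: a visit more than 30 minutes after the previous one starts a new session.
SESSION_TIMEOUT = 30 * 60  # seconds


def sessionize(log_rows, timeout=SESSION_TIMEOUT):
    """Split (user_id, timestamp, page) rows into per-session page sequences."""
    by_user = defaultdict(list)
    for user, ts, page in log_rows:
        by_user[user].append((ts, page))
    sessions = []
    for user in sorted(by_user):
        visits = sorted(by_user[user])  # order each user's visits by timestamp
        current = [visits[0][1]]
        for (prev_ts, _), (ts, page) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:  # an inactivity gap closes the session
                sessions.append(current)
                current = []
            current.append(page)
        sessions.append(current)
    return sessions


def frequent_paths(sessions, min_support, min_len=2):
    """Return contiguous page paths contained in at least min_support
    (a fraction) of all sessions, mapped to their session counts."""
    counts = Counter()
    for pages in sessions:
        seen = set()  # count each path at most once per session
        for i in range(len(pages)):
            for j in range(i + min_len, len(pages) + 1):
                seen.add(tuple(pages[i:j]))
        counts.update(seen)
    threshold = min_support * len(sessions)
    return {path: n for path, n in counts.items() if n >= threshold}


def jaccard(a, b):
    """Jaccard similarity of two page sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_users(user_pages, threshold):
    """Single-link grouping (assumed): users whose page sets have Jaccard
    similarity >= threshold are merged into one cluster via union-find."""
    users = sorted(user_pages)
    parent = {u: u for u in users}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for i, u in enumerate(users):
        for v in users[i + 1:]:
            if jaccard(user_pages[u], user_pages[v]) >= threshold:
                parent[find(u)] = find(v)
    groups = defaultdict(list)
    for u in users:
        groups[find(u)].append(u)
    return sorted(groups.values())


# Usage on a toy log (hypothetical data).
logs = [
    ("u1", 0, "/home"), ("u1", 60, "/list"), ("u1", 120, "/item"),
    ("u1", 5000, "/home"), ("u1", 5060, "/list"),
    ("u2", 10, "/home"), ("u2", 70, "/list"), ("u2", 130, "/cart"),
]
sessions = sessionize(logs)
print(frequent_paths(sessions, min_support=0.6))  # {('/home', '/list'): 3}

user_pages = defaultdict(set)
for user, _, page in logs:
    user_pages[user].add(page)
print(cluster_users(user_pages, threshold=0.5))  # [['u1', 'u2']]
```

Enumerating every contiguous sub-path is quadratic in session length, which is acceptable for a sketch. On massive logs one would more likely push the counting into distributed jobs (for example, a GROUP BY over a sessionized Hive table) and prune candidates Apriori-style, since every contiguous sub-path of a frequent path is itself at least as frequent.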
【Degree-granting institution】: Beijing University of Posts and Telecommunications
【Degree level】: Master's
【Year degree conferred】: 2014
【Classification number】: TP393.092