Research on User Browsing Behavior Based on Path and Page Mining
Published: 2018-10-21 12:50
[Abstract]: When users interact with Internet products, especially while browsing the web, the network feeds back large amounts of behavioral data. How to mine and analyze this data in depth to understand users' behavior, psychology, and preferences, and thereby improve products and the user experience, has become a topic of interest to many Internet companies.
An effective way to study the browsing behavior of Internet users is to collect the web logs generated during browsing and analyze browsing behavior through web log mining, an approach that many researchers have applied successfully. Building on that earlier work, this thesis combines the currently popular Hadoop platform with data warehouse technology to systematize and engineer user behavior analysis based on web log mining, so that it becomes a project Internet enterprises can apply in daily production and can better support their product development, operations, and management.
Based on path and page mining, this thesis studies users' page browsing behavior. The work covers four main aspects:
(1) It introduces the Hadoop data processing platform and the Hive data warehouse. Through distributed storage and computation, the platform can analyze massive data quickly and effectively. Based on the characteristics of the Hive data warehouse, a data-warehouse-based framework for studying user browsing behavior is proposed.
(2) It builds a base data layer and a topic layer on the data warehouse; the topic layer chiefly holds the user browsing behavior topic.
(3) By studying association rule algorithms and common path mining algorithms, it proposes Hive-CFAP, a data-warehouse-based algorithm for mining frequent access paths.
(4) Based on the user browsing behavior topic and the Hive-CFAP algorithm, it analyzes and applies users' frequent access paths, the relationship between page views and page distance, and the clustering of users with similar browsing behavior.
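The abstract names frequent access path mining but does not spell out the steps of Hive-CFAP. As a rough illustration only of what such mining computes, the sketch below counts contiguous page sequences across sessions and keeps those that meet a minimum support. It is a generic in-memory approach, not the thesis's Hive-based algorithm; `frequent_paths`, `min_support`, `max_len`, and the sample sessions are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


def frequent_paths(
    sessions: List[List[str]], min_support: int, max_len: int = 3
) -> Dict[Path, int]:
    """Return contiguous page sub-paths seen in at least min_support sessions.

    Generic illustration only; this is NOT the Hive-CFAP algorithm.
    """
    support: Counter = Counter()
    for pages in sessions:
        seen = set()  # each sub-path counts at most once per session
        for length in range(2, max_len + 1):
            for start in range(len(pages) - length + 1):
                seen.add(tuple(pages[start:start + length]))
        support.update(seen)
    return {path: count for path, count in support.items() if count >= min_support}


# Hypothetical sessions: each is an ordered list of page identifiers.
sessions = [
    ["home", "list", "item", "cart"],
    ["home", "list", "item"],
    ["home", "search", "item", "cart"],
]
result = frequent_paths(sessions, min_support=2)
for path, count in sorted(result.items()):
    print(" -> ".join(path), count)
```

In a warehouse setting the same counting would presumably be expressed as grouped queries over a sessionized log table rather than in-memory loops, but the abstract gives no detail on that, so the choice here is purely for readability.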
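[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.092
Article ID: 2285141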
Article link: http://sikaile.net/guanlilunwen/ydhl/2285141.html