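Design and Implementation of a Web User Behavior Data Collection and Statistics System
Published: 2018-01-29 03:48
Keywords: website traffic analysis, behavior data collection, automatic JavaScript embedding, Netty. Source: Beijing Jiaotong University, 2015 master's thesis. Thesis type: degree thesis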
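【Abstract】: With the arrival of the Internet era, the network has become part of everyday life, and people have gradually accepted online shopping as a way of consuming. The sharp rise in online shoppers has led e-commerce websites to invest more to attract users and generate more revenue. For an e-commerce website, good site design and a shopping experience that satisfies users are crucial to its operation, which makes website analysis essential. Understanding how users visit a site requires comprehensive, detailed data on their browsing behavior. From a big-data perspective, massive amounts of information make website analysis more insightful and may reveal latent value hidden in seemingly insignificant data.

Although many third-party and even free website analysis tools exist, they are inconvenient to apply in practice. Google Analytics, which uses the JavaScript page-tagging method, requires pages to be modified to include its JavaScript code, and capturing a particular kind of user behavior means modifying pages extensively to add event-tracking code. Data capture therefore becomes labor-intensive and hard to manage, and the statistics are not available in real time. Server logs, on the other hand, cannot track events and must also be filtered. The focus of this thesis is a user behavior data collection and statistics system that collects behavior data with the JavaScript page-tagging method but requires no manual page changes: a module feature of Nginx automatically embeds different JavaScript into different kinds of pages. The event-tracking JavaScript is managed centrally, which makes it easy to maintain. The data collection server is built on Netty and can process large volumes of data quickly. It forwards behavior data to the MetaQ messaging middleware. The system computes statistics in two ways, customized periodic reports with Hive and real-time statistics and display with Storm, and each can pull messages from MetaQ independently without affecting the other, so the data collection server is decoupled from both.

My work on the project mainly covered research on methods for acquiring user behavior data and the implementation of the behavior data acquisition module and the data collection and storage module. On the statistics side, I took part in developing the Hive-generated operational reports, so the Storm real-time statistics are not covered in this thesis. The system currently provides behavior data statistics for platforms including the China Unicom online mall and mobile phone mall. An existing task scheduling system generates reports daily or periodically and sends them to the relevant staff. In practice, data storage on HDFS has also become nearly real-time, so real-time queries on behavior data can be used to monitor aspects of site status, and when an anomaly occurs an alert can be sent to developers through an SMS interface.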
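The abstract says only that a module feature of Nginx embeds different JavaScript into different kinds of pages; it does not name the module or show the configuration. One stock Nginx option with this effect is the `sub_filter` directive of `ngx_http_sub_module`, which replaces a string such as `</head>` in response bodies. The Python sketch below imitates that idea under stated assumptions rather than reproducing the thesis's setup: a single hypothetical table maps URL prefixes to tracking scripts, and a `<script>` tag for the chosen script is inserted before the first `</head>`. The prefixes, script paths, and function names are all illustrative.

```python
import re

# Hypothetical central table: URL prefix -> tracking script for that page type.
# Keeping the mapping in one place reflects the abstract's point that the
# event-tracking JavaScript is managed centrally instead of inside each page.
TRACKING_SCRIPTS = {
    "/product/": "/track/product.js",
    "/cart/": "/track/cart.js",
    "/order/": "/track/order.js",
}
DEFAULT_SCRIPT = "/track/common.js"  # hypothetical fallback for all other pages

HEAD_CLOSE = re.compile(r"</head>", re.IGNORECASE)


def choose_script(path: str) -> str:
    """Return the tracking script for a request path; the longest matching prefix wins."""
    matches = [prefix for prefix in TRACKING_SCRIPTS if path.startswith(prefix)]
    if not matches:
        return DEFAULT_SCRIPT
    return TRACKING_SCRIPTS[max(matches, key=len)]


def inject_tracking(html: str, path: str) -> str:
    """Insert one <script> tag before the first </head> (case-insensitive),
    roughly what a single sub_filter rule with sub_filter_once on would do.
    Responses without </head>, such as HTML fragments, are returned unchanged."""
    match = HEAD_CLOSE.search(html)
    if match is None:
        return html
    tag = f'<script src="{choose_script(path)}" async></script>'
    return html[:match.start()] + tag + html[match.start():]


page = "<html><head><title>Phone</title></head><body>...</body></html>"
print(inject_tracking(page, "/product/123"))
# <html><head><title>Phone</title><script src="/track/product.js" async></script></head><body>...</body></html>
```

Because page templates never contain the tag themselves, changing what is tracked on one kind of page means editing one table entry (or one proxy rule) rather than every template, which is the maintenance benefit the abstract describes.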
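The abstract credits MetaQ with decoupling the data collection server from its two consumers: the Hive periodic reports and the Storm real-time statistics each pull messages independently. Assuming a Kafka-style model in which each consumer group keeps its own read offset over an append-only log, the toy sketch below shows why neither consumer slows the other or the producer. The in-memory `Topic` class is a stand-in, not MetaQ's client API, and the group names, event fields, and sample events are made up. The batch side computes page views (PV) and unique visitors (UV) per day and page, the kind of result a HiveQL `GROUP BY` with `COUNT(*)` and `COUNT(DISTINCT uid)` would give.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy append-only log standing in for a MetaQ topic (not MetaQ's API).
    Every consumer group keeps its own read offset, so a slow reader
    never holds back a fast one."""
    messages: list = field(default_factory=list)
    offsets: dict = field(default_factory=lambda: defaultdict(int))

    def publish(self, msg: dict) -> None:
        # The collection server only appends; it never waits for consumers.
        self.messages.append(msg)

    def poll(self, group: str, max_batch: int = 100) -> list:
        start = self.offsets[group]
        batch = self.messages[start:start + max_batch]
        # Advance only this group's offset past the messages just returned.
        self.offsets[group] = start + len(batch)
        return batch


def daily_report(events: list) -> dict:
    """PV (page views) and UV (distinct visitors) per (day, page), the same
    result as GROUP BY day, page with COUNT(*) and COUNT(DISTINCT uid)."""
    pv = Counter()
    visitors = defaultdict(set)
    for e in events:
        key = (e["day"], e["page"])
        pv[key] += 1
        visitors[key].add(e["uid"])
    return {key: (pv[key], len(visitors[key])) for key in pv}


# Hypothetical sample events, as the collection server might publish them.
topic = Topic()
for uid, page in [("u1", "/product/1"), ("u2", "/product/1"),
                  ("u1", "/product/1"), ("u1", "/cart/")]:
    topic.publish({"day": "2015-03-01", "page": page, "uid": uid})

# Real-time side: reads small batches as they arrive and keeps running counts.
live = Counter(e["page"] for e in topic.poll("realtime", max_batch=2))

# Batch side: later reads everything for the day; the realtime offset is untouched.
report = daily_report(topic.poll("hive-loader"))
print(live)    # Counter({'/product/1': 2})
print(report)  # {('2015-03-01', '/product/1'): (3, 2), ('2015-03-01', '/cart/'): (1, 1)}
print(dict(topic.offsets))  # {'realtime': 2, 'hive-loader': 4}
```

Since each group owns its offset, a report job that runs once a day and a streaming job that polls every few seconds read the same messages without coordinating, and adding another consumer would not require changes to the collection server.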
【Degree-granting institution】: Beijing Jiaotong University
【Degree level】: Master's
【Year conferred】: 2015
【Classification number】: TP311.52; TP393.092