

Research on User Behavior Feature Extraction and Modeling Based on Distributed Processing

Published: 2019-04-24 11:56
[Abstract]: With the rapid growth of the Internet industry and the continuous build-out and upgrading of carriers' infrastructure and services, the data generated by users' Internet access is becoming ever richer. Advances in distributed data processing, combined with data mining and machine learning, have made feature extraction and behavioral-preference analysis of Internet users an active research area. As the data pipe, a carrier holds network-access traffic records for its entire network, and processing, mining, and analyzing the DPI (deep packet inspection) data it collects has great potential for building a comprehensive picture of users' behavioral preferences. Against this background, this thesis studies fixed-line broadband DPI data that a domestic carrier collected in one city, and uses distributed processing and data mining methods to extract Internet users' behavioral features from their traffic records. Traditional analyses of carrier traffic mostly start from the traffic-distribution characteristics of different service types and describe which kinds of applications users use at different times of day. This thesis instead takes the URLs in DPI records as its starting point and extracts users' online behavioral features from three angles: the categories of the websites they visit, sequential patterns in their visits, and their online product browsing, followed by modeling and experimental analysis. First, the thesis uses web crawlers to build a website-category tag library from navigation sites and web-directory sites, identifies the operating systems running on users' devices, and studies the interest characteristics of user groups based on website tags through statistical analysis and clustering.
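The first step can be illustrated with a minimal single-machine sketch, assuming the DPI records have already been reduced to (user, URL) pairs. The tag-library entries, the `example.com` hosts, and the helper names `categorize`, `interest_vectors`, and `kmeans` are hypothetical. The abstract does not name its clustering algorithm, so plain k-means over each user's share of visits per category is used here as a stand-in; operating-system identification is not shown.

```python
import math
import random
from collections import Counter, defaultdict
from urllib.parse import urlsplit

# Hypothetical tag library (host -> category). The thesis builds its library by
# crawling navigation and web-directory sites; these entries are placeholders.
TAG_LIBRARY = {
    "news.example.com": "news",
    "video.example.com": "video",
    "shop.example.com": "shopping",
}


def host_of(url):
    """Lower-cased host of a URL; DPI records often omit the scheme."""
    if "://" not in url:
        url = "http://" + url
    return (urlsplit(url).hostname or "").lower()


def categorize(url):
    """Look up the host, then each parent domain, in the tag library."""
    parts = host_of(url).split(".")
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in TAG_LIBRARY:
            return TAG_LIBRARY[candidate]
    return None  # unlabeled site


def interest_vectors(records, categories):
    """(user_id, url) pairs -> {user_id: share of tagged visits per category}."""
    counts = defaultdict(Counter)
    for user, url in records:
        category = categorize(url)
        if category is not None:
            counts[user][category] += 1
    return {
        user: [c[k] / sum(c.values()) for k in categories]
        for user, c in counts.items()
    }


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


records = [
    ("u1", "news.example.com/a"), ("u1", "http://m.news.example.com/b"),
    ("u2", "https://video.example.com/v/1"), ("u2", "video.example.com/v/2"),
    ("u3", "shop.example.com/item/9"), ("u3", "news.example.com/c"),
]
categories = ["news", "video", "shopping"]
vectors = interest_vectors(records, categories)
users = sorted(vectors)
labels, centers = kmeans([vectors[u] for u in users], k=2)
```

Matching parent domains lets subdomains such as `m.news.example.com` inherit their site's tag. On real data the per-user counting, grouped by user ID, would be the natural part to distribute, while clustering runs on the much smaller per-user vectors.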
Second, sequential pattern mining is applied to users' access behavior across multiple websites network-wide: a sequence model of users' website visits is built, and frequent sequential patterns in users' website-visit behavior over the course of a full day are discovered. Finally, the traffic generated by users' visits to e-commerce websites is studied separately. Combined with crawling, users' interest preferences are refined directly to three levels (product, brand, and category), the preference features of users' online product browsing are extracted through frequent itemset mining and association analysis, and these features are studied and analyzed comprehensively through modeling and experiments.
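The two mining steps can be sketched with textbook single-machine algorithms, assuming each user-day has already been reduced to an ordered list of site categories and each user's e-commerce browsing to a set of brands. The level-wise sequential search (in the spirit of GSP) and Apriori below are generic stand-ins, not the thesis's distributed implementation; all data and function names are hypothetical.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if pattern's items occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)  # `in` consumes the iterator


def frequent_sequences(sequences, min_support, max_len=3):
    """Level-wise search: extend frequent patterns by frequent single items.

    Support = number of sequences (e.g. one per user-day) containing the pattern.
    """
    def support(pattern):
        return sum(1 for s in sequences if is_subsequence(pattern, s))

    items = sorted({x for s in sequences for x in s})
    level = {(x,): support((x,)) for x in items}
    level = {p: n for p, n in level.items() if n >= min_support}
    frequent_items = [p[0] for p in level]
    frequent, length = {}, 1
    while level:
        frequent.update(level)
        if length == max_len:
            break
        candidates = [p + (x,) for p in level for x in frequent_items]
        counted = {c: support(c) for c in candidates}
        level = {c: n for c, n in counted.items() if n >= min_support}
        length += 1
    return frequent


def apriori(transactions, min_support):
    """Frequent itemsets; support = number of transactions containing the set."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {i for t in transactions for i in t}
    level = {frozenset([i]): support(frozenset([i])) for i in items}
    level = {s: n for s, n in level.items() if n >= min_support}
    frequent, k = {}, 1
    while level:
        frequent.update(level)
        keys = list(level)
        candidates = {a | b for a in keys for b in keys if len(a | b) == k + 1}
        # Prune candidates with an infrequent k-subset (Apriori property).
        candidates = {c for c in candidates if all(c - {x} in level for x in c)}
        counted = {c: support(c) for c in candidates}
        level = {c: n for c, n in counted.items() if n >= min_support}
        k += 1
    return frequent


def association_rules(frequent, min_conf):
    """Rules lhs -> rhs with confidence = support(lhs | rhs) / support(lhs)."""
    rules = []
    for itemset, n in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = n / frequent[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


day_sequences = [
    ["news", "video", "shopping"],
    ["news", "shopping"],
    ["video", "news", "shopping"],
]
patterns = frequent_sequences(day_sequences, min_support=2)

brand_baskets = [
    {"BrandA", "BrandB"},
    {"BrandA", "BrandB", "BrandC"},
    {"BrandA", "BrandC"},
    {"BrandB"},
]
itemsets = apriori(brand_baskets, min_support=2)
rules = association_rules(itemsets, min_conf=0.6)
```

Support here counts sequences or baskets rather than raw page views, so a user who reloads a page many times still contributes once. Each support count is an independent pass over the data, which makes it the step most naturally split across workers in a MapReduce-style job.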
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year conferred]: 2016
[Classification number]: TP311.13; TP393.092





