
Statistics and Application of Massive Data from Large Websites Based on Hadoop

Published: 2018-11-19 09:26
[Abstract]: With the rapid growth of the Internet, demand for online services keeps broadening. Users' needs differ, however, and often reflect individual preferences. On a website, user actions leave log data on the back end, and this data is massive. Processing and aggregating it is an effective way to analyze user behavior, derive user attributes, and measure the effect of advertising placements.

Massive data processing has been studied extensively, and a number of open-source software frameworks have been developed. The most popular is Hadoop, a distributed software framework that processes massive data efficiently. Hive, a data warehouse framework built on Hadoop, can likewise process massive data efficiently. Hadoop currently receives wide attention from the engineering community.

The user behavior analysis project of a large website was created to analyze user behavior characteristics. By processing the massive log data, it mines user behavior characteristics, user attributes, and advertising placement attributes. The project uses Hadoop and Hive to process the data and consists of the following parts: user population classification, overall data statistics, advertising data statistics, cookie overlap statistics, brand probe, and whole-network statistics. Once the statistical results of these parts are available, the project enters the data analysis stage, where relevant information is mined to help formulate operating strategy.

This thesis describes the design and implementation of these parts in detail and gives a brief analysis of some of them. It first introduces the project background and the relevant Hadoop technologies, then the project objectives and details of the data. It then describes the role of each part and how Hadoop is used to process the massive data. Finally, it summarizes the work and points out shortcomings and areas that could be optimized.
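As a rough illustration of the kind of aggregation the abstract describes (overall statistics and cookie overlap, also rendered as "cookie coincidence", statistics), the sketch below imitates a map step and a reduce step in plain Python. The abstract does not give the log schema or the thesis's actual Hadoop/Hive jobs, so the record layout `(cookie_id, channel)`, the sample data, the function names, and the use of the Jaccard index as the overlap measure are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical log records: (cookie_id, channel).
# The real log schema is not given in the abstract.
LOGS = [
    ("c1", "news"), ("c2", "news"), ("c1", "video"),
    ("c3", "video"), ("c1", "news"), ("c2", "video"),
]


def map_phase(records):
    """Emit (channel, cookie_id) pairs, as a mapper would."""
    for cookie_id, channel in records:
        yield channel, cookie_id


def reduce_phase(pairs):
    """Group by channel; count page views and collect distinct cookies."""
    views = defaultdict(int)
    cookies = defaultdict(set)
    for channel, cookie_id in pairs:
        views[channel] += 1
        cookies[channel].add(cookie_id)
    return dict(views), dict(cookies)


def cookie_overlap(cookies, a, b):
    """Assumed overlap measure: Jaccard index of the two cookie sets."""
    union = cookies[a] | cookies[b]
    return len(cookies[a] & cookies[b]) / len(union) if union else 0.0


views, cookies = reduce_phase(map_phase(LOGS))
overlap = cookie_overlap(cookies, "news", "video")
print(views, overlap)
```

On a real cluster the grouping would be done by the framework's shuffle stage or by a Hive `GROUP BY` query rather than by in-memory dictionaries; the in-memory version only shows the shape of the computation.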
[Degree-granting institution]: Nanjing University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP393.092; TP311.5
