Design and Implementation of a MapReduce-Based Advertisement Click-Through Rate Prediction System
Published: 2018-06-29 00:02
Topics: computational advertising + Bayesian network; Source: Yunnan University, master's thesis, 2016
[Abstract]: With the rapid development of information technology, the era of "big data" has arrived. Building on big data and the Internet, many traditional practices have been transformed or even overturned. Search advertising has become one of the main sources of revenue in the Internet industry. It usually operates through keyword auctions: advertisers pay to obtain keywords, and the main payment model is pay per click (PPC). The fee an advertiser pays is the cost per click (CPC), the popularity of an ad is described by its click-through rate (CTR), and the revenue of the advertising medium is CTR × CPC. Predicting the click-through rate of ads is therefore especially important. This thesis first uses the MapReduce framework to process massive advertising data, then builds a similarity model among ad keywords based on a Bayesian network, and then performs probabilistic inference on the large-scale Bayesian network stored in HBase to obtain the click-through rate of the ad to be predicted. Based on these ideas, an ad click-through rate prediction system is implemented that successfully handles click-through rate prediction over massive data. The main work of the thesis is summarized as follows:
1) Data preprocessing. The MapReduce framework is used to analyze and process user search logs, and the valuable data extracted from them is stored in HBase.
2) Construction and storage of a large-scale Bayesian network. Using the MapReduce distributed computing framework and taking ad keywords as the nodes of the Bayesian network, the directed acyclic graph (DAG) structure of the network is constructed first. Based on this DAG structure, the conditional probability table of each node is then computed in parallel. Finally, the fully constructed Bayesian network is stored in parallel in HBase tables as key-value pairs.
3) Ad click-through rate prediction based on the large-scale Bayesian network. Probabilistic inference over the Bayesian network is converted into data query processing on HBase, and inference over the large-scale Bayesian network is implemented on the MapReduce programming model in order to predict the ad click-through rate.
4) System design. Based on the research above, a system is designed with three modules: a data preprocessing module, a large-scale Bayesian network construction module, and an ad click-through rate prediction module. Together they realize the MapReduce-based ad click-through rate prediction system.
5) Testing. Finally, functional and non-functional tests of the system are carried out on real commercial data.
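The abstract does not give the thesis's algorithms, so the following is only a minimal sketch of the ideas it names, under stated assumptions: a map/reduce-style count of impressions and clicks per keyword, the revenue relation CTR × CPC, and a Bayesian network whose conditional probabilities sit in a key-value table so that inference reduces to lookups. The log records, the row-key layout, the two-node network (`shoes` as parent of `sneakers`), and every function name are hypothetical, and a plain Python dict stands in for HBase.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical search-log records: (keyword, clicked) for each ad impression.
LOG = [
    ("shoes", 1), ("shoes", 0), ("shoes", 0), ("shoes", 1),
    ("flights", 0), ("flights", 1), ("flights", 0), ("flights", 0),
]


def map_phase(records):
    """Map step: emit (keyword, (impressions, clicks)) for each record."""
    for keyword, clicked in records:
        yield keyword, (1, clicked)


def reduce_phase(pairs):
    """Shuffle and reduce step: group by keyword, then sum the counts."""
    totals = {}
    ordered = sorted(pairs, key=itemgetter(0))
    for keyword, group in groupby(ordered, key=itemgetter(0)):
        impressions = clicks = 0
        for _, (imp, clk) in group:
            impressions += imp
            clicks += clk
        totals[keyword] = (impressions, clicks)
    return totals


def ctr(impressions, clicks):
    """Click-through rate: clicks divided by impressions."""
    return clicks / impressions if impressions else 0.0


def expected_revenue(ctr_value, cpc):
    """Revenue per impression for the ad medium: CTR x CPC."""
    return ctr_value * cpc


# Dict standing in for an HBase table. The row-key layout is an assumption:
# "<node>|<parent>=<state>" maps to P(node = 1 | parent state).
CPT_STORE = {
    "shoes|": 0.5,             # root node: P(shoes = 1)
    "sneakers|shoes=1": 0.4,   # P(sneakers = 1 | shoes = 1)
    "sneakers|shoes=0": 0.1,   # P(sneakers = 1 | shoes = 0)
}


def marginal_probability(store, child, parent):
    """P(child = 1) by summing over the parent's two states via key lookups."""
    p_parent = store[f"{parent}|"]
    return (p_parent * store[f"{child}|{parent}=1"]
            + (1.0 - p_parent) * store[f"{child}|{parent}=0"])


if __name__ == "__main__":
    totals = reduce_phase(map_phase(LOG))
    for keyword, (impressions, clicks) in totals.items():
        print(keyword, ctr(impressions, clicks))
    print(marginal_probability(CPT_STORE, "sneakers", "shoes"))
```

In this toy layout each inference step is a point lookup by row key, which is the sense in which probabilistic inference can be turned into query processing over a key-value store. A real network with many parents per node would need a richer key scheme and a proper inference procedure.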
[Degree-granting institution]: Yunnan University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP311.52
Article ID: 2079934
Article link: http://sikaile.net/wenyilunwen/guanggaoshejilunwen/2079934.html