Design and Implementation of a Spark-Based Online Advertising Transaction Billing System
Published: 2018-03-22 05:16
Topic: distributed systems. Focus: advertising billing. Source: Harbin Institute of Technology, 2016 master's thesis. Document type: degree thesis.
[Abstract]: In recent years the online advertising market has grown rapidly, and the major Internet companies are all building their own online advertising trading platforms. The billing system is an important and indispensable link in the online advertising transaction flow. Based on actual business requirements, this thesis designs and implements an online advertising transaction billing system to support the billing needs of an advertising trading platform. The system is developed in Java and Scala, and the work is divided into two parts: ad anti-cheating and ad billing.

The anti-cheating part identifies fraudulent ads. The system does not charge for fraudulent ads, which protects the interests of advertisers. The thesis proposes cheating-detection rules based on statistical methods to filter out fraudulent ads. Because a purely statistical verdict can be too arbitrary, it also proposes a scoring algorithm that estimates the likelihood that an ad is fraudulent, so that fraudulent ads are filtered smoothly rather than by a hard cutoff.

The billing part is implemented on Spark, an in-memory, scalable, fault-tolerant distributed computing framework. In a distributed environment it processes ad data, filters fraudulent ads, computes the amount to deduct, and generates deduction logs, taking advantage of the efficiency and fault tolerance of a distributed system to provide a scalable, highly available billing service. To prevent a single overloaded node from slowing down the whole job, the thesis proposes sharding large data sets so that the volume of data in each shard stays within a reasonable range and the data can be spread evenly across nodes. To remove the performance bottleneck in network access, asynchronous interfaces are used to improve system performance.

Because the data handled by the system is directly tied to money, a system failure leads directly to billing losses. To minimize losses and avoid risk, the system monitors a number of metrics and raises an alert promptly when an anomaly occurs. Testing and actual online operation show that the system filters fraudulent ads effectively and processes ad data on the order of hundreds of millions of records per day. Its designed capacity is higher than the average online load, so it can absorb short-lived data spikes. Throughout processing, important data metrics are monitored and key operations are logged, which makes troubleshooting easier if something goes wrong. The system is scalable, fault-tolerant, and highly available, supports the ad billing requirements well, and has considerable practical value.
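The sharding idea in the abstract (keep each shard's data volume within a bounded range so that no single node is overloaded) can be illustrated with a minimal plain-Java sketch. This is not the thesis's implementation and does not use Spark: the class name `ShardSketch`, the `key#index` shard naming, and the cap of 3 records per shard are all hypothetical choices made for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: split a "hot" key's records into bounded sub-shards.
class ShardSketch {

    /**
     * Assigns each record (key, value) to a sub-shard named "key#i" so that
     * no sub-shard holds more than maxShardSize values.
     * The naming scheme and the size cap are assumptions, not from the thesis.
     */
    static Map<String, List<String>> shard(List<String[]> records, int maxShardSize) {
        Map<String, Integer> seen = new LinkedHashMap<>();       // records seen so far per key
        Map<String, List<String>> shards = new LinkedHashMap<>(); // sub-shard name -> values
        for (String[] rec : records) {
            String key = rec[0];
            // merge returns the updated count; subtract 1 for a zero-based position
            int position = seen.merge(key, 1, Integer::sum) - 1;
            String shardKey = key + "#" + (position / maxShardSize);
            shards.computeIfAbsent(shardKey, k -> new ArrayList<>()).add(rec[1]);
        }
        return shards;
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<>();
        // "adA" is a hot key with four records; "adB" has one.
        records.add(new String[] {"adA", "click1"});
        records.add(new String[] {"adA", "click2"});
        records.add(new String[] {"adA", "click3"});
        records.add(new String[] {"adA", "click4"});
        records.add(new String[] {"adB", "click5"});

        // With a cap of 3, "adA" is split into two sub-shards.
        System.out.println(shard(records, 3));
    }
}
```

With a cap of 3, the four `adA` records are split across `adA#0` and `adA#1`, while `adB` stays in a single shard. In a distributed job, a composite key of this kind could be used as the grouping key so that work for one hot key is spread over several tasks; per-shard results would then need to be merged back per original key.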
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP311.52
Article ID: 1647227
Article link: http://sikaile.net/wenyilunwen/guanggaoshejilunwen/1647227.html