Design and Implementation of a Spark-Based Online Fraud Detection Algorithm
Published: 2018-05-26 02:55
Topics: fraud detection + imbalanced learning; Source: Zhejiang University, master's thesis, 2017
[Abstract]: In the era of big data, online businesses such as e-commerce and third-party payment have grown explosively, and online fraud has grown with them. Online fraud detection is a cornerstone of an enterprise's risk-control capability: by modeling business behavior, it identifies fraud cases more accurately and efficiently, recovering losses and reducing risk for users and online platforms. Because fraud cases are extremely rare relative to normal transactions, online fraud detection must above all address the imbalanced-learning problem. In addition, as online business volume grows, the fraud detection system, a core component of the business system, faces increasingly strict performance requirements, so combining big data technology with online fraud detection can greatly strengthen an enterprise's risk-control defenses. This thesis begins with an introduction to the related technologies, discussing in detail big data technologies including the distributed computing framework Spark and its real-time stream computing component Spark Streaming, and reviews progress in online fraud detection research. In the big data setting, it proposes a clustering-based self-balancing dataset construction algorithm and a distributed capital-loss-sensitive Lasso algorithm, implements the two together on the Spark distributed computing framework, and evaluates them on relevant metrics using a real online fraud detection dataset. The main contributions of this thesis are:
1) A clustering-based self-balancing incremental dataset construction algorithm. It uses an incremental clustering algorithm to measure the similarity of samples within a class and selects several representative sample points from each class to form the training set. While preserving the information in the time-ordered data, it effectively addresses both within-class and between-class imbalance in online fraud detection datasets.
2) A distributed capital-loss-sensitive Lasso algorithm designed for the online payment fraud detection scenario. It trains models efficiently in the big data setting and effectively improves the capital-loss rate achieved by the online fraud detection model.
3) A seamless integration of the clustering-based self-balancing incremental construction algorithm and the distributed capital-loss-sensitive Lasso algorithm, built on the Spark distributed computing framework and the Spark Streaming real-time stream processing component, which verifies the effectiveness of these methods for online fraud detection in the big data setting.
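The abstract does not give the details of the clustering-based balancing step, so the following is only a minimal single-machine sketch of the general idea, not the thesis's algorithm. It assumes a simple threshold-based ("leader") incremental clustering, Euclidean distance, and undersampling of the normal class only, with all fraud samples kept. The names `Cluster`, `incremental_cluster`, `representatives`, `build_balanced_set`, `radius`, and `per_cluster` are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Running summary of one cluster: its centroid and its member points."""
    centroid: list
    members: list = field(default_factory=list)

    def add(self, point):
        # Update the centroid as the running mean, then record the member.
        n = len(self.members)
        self.centroid = [(c * n + x) / (n + 1) for c, x in zip(self.centroid, point)]
        self.members.append(point)


def incremental_cluster(points, radius, clusters=None):
    """Process points in arrival order: join the nearest cluster if it lies
    within `radius` (an assumed threshold), otherwise open a new cluster.
    Passing existing `clusters` lets a later batch extend earlier results."""
    clusters = [] if clusters is None else clusters
    for p in points:
        best = min(clusters, key=lambda c: math.dist(c.centroid, p), default=None)
        if best is not None and math.dist(best.centroid, p) <= radius:
            best.add(p)
        else:
            clusters.append(Cluster(centroid=list(p), members=[p]))
    return clusters


def representatives(clusters, per_cluster):
    """Select up to `per_cluster` members closest to each cluster centroid."""
    chosen = []
    for c in clusters:
        ranked = sorted(c.members, key=lambda p: math.dist(c.centroid, p))
        chosen.extend(ranked[:per_cluster])
    return chosen


def build_balanced_set(normal, fraud, radius, per_cluster):
    """Assumed balancing policy: replace the majority (normal) class with
    cluster representatives and keep every minority (fraud) sample.
    Returns (point, label) pairs with label 0 = normal, 1 = fraud."""
    reps = representatives(incremental_cluster(normal, radius), per_cluster)
    return [(p, 0) for p in reps] + [(p, 1) for p in fraud]


if __name__ == "__main__":
    normal = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
    fraud = [(9.0, 0.0)]
    for point, label in build_balanced_set(normal, fraud, radius=1.0, per_cluster=1):
        print(label, point)
```

Because `incremental_cluster` accepts previously built clusters, each new batch of transactions can update the cluster summaries without reclustering earlier data, which is the property that would make such a step usable on micro-batches in a streaming job.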
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13