Research on Value Extraction and Spatial Association Mining of E-commerce Finance Big Data
Topics: e-commerce finance + big data credit reporting; Source: master's thesis, Jiangxi University of Science and Technology, 2017
[Abstract]: With the maturity and spread of emerging technologies such as search engines, cloud computing, and artificial intelligence, the volume of data people generate in daily life has grown explosively, ushering in the era of "big data". Against this background, the collision of the Internet with the traditional financial industry gave rise to Internet finance. Internet finance meets the needs of micro, small, and medium-sized enterprises and of ordinary financial consumers, compensates for the shortcomings of traditional financial institutions, and offers a new approach to developing inclusive finance. Among Internet finance models, e-commerce finance, which is built around e-commerce platforms, has had the greatest influence and has drawn close attention from the industry and from society. E-commerce finance is inherently a data-based industry that holds large amounts of multi-source heterogeneous data: on the one hand, the massive historical transaction data of its own e-commerce platforms; on the other, external data from the Internet and social media. The ability to extract and mine the value contained in e-commerce finance big data will therefore determine the future competitiveness of the whole industry. To address this problem, this thesis first analyzes the characteristics and value of e-commerce finance big data, domestic and international mining methods based on spatial association rules, and the state of research on big data mining. It then adopts distributed search engine technology, customizes a web crawler to obtain the required bank card and Taobao shop data from the industry's multi-source heterogeneous data, designs Spark parallel algorithms to preprocess the data, and builds inverted lists and secondary index files, which provide the data source for the big data analysis platform built later.
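The inverted list mentioned above can be illustrated with a minimal single-machine sketch. The thesis builds its index with a Spark parallel algorithm whose details the abstract does not give, so this is only an illustration of the data structure; the record IDs, the sample text, and the `build_inverted_index` helper are all hypothetical.

```python
from collections import defaultdict


def build_inverted_index(records):
    """Map each lowercase token to the sorted list of record IDs containing it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for token in text.lower().split():
            index[token].add(record_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical shop descriptions keyed by record ID (not data from the thesis).
records = {
    1: "flagship store electronics",
    2: "flagship store clothing",
    3: "electronics outlet",
}
index = build_inverted_index(records)
```

Looking up `index["flagship"]` returns the IDs of the records that mention that token, which is the lookup an inverted list is meant to make cheap.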
Once the data sources are determined, the MECE (mutually exclusive, collectively exhaustive) analysis method is combined with scores from several financial business experts in the industry to obtain a candidate set of enterprise credit risk evaluation indicators and their quantification methods, and the correlations among the indicators and the risk grading are analyzed. Next, the random forest algorithm from a big data machine learning library is used to select features from the candidate indicator set, and a multi-level spatial association rule algorithm based on a hash structure is designed to mine enterprise risk information, yielding a credit risk assessment and intelligent early-warning model. Finally, the machine learning and mining algorithm libraries, the credit risk assessment and intelligent early-warning model, and the big data storage and distributed computing capabilities are encapsulated to build an e-commerce finance big data analysis platform based on Spark on YARN, which is used to verify the accuracy of the model and the practicality of the platform. Using one year of daily operating data from a flagship store on the Taobao platform, together with its bank card fund transfer data and data on its management team, as the data source, the platform analyzes the shop's business behavior and provides credit risk assessment, credit approval, and post-loan risk early-warning management services. The results show that the credit risk assessment and intelligent early-warning model meets the expected requirements and has high credibility.
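As a rough illustration of association rule mining with a hash table, the sketch below counts item pairs in a dictionary-backed `Counter` and keeps the rules that pass support and confidence thresholds. It is a generic single-level example, not the multi-level spatial algorithm designed in the thesis; the indicator names, the thresholds, and the `mine_pair_rules` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return sorted rules (antecedent, consequent, support, confidence) over item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()  # hash table keyed by alphabetically sorted item pairs
    for items in transactions:
        unique = sorted(set(items))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to form a rule
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


# Hypothetical risk observations per shop (not data from the thesis).
transactions = [
    {"late_shipment", "refund_spike", "high_risk"},
    {"late_shipment", "high_risk"},
    {"refund_spike"},
    {"late_shipment", "refund_spike", "high_risk"},
]
rules = mine_pair_rules(transactions, min_support=0.5, min_confidence=0.9)
```

With these toy inputs, only the pair `late_shipment` and `high_risk` passes both thresholds, in both directions; the pairs involving `refund_spike` meet the support threshold but fall short on confidence.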
[Degree-granting institution]: Jiangxi University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13