Research and Application of Hadoop-Based Data Mining in the E-Commerce Environment
Topic: data mining; Entry point: association rule algorithm; Source: master's thesis, Hunan University, 2016
[Abstract]: Portable network-access devices have developed rapidly and Internet technology keeps iterating. As a result, the online ecosystem has grown larger and more active, and e-commerce built on Internet technology has expanded quickly. Compared with traditional offline shopping, online e-commerce is a fast, efficient and convenient way to shop, as the boom in e-commerce platforms in recent years confirms. For platform operators, the top priority is to retain existing customers while reaching potential ones. Data in the Internet era arrives quickly and in massive volume. This thesis therefore applies data mining to e-commerce platform data. On the one hand, it mines the browsing and purchasing habits of existing customers in depth in order to retain them. On the other hand, it analyzes the behavior of potential users to identify their interests and push targeted content to them, attracting more customers. The shopping data of users on e-commerce platforms are strongly correlated, so the thesis uses an association rule algorithm for mining and analysis. The aim is to retain existing users and discover new ones. Data mining is the process of discovering the relationships hidden in raw, not-yet-processed data sets and extracting knowledge from those relationships. It combines several computing disciplines, including database technology, machine learning, statistical analysis, behavioral pattern recognition and artificial neural networks. Data mining has high commercial value and applies to a very wide range of problems. For these reasons it has become one of the main focuses of current research.

The thesis starts from e-commerce platforms in today's Internet and big-data era and analyzes their current state. It identifies a key weakness: these platforms cannot cope with the flood of massive, disorganized data. They therefore accumulate invalid data easily, which leads to low resource utilization and a low effective conversion rate. Next, the author conducted anonymous interviews with clothing sellers and home-appliance sellers on a well-known e-commerce platform. The interviews led to the conclusion that the items bought by clothing buyers are strongly associated with one another. On the technical side, the thesis proposes an association rule algorithm based on Apriori and processes the data on a Hadoop cluster. Compared with a traditional relational database, the Hadoop cluster can process data in parallel, which greatly improves the algorithm's efficiency. The thesis also builds a front-end data visualization system based on AngularJS, Bootstrap and HTML.
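The abstract names an Apriori-based association rule algorithm but does not describe how the thesis modifies it. The sketch below shows only the classic Apriori procedure, not the author's variant. It uses the standard definitions:

* support(X) = (number of baskets containing X) / N
* confidence(X -> Y) = support(X ∪ Y) / support(X)

The sketch mines frequent itemsets level by level. It uses the join and prune steps, then keeps rules that meet a minimum confidence. The basket data, thresholds and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical sample data: five shopping baskets from a clothing store.
SAMPLE_BASKETS = [
    {"shirt", "jeans", "belt"},
    {"shirt", "jeans"},
    {"shirt", "jacket"},
    {"jeans", "belt"},
    {"shirt", "jeans", "jacket"},
]


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset.

    Support is the fraction of transactions that contain the itemset.
    """
    baskets = [frozenset(t) for t in transactions]
    if not baskets:
        return {}
    n = len(baskets)
    # Level-1 candidates: every distinct item.
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while candidates:
        # Count the baskets that contain each candidate itemset.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        prev = list(level)
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): all (k-1)-subsets must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples for rules X -> Y."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset as the antecedent X.
        for r in range(1, len(itemset)):
            for combo in combinations(itemset, r):
                lhs = frozenset(combo)
                # Every subset of a frequent itemset is frequent, so lhs is present.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


if __name__ == "__main__":
    frequent = apriori(SAMPLE_BASKETS, min_support=0.4)
    for lhs, rhs, conf in association_rules(frequent, min_confidence=0.7):
        print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}")
```

The sketch uses these hypothetical baskets with a minimum support of 0.4 and a minimum confidence of 0.7. Under those settings it keeps four rules, for example {jacket} -> {shirt} with confidence 1.00 and {shirt} -> {jeans} with confidence 0.75. Counting support requires scanning the baskets for every candidate at every level. That step is usually the most expensive part of Apriori on large data.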
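The abstract credits the Hadoop cluster's parallel processing for the speed-up but does not say how the Apriori job is divided. One common approach, assumed here, is to express support counting as a MapReduce job. Each mapper scans one input split and emits (candidate, 1) pairs. A combiner pre-sums them locally, the shuffle groups values by candidate, and reducers total the counts. The Python sketch below only simulates these phases in a single process so the data flow is visible. It does not call Hadoop, and the function names, split count and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: five shopping baskets.
SAMPLE_BASKETS = [
    {"shirt", "jeans", "belt"},
    {"shirt", "jeans"},
    {"shirt", "jacket"},
    {"jeans", "belt"},
    {"shirt", "jeans", "jacket"},
]


def split_input(transactions, num_splits):
    """Partition transactions round-robin, standing in for HDFS input splits."""
    splits = [[] for _ in range(num_splits)]
    for i, basket in enumerate(transactions):
        splits[i % num_splits].append(frozenset(basket))
    return splits


def map_phase(split, candidates):
    """Mapper: emit (candidate, 1) for each candidate contained in a basket."""
    for basket in split:
        for candidate in candidates:
            if candidate <= basket:
                yield candidate, 1


def combine(pairs):
    """Combiner: sum one mapper's output locally to shrink the shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle(mapper_outputs):
    """Shuffle: group intermediate values by key across all mappers."""
    grouped = defaultdict(list)
    for output in mapper_outputs:
        for key, value in output:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped, min_count):
    """Reducer: total each candidate's count and keep those meeting min_count."""
    totals = {key: sum(values) for key, values in grouped.items()}
    return {key: count for key, count in totals.items() if count >= min_count}


def count_support_mapreduce(transactions, candidates, min_count, num_splits=3):
    """Run the simulated map -> combine -> shuffle -> reduce pipeline."""
    splits = split_input(transactions, num_splits)
    # On a real cluster, each split's map and combine would run as its own task.
    mapper_outputs = [combine(map_phase(split, candidates)) for split in splits]
    return reduce_phase(shuffle(mapper_outputs), min_count)


if __name__ == "__main__":
    items = sorted(set().union(*SAMPLE_BASKETS))
    pairs = [frozenset(p) for p in combinations(items, 2)]
    frequent_pairs = count_support_mapreduce(SAMPLE_BASKETS, pairs, min_count=2)
    for itemset, count in sorted(frequent_pairs.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), count)
```

With these baskets, the demo reports the three pairs that occur in at least two baskets: belt/jeans (2), jacket/shirt (2) and jeans/shirt (3). Addition is associative and commutative, so per-split counts can be summed in any order. The result therefore does not depend on how many splits the input is divided into. This property is what lets the counting step run on many nodes at once.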
[Degree-granting institution]: Hunan University
[Degree level]: Master's
[Year conferred]: 2016
[Classification codes]: F724.6; TP311.13