Research on the Parallelization of E-Commerce Applications Based on Multi-Core Clusters
Topic: parallel computing + cloud computing; Source: Liaoning Normal University, 2013 master's thesis
[Abstract]: The emergence of parallel computing offered a landmark solution to the computing bottleneck of the single-machine era and drove the development of parallel clusters. As computer clusters have matured, the variety of parallel platforms has grown, and each platform has its own parallel strengths. Cloud computing is an Internet-based supercomputing model that splits computing tasks and distributes them across clusters made up of large numbers of computers in order to obtain very large computing power, storage space, and information services; it is currently the most popular computing model.

In recent years the e-commerce market has shifted from a seller's market to a buyer's market, which has intensified competition among e-commerce companies. Maintaining customer relationships is the foundation of an enterprise's development and a precondition for its profitability. To analyze customers accurately, an enterprise must first classify them. Traditional classification methods rely on experience-based grouping or simple statistics, but a single machine's computing power struggles with massive data. To address this, this thesis introduces parallel computing into e-commerce customer classification. A multi-table association (join) algorithm is designed to preprocess the data: it links the product information obtained from an e-commerce website with the data in historical transaction records and converts the result into a form suitable for data mining. A customer classification method is also designed, which uses the FCM (fuzzy c-means) clustering algorithm to analyze the preprocessed customer data.

Data tables are traditionally joined with a local parallel database, but this approach falls short when many massive tables spread across the Internet must be joined. A Hadoop cluster, which follows the cloud computing model, can handle the joining of massive data tables across the Internet; its efficiency suits large data-intensive computing tasks, and it is used in many fields. This thesis implements the join of multiple massive data tables on a Hadoop cluster and compares the experimental data in detail. The comparison shows that the Hadoop cluster achieves clear parallel efficiency when joining massive data tables.

For the data analysis that follows preprocessing, the FCM fuzzy clustering algorithm, widely used in multivariate statistical analysis, is chosen to classify the customer data. Compared with traditional classification based on experience or simple statistics, this broadens the indicator system from a single indicator to multiple indicators of customer consumption patterns. In an experiment on transaction data from VANCL, customers are divided by consumption pattern into four classes: high-value customers, ordinary customers, small customers, and potential customers. The experimental results confirm the clustering quality of the FCM algorithm and the efficiency of a MATLAB multi-core parallel cluster in processing complex algorithms in parallel.

The method designed in this thesis can be applied to large-scale data processing and customer classification analysis in the financial sector, and it has some practical value.
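To make the clustering step mentioned in the abstract more concrete, here is a minimal sketch of fuzzy c-means in plain Python. It is not the thesis's MATLAB implementation and it is not parallelized. The function name `fuzzy_c_means`, the fuzzifier `m = 2`, the random initialization, and the toy two-feature customer data are all assumptions made for this illustration.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Cluster `points` (a list of equal-length tuples) with fuzzy c-means.

    Returns (centers, memberships), where memberships[i][j] is the degree
    to which point i belongs to cluster j; each row sums to 1.
    """
    rng = random.Random(seed)
    # Start from random memberships, normalized so each row sums to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    dim = len(points[0])
    exponent = 2.0 / (m - 1.0)
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by membership**m.
        centers = []
        for j in range(n_clusters):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))

        # Membership update from the distances to the new centers.
        new_u = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if min(dists) == 0.0:
                # The point sits exactly on a center: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([
                    1.0 / sum((dj / dk) ** exponent for dk in dists)
                    for dj in dists
                ])

        # Stop once no membership value moved by more than `tol`.
        change = max(
            abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new)
        )
        u = new_u
        if change < tol:
            break
    return centers, u


# Toy data (hypothetical): (orders per year, average order value) per customer.
customers = [(1, 20), (2, 25), (1, 30), (20, 300), (22, 280), (19, 310)]
centers, memberships = fuzzy_c_means(customers, n_clusters=2)
# Hard label per customer: the cluster with the highest membership.
labels = [row.index(max(row)) for row in memberships]
print(labels)
```

The thesis reports four customer classes; the toy data above only supports two. With real features on very different scales, the columns would normally be standardized before clustering so that no single indicator dominates the distance.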
[Degree-granting institution]: Liaoning Normal University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP311.13