A Distributed Multi-Data-Source E-commerce Data Fusion and Analysis System
[Abstract]: With the spread of the Internet, the popularization of smart mobile terminals, and the rapid development of the logistics industry, e-commerce has become an important part of people's lives and of the national economy. As a shopping medium, e-commerce platforms carry a large amount of valuable data. E-commerce data can be used to reconstruct the online shopping environment that users face, to analyze how that environment influences user behavior, to analyze patterns in the commodity market, to give merchants suggestions, and to analyze the state of the national economy, so it has high research value. E-commerce data analysis and mining is the process of analyzing and mining e-commerce data to obtain valuable information, and it is a form of data mining. Several problems need to be solved in this process: data acquisition and preprocessing; the lack of direct connections between multiple data sources; the low credibility and completeness of data from a single e-commerce source, together with the lack of fusion analysis across multiple data sources; and the inability of single-machine data mining systems to handle the massive data volumes of e-commerce, which calls for a distributed data mining system. At the same time, some common data mining algorithms are inefficient in a distributed implementation. The main work of this thesis falls into three parts. (1) Data analysis and mining work tailored to the particular characteristics of e-commerce data is carried out. Starting from the definition and acquisition of e-commerce data, the thesis analyzes the types of data found on e-commerce sites, collects the required data according to the analysis requirements, and designs the data storage format.
The data contain a large share of semi-structured and unstructured content, are not standardized, and are noisy. Preprocessing solutions are designed around these characteristics to ensure good data quality. Data mining methods such as association analysis, clustering, linear regression, and artificial neural networks are then used to analyze and mine the e-commerce data. (2) A data fusion method for multiple data sources is designed and implemented; data from different e-commerce websites are fused, and the fused data are used in data mining. The thesis analyzes the structural features of commodity information on e-commerce sites, designs a multi-source e-commerce data fusion method according to these features, and extracts commodity names, commodity attribute names, and commodity attribute values through preprocessing and text analysis of the e-commerce data. An unsupervised learning algorithm is designed that, when the correspondence between commodity parameters in different data sources is unknown, learns and matches the data starting from the features of seeds, and uses multiple commodity parameters to progressively find matching commodities and matching commodity parameters, which reduces the computation required for data fusion. Compared with fusion based on a single parameter, this approach improves the accuracy of commodity entity unification, and the criterion for treating two items as the same commodity can be set flexibly to obtain matching results under different criteria. The fused data are then used for prediction; compared with using data from a single source, the accuracy of the predictions improves. (3) A Hadoop-based hierarchical clustering data mining system is designed, and an improved hierarchical clustering is implemented under Hadoop.
The characteristics of distributed computing architectures are analyzed, and a distributed data analysis and mining system based on Hadoop is designed. Because Hadoop is not well suited to iteration, traditional hierarchical clustering requires a large number of iterations on Hadoop. Based on the principle of the hierarchical clustering algorithm and the structural characteristics of Hadoop, an improved hierarchical clustering is designed. Provided that the inter-cluster distance increases monotonically, it leaves the clustering result unchanged while merging many clusters in a single clustering pass, which reduces the number of iterations and greatly improves the performance of hierarchical clustering on Hadoop. The feasibility of the method is also verified in a setting where multi-dimensional feature information is lacking: the similarity between commodities is computed first, and hierarchical clustering is then applied on the basis of that similarity.
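The idea of merging several clusters per pass can be illustrated with a small single-machine sketch. The abstract does not give the thesis's actual algorithm, so the code below substitutes a well-known technique: in each round it merges every pair of mutually nearest clusters, using complete linkage (a linkage whose merge distances never decrease). The function names, the 1-D sample points, and the `batch` flag are all hypothetical, and no Hadoop code is involved; one round here would roughly correspond to one distributed job.

```python
def linkage(a, b, points):
    """Complete linkage: the largest distance between a point in a and one in b."""
    return max(abs(points[i] - points[j]) for i in a for j in b)


def mutual_pairs(clusters, points):
    """Return (distance, c, d) for every pair of clusters that are each
    other's nearest neighbour. Ties are broken by smallest member index."""
    nearest = {
        c: min((d for d in clusters if d != c),
               key=lambda d: (linkage(c, d, points), min(d)))
        for c in clusters
    }
    pairs = []
    for c in clusters:
        d = nearest[c]
        # min(c) < min(d) reports each mutual pair only once.
        if nearest[d] == c and min(c) < min(d):
            pairs.append((linkage(c, d, points), c, d))
    return pairs


def agglomerate(points, batch=True):
    """Agglomerate until one cluster remains.

    batch=True  merges all mutually nearest pairs in each round (assumed variant).
    batch=False merges only the closest pair per round (classic procedure).
    Returns the list of merged clusters and the number of rounds used.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges, rounds = [], 0
    while len(clusters) > 1:
        pairs = mutual_pairs(clusters, points)
        if not batch:
            pairs = [min(pairs, key=lambda p: p[0])]
        for _, c, d in pairs:
            clusters.remove(c)
            clusters.remove(d)
            clusters.append(c | d)
            merges.append(c | d)
        rounds += 1
    return merges, rounds


if __name__ == "__main__":
    sample = [0.0, 1.0, 5.0, 7.0, 20.0, 24.0]  # hypothetical 1-D data
    batched, batched_rounds = agglomerate(sample, batch=True)
    classic, classic_rounds = agglomerate(sample, batch=False)
    print("same merges:", set(batched) == set(classic))
    print("rounds (batched vs classic):", batched_rounds, classic_rounds)
```

On this sample both variants produce the same set of merged clusters, while the batched variant needs fewer rounds. On a platform where each round carries a fixed job-startup cost, the number of rounds is the quantity worth reducing.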
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP311.13
Article ID: 2123692
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2123692.html