Research and Implementation of Hadoop-Based Big Data Analysis and Mining Technology for the Telecommunications Industry

Published: 2019-04-12 06:20
[Abstract]: With the development of information technology, the volume of data being generated is growing rapidly, and data mining techniques have advanced in response. Massive data sets bring both challenges and opportunities, and extracting useful information from them is a demanding task. The telecommunications industry holds large amounts of customer data; analyzing and mining this data with big data technologies to uncover latent knowledge and improve the service experience is a worthwhile goal. Against this background, the work of this thesis is as follows. First, the algorithms are studied and improved: a clustering algorithm is used for customer segmentation, and a decision tree algorithm is used for customer prediction. The traditional K-means algorithm requires the number of clusters as input, but with data at this scale the distribution is not known in advance, which makes the algorithm hard to apply. To address this shortcoming, the thesis improves K-means and implements a DGK-means algorithm, which uses a genetic algorithm to find the most suitable number of clusters and a density-based approach to compute the genetic algorithm's fitness function, improving both efficiency and accuracy. The C4.5 algorithm is used to build a decision tree model, which then predicts outcomes for records whose results are unknown, in support of customer prediction and customer retention. Second, the Hadoop platform is used for big data analysis and mining: a Hadoop-based big data analysis and mining system for the telecommunications industry is designed and implemented, with HDFS providing distributed storage and the MapReduce programming model providing parallel computation. At the algorithm layer, each algorithm is given a parallel design to improve efficiency.
Finally, test data sets are used to verify the performance of the system and the algorithms. The results show that the DGK-means algorithm improves on the traditional algorithm in both accuracy and efficiency. Parallel computation improves efficiency when the cluster has more than 2 nodes, and the gain becomes more pronounced as the number of nodes increases.
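The abstract does not give the details of the parallel design, so the following is only a minimal sketch of how one iteration of plain K-means (not DGK-means) is commonly split into map and reduce steps. It runs in a single Python process rather than on Hadoop, and the names `map_point`, `reduce_cluster`, and `kmeans_iteration`, as well as the sample points, are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from math import dist


def map_point(point, centroids):
    """Map step: emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_cluster(values):
    """Reduce step: average all points that share one centroid index."""
    dims = len(values[0][0])
    total = [0.0] * dims
    count = 0
    for point, n in values:
        for d in range(dims):
            total[d] += point[d]
        count += n
    return tuple(s / count for s in total)


def kmeans_iteration(points, centroids):
    """One pass: map every point, group by key (the shuffle), reduce each group."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_point(p, centroids)
        groups[key].append(value)
    # A centroid that attracted no points keeps its previous position.
    return [reduce_cluster(groups[i]) if i in groups else centroids[i]
            for i in range(len(centroids))]


# Hypothetical sample data: two obvious groups in 2-D.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_centroids = kmeans_iteration(points, centroids)
```

In a real MapReduce job the map calls would run on separate nodes over HDFS blocks and the grouping would be done by the framework's shuffle phase; the driver would repeat the iteration until the centroids stop moving.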
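[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP311.13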

Article ID: 2456765

Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2456765.html

