Research on the Optimization and Application of Association Rule Algorithms Based on a Cloud Platform

Published: 2018-04-24 10:02

Topic: cloud computing + data mining; Source: Henan University of Technology, master's thesis, 2017

[Abstract]: With the rapid development of the Internet, networks have reached into every aspect of daily life. The Internet has enriched and simplified everyday life and has, to some extent, changed the way people work. As Internet technology spreads, the volume of data generated by back-end systems has become massive, and how to extract valuable information from big data has drawn attention across industries. Mining association rules between items from large, noisy data sets is one of the more widely used applications of data mining. Traditional single-machine data mining, however, cannot analyze massive data comprehensively, and the emergence of cloud computing has offered the data mining field a new approach. The Hadoop cloud platform, developed under the Apache Foundation, has lowered the technical barrier to cloud computing development. Combining the platform's parallel computing capabilities with an improved association rule algorithm makes it possible to mine massive data more effectively, uncover the patterns contained in the data set, and so support better decisions in commercial applications.

This thesis takes the traditional Apriori algorithm as its theoretical basis. It analyzes the algorithm's execution flow to identify the key points that can be optimized, improves the algorithm accordingly, and deploys the improved Apriori algorithm on the Hadoop platform so that it runs in parallel and can process massive data. The thesis reviews in detail the current state and development of cloud computing and data mining research, and within Hadoop it focuses on the two core technologies, HDFS and MapReduce. Chapter 3 analyzes the traditional Apriori association algorithm, uses a worked example to show the shortcomings in its execution, surveys existing optimization methods, and compares their performance.

Chapters 4 and 5 form the core of the research. Chapter 4 proposes an improvement to the traditional Apriori algorithm that reduces its time complexity and raises its execution efficiency. It then introduces an interest threshold to filter the rules produced by the algorithm further, improving the validity and usability of the strong association rules; the experimental results are presented as line charts and compared to reach a conclusion. Chapter 5 describes the procedure for setting up a Hadoop platform and its general configuration, explains the idea behind parallelizing the algorithm, and outlines the retail industry's demand for cloud-based association analysis. The optimized Apriori algorithm is deployed on the Hadoop platform and its execution efficiency is compared with that of the ordinary serial algorithm; the experimental results are used to discuss the feasibility and advantages of parallelization.
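The abstract does not give the details of the improved algorithm or of its interest measure, so the following is only a minimal sketch of the general idea: a plain level-wise Apriori search followed by a rule filter that applies an interest threshold on top of confidence. Lift is used here as the interest measure purely as an assumption; the thesis may define interest differently. The function names `frequent_itemsets` and `interesting_rules` are hypothetical and chosen for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Plain level-wise Apriori: return {itemset: support} for all frequent itemsets."""
    n = len(transactions)
    # Level 1 candidates: every single item that occurs in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Scan the data once per level and count transactions containing each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def interesting_rules(frequent, min_confidence, min_lift):
    """Generate rules A -> B and keep those passing both confidence and lift.

    Lift is an assumed stand-in for the thesis's "interest" measure.
    """
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                # Subsets of a frequent itemset are frequent, so these lookups succeed.
                confidence = support / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if confidence >= min_confidence and lift >= min_lift:
                    rules.append((antecedent, consequent, confidence, lift))
    return rules
```

On a small basket data set, a rule can pass the confidence threshold and still be dropped by the interest filter when its lift is below 1, that is, when the antecedent makes the consequent no more likely than it already was. In a MapReduce setting, the per-level counting scan is the part that would typically be distributed, with mappers emitting candidate counts per data split and reducers summing them.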
[Degree-granting institution]: Henan University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13; TP393.09


Article link: http://sikaile.net/guanlilunwen/ydhl/1796186.html
