

Research on Parallel Association Rule Mining Algorithms Based on the Hadoop Platform

Published: 2018-07-04 23:55

Topics: big data + association rules; Source: Xi'an University of Science and Technology, master's thesis, 2017


[Abstract]: The explosive growth in data volume challenges traditional computing technology and serial algorithms, but it also creates new opportunities, and "big data" has emerged in response. At this scale, serial association rule algorithms must be rewritten and parallelized. Parallel computing on big-data platforms is a practical way to do this. Association rule mining, which discovers relationships among items of information, is an important data mining task. When the classical Apriori and FP-Growth algorithms process big data on a single machine, they run out of memory. Hadoop lowers the difficulty of parallel programming and partitions the data automatically. Parallel association rule algorithms on Hadoop are therefore an important research topic. To address this problem, the thesis makes the following contributions.

(1) It studies H-Apriori (Apriori algorithm based on Hadoop) and improves it. In a big-data setting, the serial Apriori algorithm cannot handle massive data. H-Apriori has its own costs: its intermediate stage emits a large number of key/value pairs whose value is 1, and it reads every transaction. Together these generate many candidate itemsets and consume computation time. The improved Hadoop-based algorithm deletes infrequent items to reduce redundant data, reconstructs the transaction database, and optimizes the transaction-reading step. This effectively shrinks the transaction database. Counting candidates with a hash tree further shortens counting time and improves overall efficiency.

(2) It proposes an improved FP-Growth algorithm on Hadoop that partitions data with load balancing. In a big-data setting, the serial FP-Growth algorithm cannot handle massive data. PFP (Parallel FP-Growth) also fails once the data volume grows beyond a certain size. The improved algorithm combines load estimation with an improved balanced grouping method to form balanced groups. This overcomes two weaknesses of PFP: it cannot cope with growing data volumes, and it distributes load unevenly. The improved algorithm effectively balances the load across the cluster's nodes and shortens the running time of the whole cluster.

After a Hadoop big-data platform was built, comparative experiments were carried out, and the effectiveness of the algorithms was verified on established benchmark datasets. The experiments show that the improved algorithms adapt better to big data and run more efficiently.
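Contribution (1) of the abstract describes a counting pass that emits one key/value pair with value 1 per occurrence, followed by deletion of infrequent items and reconstruction of the transaction database. The sketch below simulates that pipeline in a single process. It is an illustration under stated assumptions, not the thesis's code:

- the Hadoop shuffle is replaced by an in-memory dictionary;
- the function names are hypothetical;
- the reduction rule is one standard form of transaction reduction, which may differ from the author's reconstruction step. It drops items that appear in no frequent k-itemset, then drops transactions left with k or fewer items.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical helper names; the MapReduce shuffle is simulated in memory.


def map_phase(transactions, candidates, k):
    """Mapper: emit (candidate, 1) for each candidate k-itemset found in a transaction."""
    for t in transactions:
        for subset in combinations(sorted(set(t)), k):
            if subset in candidates:
                yield subset, 1


def reduce_phase(pairs, min_support):
    """Shuffle + reducer: sum the 1s per key and keep itemsets meeting min_support."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return {key: c for key, c in counts.items() if c >= min_support}


def reduce_transactions(transactions, frequent_items, k):
    """Assumed reduction rule: delete items outside every frequent k-itemset, then
    drop transactions left with k or fewer items (they hold no (k+1)-itemset)."""
    reduced = []
    for t in transactions:
        kept = sorted({item for item in t if item in frequent_items})
        if len(kept) > k:
            reduced.append(kept)
    return reduced


def generate_candidates(frequent, k):
    """Join frequent (k-1)-itemsets sharing a prefix; prune by the Apriori property."""
    out = set()
    for a, b in combinations(sorted(frequent), 2):
        if a[:-1] == b[:-1]:
            cand = tuple(sorted(set(a) | set(b)))
            if all(sub in frequent for sub in combinations(cand, k - 1)):
                out.add(cand)
    return out


def apriori_with_reduction(transactions, min_support):
    transactions = [list(t) for t in transactions]
    candidates = {(item,) for t in transactions for item in t}
    k, result = 1, {}
    while candidates:
        frequent = reduce_phase(map_phase(transactions, candidates, k), min_support)
        if not frequent:
            break
        result.update(frequent)
        frequent_items = {item for itemset in frequent for item in itemset}
        # Shrink the database before the next pass instead of rereading it unchanged.
        transactions = reduce_transactions(transactions, frequent_items, k)
        candidates = generate_candidates(frequent, k + 1)
        k += 1
    return result


# Usage on five toy transactions.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
print(apriori_with_reduction(transactions, min_support=3))
```

With `min_support=3`, item `d` is deleted after the first pass. The candidate {a, b, c} is then counted only against the two transactions that still hold three items, so it is rejected without scanning the rest of the database.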
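The abstract credits part of the speedup in contribution (1) to counting candidates with a hash tree. The sketch below is a minimal version of the classic Apriori hash tree. Interior nodes hash on one item per level, and leaves store candidate counts. A transaction only visits the leaves reachable from its own items, so it never scans the full candidate list. The class name, branching factor, and leaf capacity are illustrative assumptions, not details from the thesis, and Python's built-in `hash()` stands in for the hash function.

```python
from itertools import combinations


class _Node:
    """Hash-tree node: interior nodes have children, leaves store candidate counts."""

    def __init__(self):
        self.children = None  # list of child nodes once this node has been split
        self.itemsets = {}    # candidate itemset (sorted tuple) -> support count


class HashTree:
    """Illustrative hash tree for counting candidate k-itemsets (names are hypothetical)."""

    def __init__(self, k, branching=3, leaf_capacity=3):
        self.k = k                          # length of every candidate itemset
        self.branching = branching          # fan-out of interior nodes
        self.leaf_capacity = leaf_capacity  # leaves split when they exceed this size
        self.root = _Node()

    def _bucket(self, item):
        return hash(item) % self.branching

    def insert(self, itemset):
        node, depth = self.root, 0
        while node.children is not None:  # descend by hashing the item at this depth
            node = node.children[self._bucket(itemset[depth])]
            depth += 1
        node.itemsets[itemset] = 0
        self._maybe_split(node, depth)

    def _maybe_split(self, node, depth):
        # A leaf at depth k has no item left to hash on, so it may overflow.
        if len(node.itemsets) <= self.leaf_capacity or depth >= self.k:
            return
        stored, node.itemsets = node.itemsets, {}
        node.children = [_Node() for _ in range(self.branching)]
        for itemset, count in stored.items():
            node.children[self._bucket(itemset[depth])].itemsets[itemset] = count
        for child in node.children:
            self._maybe_split(child, depth + 1)

    def count_transaction(self, transaction):
        items = sorted(set(transaction))
        self._visit(self.root, items, 0, 0, set(items), set())

    def _visit(self, node, items, start, depth, tset, seen):
        if node.children is None:
            if id(node) in seen:  # a leaf can be reached along several hash paths
                return
            seen.add(id(node))
            for itemset in node.itemsets:
                if tset.issuperset(itemset):
                    node.itemsets[itemset] += 1
            return
        # Pick the item at position `depth`, leaving room for the k - depth - 1 items after it.
        for j in range(start, len(items) - (self.k - depth) + 1):
            child = node.children[self._bucket(items[j])]
            self._visit(child, items, j + 1, depth + 1, tset, seen)

    def counts(self):
        out, stack = {}, [self.root]
        while stack:
            node = stack.pop()
            if node.children is None:
                out.update(node.itemsets)
            else:
                stack.extend(node.children)
        return out


# Usage: count every 2-itemset over a, b, c, d in five toy transactions.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
tree = HashTree(k=2, branching=2, leaf_capacity=2)
for candidate in combinations("abcd", 2):
    tree.insert(candidate)
for t in transactions:
    tree.count_transaction(t)
print(tree.counts())
```

The `seen` set matters. Two different items of one transaction can hash to the same bucket, so the same leaf may be reached twice. Without the set, its candidates would be counted twice for that transaction.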
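Contribution (2) replaces PFP's item grouping with load estimation plus balanced grouping. The abstract does not give the exact estimate or grouping method, so the sketch below substitutes two labeled assumptions:

- The per-item load is approximated as support × (F-list rank + 1). In PFP, each transaction containing an item can send a prefix of up to that many items to the item's group.
- Items are assigned greedily with longest-processing-time (LPT) scheduling: each item, heaviest first, goes to the currently lightest group.

For comparison, the sketch also includes a naive contiguous split of the F-list. All function names are hypothetical.

```python
import heapq


def estimate_load(f_list, support):
    """Assumed load model (not the thesis's formula): support * (rank in F-list + 1)."""
    return {item: support[item] * (rank + 1) for rank, item in enumerate(f_list)}


def contiguous_groups(f_list, num_groups):
    """Naive baseline: cut the F-list into contiguous chunks of equal size."""
    size = -(-len(f_list) // num_groups)  # ceiling division
    return [f_list[i:i + size] for i in range(0, len(f_list), size)]


def balanced_groups(f_list, load, num_groups):
    """Greedy LPT assignment: heaviest remaining item goes to the lightest group."""
    heap = [(0, g) for g in range(num_groups)]  # (accumulated load, group id); already a heap
    groups = [[] for _ in range(num_groups)]
    for item in sorted(f_list, key=load.__getitem__, reverse=True):
        total, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (total + load[item], g))
    return groups


def group_loads(groups, load):
    return [sum(load[item] for item in group) for group in groups]


# Usage: six frequent items, already sorted by descending support (the F-list), two groups.
support = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4, "f": 3}
f_list = ["a", "b", "c", "d", "e", "f"]
load = estimate_load(f_list, support)
print(group_loads(contiguous_groups(f_list, 2), load))
print(group_loads(balanced_groups(f_list, load, 2), load))
```

Under this load model, the contiguous split leaves the two groups at estimated loads of 40 and 58. The greedy assignment gives 52 and 46, so the slowest group's estimated work drops from 58 to 52. Because the cluster finishes only when its most loaded node does, lowering the maximum group load is what shortens the overall running time.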
[Degree-granting institution]: Xi'an University of Science and Technology
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: TP311.13


Article ID: 2098020


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2098020.html

