
Research on Interval Join Methods under MapReduce

Published: 2018-05-08 20:16

  Topics: interval join + set classification; Source: Huazhong University of Science and Technology, master's thesis, 2016


[Abstract]: With the rapid development of network technology, global data volumes are multiplying, which makes big data analysis and processing difficult. MapReduce, an emerging programming model for data-intensive computing, plays an important role in big data analysis and processing. An interval join is a join whose attribute values range over an interval. It is an important operation in big data analysis and processing, so using the MapReduce programming platform to improve the efficiency of interval joins is of real significance.

Building on the concepts of interval tuples and interval tuple relations proposed by Allen, this thesis designs set-classification-based algorithms for two-way and multi-way interval joins. First, the interval range of the participating tuples is divided uniformly into several partitions. Each tuple is mapped to the set of partitions it intersects, and its position within each partition is classified. Four types of set classification are defined, and the share of each partition's total data that falls into each of the four types is analyzed.

Second, the two-way and multi-way interval join algorithms are implemented in the MapReduce distributed programming framework. The key-value pairs built from the four set classifications filter out tuples that do not need to take part in the join, which reduces the data transferred from the Map side and the computation on the Reduce side and so improves join efficiency.

Finally, based on the share of each partition's data held by each set classification, separate load-balancing strategies are formulated for two-way and multi-way interval joins. These strategies recombine the set classifications across partitions into new key-value pairs so that the data received by each Reduce node is balanced, further improving the completion efficiency of the interval join job.

The two-way and multi-way interval join methods were validated on a distributed Hadoop platform. The experimental results show that the set-classification-based interval join method applies to a variety of cases, has certain advantages over existing two-way and multi-way interval join methods, and that the proposed load-balancing strategies further improve efficiency.
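The partition-and-classify step described in the abstract can be sketched in plain Python. The abstract does not define the four set classifications, so the four kinds below (`INSIDE`, `CROSS_LEFT`, `CROSS_RIGHT`, `COVER`) are an assumption made for illustration, as are all function names; this is not the thesis's actual algorithm, only one plausible shape of the Map side.

```python
from collections import defaultdict
from enum import Enum


class Kind(Enum):
    """Hypothetical four-way classification (not taken from the thesis)."""
    INSIDE = "inside"            # starts and ends within the partition
    CROSS_LEFT = "cross_left"    # starts before the partition, ends inside it
    CROSS_RIGHT = "cross_right"  # starts inside the partition, ends after it
    COVER = "cover"              # starts before and ends after the partition


def make_partitions(lo, hi, n):
    """Split the range [lo, hi) into n equal-width half-open partitions."""
    width = (hi - lo) / n
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n)]


def classify(interval, part):
    """Classify a closed interval [s, e] against a partition [p_lo, p_hi).

    Returns None when the interval does not intersect the partition.
    """
    s, e = interval
    p_lo, p_hi = part
    if e < p_lo or s >= p_hi:
        return None
    starts_before = s < p_lo
    ends_after = e >= p_hi
    if starts_before and ends_after:
        return Kind.COVER
    if starts_before:
        return Kind.CROSS_LEFT
    if ends_after:
        return Kind.CROSS_RIGHT
    return Kind.INSIDE


def map_interval(interval, parts):
    """Map step: emit (partition index, kind) for every intersected partition."""
    keys = []
    for idx, part in enumerate(parts):
        kind = classify(interval, part)
        if kind is not None:
            keys.append((idx, kind))
    return keys


def shuffle(intervals, parts):
    """Group intervals by key, imitating what a MapReduce shuffle would do."""
    groups = defaultdict(list)
    for interval in intervals:
        for key in map_interval(interval, parts):
            groups[key].append(interval)
    return groups


parts = make_partitions(0, 100, 4)
groups = shuffle([(10, 20), (20, 30), (10, 80)], parts)
```

With keys of the form (partition index, kind), a Reduce-side join could skip combinations of kinds that cannot overlap, and a load balancer could regroup keys by their observed sizes; both uses are described only at a high level in the abstract and are not implemented here.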

[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP311.13


Article ID: 1862908


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1862908.html

