
Research on Top-K Computation in Distributed Environments

Published: 2018-11-09 08:57
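[Summary]: Top-k computation, a form of preference query, is one of the most basic database operations; it aims to find, within a given data set, the information a user is likely to be interested in. As an important data-analysis tool, it is widely used in web search, e-commerce, data mining, and multi-criteria decision support. With the arrival of the big-data era, traditional top-k processing techniques face unprecedented challenges and can no longer meet the needs of big-data analysis. Top-k computation in this new setting faces three main challenges. First, data volumes reach the TB or PB scale, so traditional single-machine processing no longer applies and distributed parallel computing frameworks should be considered. Second, for massive data sets in a distributed environment, it is unclear which data partitioning method best improves parallel performance and query speed. Third, a traditional top-k query requires the user to supply a scoring function, and choosing a suitable one is not easy. This thesis therefore studies the key techniques of data partitioning and parallel algorithm design for top-k computation in distributed environments. The main research contents are as follows.

(1) For weighted top-k queries in a distributed computing framework, a grid-like data partitioning scheme is proposed. The original data set is divided into sub-datasets, and the sub-datasets selected according to user preference are queried in place of the full data set, which reduces the amount of data queried. To address the "empty space" phenomenon in high dimensions, hyperplane partitioning is introduced on top of the grid partitioning. Compared with angle-based and hyperplane-based data partitioning, this method has simple preprocessing with no complex coordinate transformation, and it remains applicable when "empty space" appears in higher dimensions. Experimental results show that in a big-data environment the grid-like and hyperplane partitioning method answers queries nearly 15‰ faster than the angle-based partitioning method. In addition, when the dimensionality is high enough for "empty space" to appear (d ≥ 8 in the experiments), its query results are more accurate than those of the angle-based method, and it scales well.

(2) A traditional top-k query requires the user to supply a scoring function, which some users find hard to do. Building on existing single-machine algorithms, five parallel, metric-space-based top-k dominating query algorithms for distributed platforms are proposed. Algorithm 1 uses the fact that the skyline set must contain the top-1 dominating result: it computes the skyline in parallel per partition to speed up processing, and it uses the dominance relationships among candidates to avoid k repeated computations. Algorithm 2 uses the fact that the k-skyband contains all top-k dominating results: each partition computes its k-skyband in parallel, which avoids the k-iteration loop. Algorithm 3 builds on Algorithm 1 and first filters the original data with ANN to speed up the skyline computation. Algorithm 4 is a pruning algorithm based on set ANN and the k-skyband: it prunes with set ANN first, then computes the k-skyband, and finally obtains the top-k dominating result, which speeds up the k-skyband computation. Algorithm 5 is a top-k dominating algorithm based on sorting and pruning: it sorts the data set according to the query input set Q, builds index tables, and reads them in round-robin fashion, which avoids scanning the original data set to compute the dominance score of each candidate. Experimental results show that the five parallel algorithms reduce the number of dominance comparisons between data objects and clearly improve query efficiency, with Algorithm 4 performing best in most cases.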
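The k-skyband idea behind Algorithm 2 can be illustrated with a minimal single-process sketch. It is not the thesis's implementation: it assumes plain numeric vectors where smaller values are better (rather than a metric space with a query set Q), it uses brute-force quadratic dominance checks, and the names `dominates`, `k_skyband`, and `top_k_dominating` are hypothetical. A point dominated by k or more others cannot be in the top-k dominating result, because each of its dominators has a strictly higher dominance score. A point in the global k-skyband is also in the k-skyband of its own partition, so the union of the per-partition k-skybands is a safe candidate set.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumption for this sketch: smaller values are better.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def k_skyband(points: Sequence[Point], k: int) -> List[Point]:
    """Points dominated by fewer than k other points (brute force)."""
    return [p for p in points if sum(dominates(q, p) for q in points) < k]


def top_k_dominating(partitions: Sequence[Sequence[Point]], k: int) -> List[Tuple[int, Point]]:
    """Return up to k (score, point) pairs with the highest dominance scores.

    Each partition's k-skyband could be computed independently (in parallel);
    here the partitions are simply processed one after another.
    """
    candidates = [p for part in partitions for p in k_skyband(part, k)]
    everything = [p for part in partitions for p in part]
    # Score only the candidates, but count dominated points over all data.
    scored = [(sum(dominates(c, q) for q in everything), c) for c in candidates]
    scored.sort(key=lambda sc: (-sc[0], sc[1]))  # ties broken by coordinates
    return scored[:k]
```

For example, with the partitions `[(1, 1), (2, 3), (4, 4)]` and `[(3, 2), (5, 1), (5, 5)]` and k = 2, only four of the six points survive as candidates, and the point `(1, 1)` ranks first because it dominates all five other points.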
[Abstract]: As a kind of preference query, top-k computation is one of the most basic operations in a database; it aims to find the information a user may be interested in within a given data set. As an important data-analysis tool, top-k computation is widely used in web search, e-commerce, data mining, multi-criteria decision support, and other fields. With the advent of big data, traditional top-k processing techniques have encountered unprecedented challenges and can no longer meet the needs of big-data analysis. In the new environment, top-k computation faces three main challenges: first, the data scale reaches the TB or PB level, so the traditional single-machine processing method is no longer applicable and distributed parallel computing frameworks should be considered; second, for massive data sets in a distributed environment, it is unclear which data partitioning method best improves parallel performance and query speed; third, a traditional top-k query requires the user to supply a scoring function, but selecting a suitable scoring function is not easy. Therefore, this thesis studies the key techniques of data partitioning and parallel algorithm design for top-k computation in distributed environments. The main research contents are as follows: (1) Within a distributed computing framework, a grid-like data partitioning method is proposed for the weighted top-k query problem. The original data set is divided into sub-datasets, and sub-datasets selected according to user preference are queried in place of the full data set, which reduces the data queried. To address the "empty space" phenomenon in high dimensions, hyperplane partitioning is introduced on top of the grid partitioning. Compared with angle-based and hyperplane-based data partitioning, this method has simple preprocessing with no complex coordinate transformation, and it remains applicable when "empty space" appears in higher dimensions. The experimental results show that, in a big-data environment, the grid-like and hyperplane partitioning method answers queries nearly 15‰ faster than the angle-based partitioning method.
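How a grid partition lets a weighted top-k query skip most of the data can be sketched as follows. This is an illustrative sketch, not the thesis's partitioning scheme: it assumes points in the unit hypercube, a linear scoring function with non-negative weights where higher is better, and a uniform grid. The names `build_grid`, `score`, and `weighted_top_k` are hypothetical. Each cell's upper corner gives an upper bound on the score of any point in it, so cells are visited in decreasing bound order and the scan stops once no remaining cell can beat the current k-th best score.

```python
import heapq
from collections import defaultdict


def score(weights, point):
    """Linear preference score; higher is better (assumed for this sketch)."""
    return sum(w * x for w, x in zip(weights, point))


def build_grid(points, cells_per_dim):
    """Assign each point in [0, 1]^d to a uniform grid cell."""
    grid = defaultdict(list)
    for p in points:
        cell = tuple(min(int(x * cells_per_dim), cells_per_dim - 1) for x in p)
        grid[cell].append(p)
    return grid


def weighted_top_k(grid, cells_per_dim, weights, k):
    """Return (top-k list of (score, point), number of cells scanned)."""

    def upper_bound(cell):
        # Score of the cell's upper corner bounds every point inside it,
        # because all weights are assumed to be non-negative.
        corner = tuple((c + 1) / cells_per_dim for c in cell)
        return score(weights, corner)

    best = []  # min-heap holding the k best (score, point) pairs so far
    scanned = 0
    for cell in sorted(grid, key=upper_bound, reverse=True):
        if len(best) == k and upper_bound(cell) <= best[0][0]:
            break  # no remaining cell can improve the result
        scanned += 1
        for p in grid[cell]:
            item = (score(weights, p), p)
            if len(best) < k:
                heapq.heappush(best, item)
            elif item > best[0]:
                heapq.heapreplace(best, item)
    return sorted(best, reverse=True), scanned
```

In a distributed setting each cell (or group of cells) would live on a different worker, and the bound check decides which workers need to be queried at all; the sketch keeps everything in one process to show only the pruning logic.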

Article ID: 2319947


Article link: http://sikaile.net/jingjilunwen/dianzishangwulunwen/2319947.html

