
Research and Design of a MapReduce-Based kNN-join Algorithm

Published: 2018-06-06 10:22

  Topics: MapReduce + kNN join operation; Source: Heilongjiang University, master's thesis, 2016


[Abstract]: The continued growth of the Internet industry has produced massive volumes of data, and extracting valuable knowledge from that data has become a focus of attention. Among data mining algorithms, kNN can be used for classification; as kNN came into wide use, the kNN-join algorithm was proposed, and it is now applied in both the data preprocessing and the data mining stages of data mining. As data volumes keep growing and efficiency requirements rise, however, traditional methods no longer suffice, which has led to MapReduce-based kNN-join operations. This thesis studies each stage of the MapReduce-based kNN-join operation. First, in data preprocessing, it improves an existing data partitioning algorithm so that the data are partitioned evenly. Second, to reduce the cost of the join, it finds a seed set for each data partition so that the k nearest neighbors of every element in the partition lie within a single set. Finally, to balance resource utilization against algorithm accuracy, it organizes the data partitions into groups. Experiments on a combination of real and synthetic data were run to verify the algorithm's effectiveness, and the results show that the proposed algorithm outperforms existing algorithms.
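For readers unfamiliar with the operation, the following is a minimal sketch of what a kNN-join computes and how it can be split into map and reduce steps. It is not the thesis's method: the pivot-based assignment, the function names (`knn_join`, `map_phase`, `reduce_phase`), and the sample points are all assumptions made for illustration, and the whole of S is passed to every partition so that the result stays exact. The seed sets and partition grouping described in the abstract are aimed at avoiding exactly that replication.

```python
import heapq
import math
from collections import defaultdict

def knn_join(R, S, k):
    # Reference definition: for each r in R, its k nearest neighbours in S.
    return {r: heapq.nsmallest(k, S, key=lambda s: math.dist(r, s)) for r in R}

def map_phase(R, pivots):
    # Map step (illustrative): key each R point by the index of its nearest pivot.
    partitions = defaultdict(list)
    for r in R:
        pid = min(range(len(pivots)), key=lambda i: math.dist(r, pivots[i]))
        partitions[pid].append(r)
    return partitions

def reduce_phase(partitions, S, k):
    # Reduce step (illustrative): each partition joins its R points against
    # all of S. Replicating S wholesale is the naive choice assumed here.
    result = {}
    for r_part in partitions.values():
        result.update(knn_join(r_part, S, k))
    return result

# Hypothetical sample data.
R = [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)]
S = [(1.0, 0.0), (4.0, 4.0), (6.0, 5.0), (9.0, 0.0), (0.0, 2.0)]
pivots = [(0.0, 0.0), (8.0, 3.0)]

partitions = map_phase(R, pivots)
joined = reduce_phase(partitions, S, k=2)
```

Because every partition sees all of S, `joined` matches the brute-force `knn_join(R, S, 2)`; the cost is that S is shipped to every reducer, which is the overhead a smaller per-partition candidate set would reduce.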
[Degree-granting institution]: Heilongjiang University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP311.13

Article ID: 1986197


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1986197.html

