A MapReduce-Based Parallel Condensed Nearest Neighbor Algorithm
Published: 2018-06-08 07:11
Topics: condensed nearest neighbor + K-nearest neighbor; Source: 《小型微型計算機(jī)系統(tǒng)》 (Journal of Chinese Computer Systems), 2017, Issue 12
[Abstract]: Condensed Nearest Neighbors (CNN) is a sample-selection algorithm proposed by Hart for K-Nearest Neighbors (K-NN), with the aim of reducing the memory requirement and computational burden of the K-NN algorithm. In the worst case, however, the time complexity of CNN is O(n³), where n is the number of samples in the training set. When CNN is applied in a big-data environment, this high time complexity becomes a bottleneck. To address this problem, this paper proposes a MapReduce-based parallelized condensed nearest neighbor algorithm. The parallel CNN was implemented in a Hadoop environment and compared experimentally with the original CNN algorithm on six data sets. The experimental results show that the proposed algorithm is effective and solves the above problem.
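For readers unfamiliar with Hart's procedure, the following is a minimal single-machine sketch of the condensed nearest neighbor selection that the abstract describes, written in Python with NumPy. The function name and the use of Euclidean distance are illustrative assumptions, not taken from the paper; the repeated passes over the training set are what drive the O(n³) worst case mentioned above.

```python
import numpy as np

def condensed_nearest_neighbors(X, y):
    """Hart's CNN (sketch): select a subset S of (X, y) such that
    every training sample is correctly classified by 1-NN over S."""
    store = [0]          # seed the store with one sample so 1-NN is defined
    changed = True
    while changed:       # repeated passes give the O(n^3) worst case
        changed = False
        for i in range(len(X)):
            if i in store:
                continue
            # 1-NN prediction of X[i] using the current store
            dists = np.linalg.norm(X[store] - X[i], axis=1)
            nearest = store[int(np.argmin(dists))]
            if y[nearest] != y[i]:   # misclassified -> add to the store
                store.append(i)
                changed = True
    return X[store], y[store]
```

On two well-separated one-dimensional clusters, for example, the store typically shrinks to one representative per class, which is exactly the memory saving for K-NN that motivates the algorithm.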
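The paper's exact Hadoop implementation is not reproduced here, but a common MapReduce pattern for sample-selection algorithms is to run CNN independently on each data split in the map phase and then condense the union of the selected samples in the reduce phase. The sketch below emulates that pattern sequentially in Python; the split count, function names, and the merge-then-recondense step are assumptions for illustration, not the authors' stated design.

```python
import numpy as np

def cnn_select(X, y):
    """Compact CNN selector (same idea as Hart's algorithm)."""
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            j = keep[int(np.argmin(np.linalg.norm(X[keep] - X[i], axis=1)))]
            if y[j] != y[i]:
                keep.append(i)
                changed = True
    return X[keep], y[keep]

def mapreduce_cnn(X, y, n_splits=2):
    """Map: run CNN on each split independently (these calls are the
    parallelizable map tasks). Reduce: union the per-split selections,
    then condense the union once more for a consistent final subset."""
    parts = np.array_split(np.arange(len(X)), n_splits)
    selected = [cnn_select(X[idx], y[idx]) for idx in parts]   # map phase
    Xu = np.vstack([s[0] for s in selected])                   # reduce phase
    yu = np.concatenate([s[1] for s in selected])
    return cnn_select(Xu, yu)
```

Because each map task sees only n/p samples, the cubic worst case applies per split rather than to the full training set, which is the source of the speedup the abstract reports.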
[Author affiliations]: Key Laboratory of Machine Learning and Computational Intelligence of Hebei Province, College of Mathematics and Information Science, Hebei University; College of Mathematics, Physics and Information Engineering, Zhejiang Normal University
[Funding]: Supported by the National Natural Science Foundation of China (71371063), the Natural Science Foundation of Hebei Province (F2017201026), and a project of the Top-Priority Discipline of Computer Science and Technology of Zhejiang Province (Zhejiang Normal University)
[CLC number]: TP311.13
Article ID: 1995065
Link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1995065.html