A MapReduce-Parallelized Condensed Nearest Neighbor Algorithm
Published: 2018-06-08 07:11
Topics: Condensed Nearest Neighbor + K-Nearest Neighbor; Source: Journal of Chinese Computer Systems (《小型微型計(jì)算機(jī)系統(tǒng)》), 2017, No. 12
【Abstract】: Condensed Nearest Neighbors (CNN) is a sample selection algorithm proposed by Hart for K-Nearest Neighbors (K-NN), aimed at reducing the memory requirements and computational burden of the K-NN algorithm. In the worst case, however, the time complexity of CNN is O(n³), where n is the number of samples in the training set. When CNN is applied in a big-data environment, this high time complexity becomes a bottleneck. To address this problem, this paper proposes a MapReduce-based parallelization of the condensed nearest neighbor algorithm. The parallel CNN was implemented in a Hadoop environment and compared experimentally with the original CNN algorithm on six data sets. The experimental results show that the proposed algorithm is effective and solves the problem described above.
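To make the method concrete, the following is a minimal sketch of Hart's CNN condensation rule together with a partition-based parallelization in the spirit of the MapReduce approach the abstract describes. This is an illustrative assumption, not the authors' Hadoop implementation: `multiprocessing` stands in for MapReduce, and the function names (`cnn_condense`, `parallel_cnn`) are hypothetical.

```python
# Sketch of Hart's Condensed Nearest Neighbor (CNN) rule, plus a
# partition-based parallel version (map: condense each chunk;
# reduce: merge the condensed chunks and condense once more).
import numpy as np
from multiprocessing import Pool

def nearest_label(X_store, y_store, x):
    """Label of the stored sample nearest to x (1-NN)."""
    d = np.linalg.norm(X_store - x, axis=1)
    return y_store[np.argmin(d)]

def cnn_condense(X, y):
    """Hart's CNN: absorb a sample only if the current store misclassifies it.

    Repeats full passes over the data until a pass adds nothing, which
    guarantees the condensed set classifies every training sample correctly.
    """
    keep = [0]  # seed the store with the first sample
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            if nearest_label(X[keep], y[keep], X[i]) != y[i]:
                keep.append(i)  # misclassified -> add to the store
                changed = True
    return X[keep], y[keep]

def _condense_chunk(args):
    return cnn_condense(*args)

def parallel_cnn(X, y, n_parts=4):
    """Map: condense each partition independently in parallel.
    Reduce: merge the (much smaller) condensed sets and condense again."""
    Xs = np.array_split(X, n_parts)
    ys = np.array_split(y, n_parts)
    with Pool(n_parts) as pool:
        parts = pool.map(_condense_chunk, list(zip(Xs, ys)))
    X_merged = np.vstack([p[0] for p in parts])
    y_merged = np.concatenate([p[1] for p in parts])
    return cnn_condense(X_merged, y_merged)
```

The sequential loop is the O(n³) worst case the abstract refers to; the partition step shrinks each chunk before the final pass, which is what makes the MapReduce formulation attractive on large training sets.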
【Affiliations】: Key Laboratory of Machine Learning and Computational Intelligence of Hebei Province, College of Mathematics and Information Science, Hebei University; College of Mathematics, Physics and Information Engineering, Zhejiang Normal University
【Funding】: National Natural Science Foundation of China (71371063); Natural Science Foundation of Hebei Province (F2017201026); Key Discipline of Computer Science and Technology of Zhejiang Province (Zhejiang Normal University) project
【Classification】: TP311.13
本文編號(hào):1995065
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1995065.html