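An Efficient KNN Chinese Text Classification Algorithm Based on the Spark Framework
Published: 2018-05-11 12:28
Topic: K-nearest neighbor + clustering. Reference: Journal of Computer Applications (《計算機應用》), 2016, Issue 12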
[Abstract]: The time complexity of the K-nearest neighbor (KNN) classification algorithm is proportional to the number of training samples, which makes it computationally expensive, and traditional architectures process data slowly in the current big-data setting. To address both problems, an efficient KNN classification algorithm based on the Spark framework and clustering optimization is proposed. The algorithm first prunes the training set twice using an optimized K-medoids clustering algorithm that introduces a shrinkage factor. It then iterates over the value of K during classification to obtain the classification result, and parallelizes the computation by partitioning the data and iterating over the partitions within the Spark computing framework. Experimental results show that, across different data sets, the traditional KNN algorithm and the K-medoids-based KNN algorithm take 3.92 to 31.90 times as long as the proposed KNN algorithm under the Spark framework. The proposed algorithm has high computational efficiency, achieves a better speedup ratio than the Hadoop platform, and can effectively classify big data.
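The pipeline described in the abstract (prune the training set around cluster medoids, then run KNN over data partitions and merge the results) can be illustrated with a minimal plain-Python sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the function names, the `shrink` parameter, the single pruning pass around one medoid per class (the paper prunes twice with an optimized K-medoids algorithm), the Euclidean distance, and the fixed `k` (the paper iterates over K, but the abstract does not say how). The partitions are ordinary lists standing in for Spark partitions.

```python
import heapq
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prune_by_medoid(samples, shrink=0.8):
    """Keep, per class, only the samples within shrink * (largest distance
    to the class medoid). Hypothetical stand-in for the paper's
    K-medoids pruning with a shrinkage factor."""
    by_label = {}
    for vec, label in samples:
        by_label.setdefault(label, []).append(vec)
    kept = []
    for label, vecs in by_label.items():
        # Medoid: the member with the smallest total distance to the others.
        medoid = min(vecs, key=lambda m: sum(distance(m, v) for v in vecs))
        radius = shrink * max(distance(medoid, v) for v in vecs)
        kept.extend((v, label) for v in vecs if distance(medoid, v) <= radius)
    return kept


def local_top_k(partition, query, k):
    """Per-partition step: the k nearest (distance, label) pairs."""
    pairs = ((distance(vec, query), label) for vec, label in partition)
    return heapq.nsmallest(k, pairs)


def knn_classify(partitions, query, k):
    """Merge the per-partition candidates, then take a majority vote
    among the k globally nearest neighbors."""
    candidates = [c for part in partitions for c in local_top_k(part, query, k)]
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: 2-D vectors standing in for text feature vectors.
train = [
    ((0.0, 0.0), "sports"), ((0.1, 0.2), "sports"), ((0.2, 0.1), "sports"),
    ((3.0, 3.0), "sports"),  # far from the other "sports" samples
    ((5.0, 5.0), "finance"), ((5.1, 4.9), "finance"), ((4.9, 5.2), "finance"),
]
pruned = prune_by_medoid(train, shrink=0.8)
partitions = [pruned[0::2], pruned[1::2]]  # two list "partitions"
prediction = knn_classify(partitions, (0.15, 0.15), k=3)
```

Because each partition only returns its own k nearest candidates, the merge step handles at most k times the number of partitions entries. This is the property that makes a partition-wise formulation a natural fit for a framework such as Spark, although the abstract does not specify which Spark operations the authors use.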
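[Author affiliations]: School of Information Science and Engineering, Qufu Normal University; School of Software, Qufu Normal University
[Funding]: National Natural Science Foundation of China (61402258); Shandong Province Undergraduate University Teaching Reform Research Project (2015M102); University-level Teaching Reform Research Project (jg05021*)
[Classification number]: TP391.1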