Efficient KNN Chinese Text Classification Algorithm Based on the Spark Framework

Posted: 2018-05-11 12:28

  Topics: K-nearest neighbor + clustering. Source: Journal of Computer Applications (《计算机应用》), 2016, Issue 12


【Abstract】: The K-nearest neighbor (KNN) classification algorithm has a time complexity proportional to the number of training samples, which makes it computationally expensive, and traditional architectures process data slowly in the current big-data setting. To address both problems, this paper proposes an efficient KNN classification algorithm based on the Spark framework and clustering optimization. The algorithm first prunes the training set twice using an optimized K-medoids clustering algorithm that introduces a shrinkage factor. It then iterates over K values during classification to obtain the classification result, and during computation it uses the Spark framework to partition the data and process the partitions iteratively in parallel. Experimental results show that, across different datasets, the traditional KNN algorithm and the K-medoids-based KNN algorithm take 3.92 to 31.90 times as long as the proposed Spark-based KNN algorithm. The proposed algorithm is computationally efficient, achieves a better speedup than the Hadoop platform, and can classify big data effectively.
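The abstract does not include code. As a rough, hypothetical illustration of the general idea (cluster-based pruning of the training set followed by a partitioned nearest-neighbor search), the sketch below uses plain Python with no Spark dependency. It is not the authors' method. The shrinkage factor, the second pruning pass, and the K-iteration step are not modeled because the abstract does not specify them. The sketch also assumes documents are already encoded as dense numeric vectors compared with Euclidean distance, and all function names (`k_medoids`, `prune`, `local_top_k`, `knn_predict`) are illustrative. The per-partition step only imitates the role a Spark `mapPartitions` call would play.

```python
import heapq
import math
import random
from collections import Counter
from itertools import chain


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, medoids):
    """Map each medoid index to the indices of the points nearest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: euclidean(p, points[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(points, n_clusters, max_iter=20, seed=0):
    """Plain alternating K-medoids (no shrinkage factor; an assumption of this sketch).

    Returns {medoid_index: [member indices]}.
    """
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), n_clusters)
    for _ in range(max_iter):
        clusters = assign(points, medoids)
        # New medoid = the member with the smallest total distance to its cluster;
        # an empty cluster keeps its old medoid.
        new_medoids = [
            min(members, key=lambda c: sum(euclidean(points[c], points[j]) for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return assign(points, medoids)


def prune(points, clusters, query, n_probe):
    """Keep only training samples from the n_probe clusters whose medoids are nearest the query."""
    nearest = heapq.nsmallest(n_probe, clusters, key=lambda m: euclidean(query, points[m]))
    return [i for m in nearest for i in clusters[m]]


def local_top_k(partition, points, query, k):
    """Per-partition step: the k nearest (distance, index) pairs within one partition."""
    return heapq.nsmallest(k, ((euclidean(query, points[i]), i) for i in partition))


def knn_predict(points, labels, query, k, candidates=None, n_partitions=4):
    """Partitioned KNN: local top-k per partition, merged into a global top-k, then majority vote."""
    if candidates is None:
        candidates = list(range(len(points)))
    # Round-robin split of the candidate indices into partitions.
    partitions = [candidates[p::n_partitions] for p in range(n_partitions)]
    local = [local_top_k(part, points, query, k) for part in partitions]
    global_top = heapq.nsmallest(k, chain.from_iterable(local))
    votes = Counter(labels[i] for _, i in global_top)
    return votes.most_common(1)[0][0]


# Toy usage with two well-separated classes.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
clusters = k_medoids(points, n_clusters=2)
query = (10.5, 10.2)
candidates = prune(points, clusters, query, n_probe=1)
print(knn_predict(points, labels, query, k=3, candidates=candidates))  # prints "b"
```

Merging the per-partition top-k lists gives the exact top-k over all candidates, because any global nearest neighbor must also be among the k nearest within its own partition; in PySpark the local step could be passed to `RDD.mapPartitions` and the small local results merged on the driver. The pruning step is where the speedup comes from, and it trades exactness for speed: a true neighbor lying in an unprobed cluster near a boundary can be missed.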
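【Author affiliations】: School of Information Science and Engineering, Qufu Normal University; School of Software, Qufu Normal University
【Funding】: National Natural Science Foundation of China (61402258); Shandong Province Undergraduate Teaching Reform Research Project (2015M102); University-level Teaching Reform Research Project (jg05021*)
【CLC number】: TP391.1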

