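An Improved K-medoids Algorithm Based on the SOM Algorithm and Its Study
Topic: cluster analysis + SOM algorithm; Source: Taiyuan University of Technology, master's thesis, 2017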
[Abstract]: The rapid development of science and technology has caused an explosive growth of information, posing great challenges for computer storage and industry databases. As data volumes grow exponentially, dimensionality keeps increasing and data types become more complex. For such ultra-high-dimensional data, data mining techniques are needed to uncover the information hidden in the data and to use it to support scientific, well-founded prediction and decision-making. Common methods for handling high-dimensional data include dimensionality reduction, cluster analysis, and regression analysis. This thesis introduces the traditional self-organizing map (SOM) neural network and the K-medoids algorithm. In the traditional SOM algorithm, some sample points lie far from their corresponding weight vectors, which lowers clustering accuracy. The K-medoids algorithm requires the number of clusters and the initial center points (medoids) to be chosen manually before clustering, and different choices lead to different clustering results. To address the shortcomings of both methods, this thesis proposes an algorithm that combines the SOM neural network with the K-medoids algorithm: the improved SOM-K algorithm. Chapter 1 describes in detail the research significance of clustering and dimensionality reduction algorithms in the context of big data. Chapter 2 presents the distance definitions on which clustering algorithms are based. Chapter 3 describes the traditional K-medoids and SOM algorithms. Chapter 4 presents the proposed improved clustering algorithm based on SOM and K-medoids and compares the clustering results of the traditional K-medoids algorithm, the SOM algorithm, and the SOM-K algorithm on the Iris data set, showing that SOM-K outperforms both traditional algorithms. Chapter 5 applies the SOM-K algorithm to a cluster analysis of the nationwide distribution of water resources in China and discusses the conclusions in detail. Chapter 6 gives a summary and outlook, setting out the strengths and weaknesses of the improved algorithm as a basis for further study.
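The abstract does not say how SOM-K couples the two methods. As a minimal sketch only, the code below assumes one plausible reading: a small SOM is trained first, and the data points nearest to its weight vectors are used as the initial medoids for K-medoids, so that the start is no longer hand-picked. All function names (`train_som`, `k_medoids`, `som_seeded_k_medoids`) and parameter values are hypothetical, the SOM is a simple 1-D chain, and the number of clusters `k` is still supplied by the caller. This is not the thesis's actual procedure.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_som(data, n_units, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D SOM (a chain of n_units weight vectors) on data."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    radius0 = max(n_units / 2.0, 1.0)
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            bmu = min(range(n_units), key=lambda j: dist(x, weights[j]))
            for j in range(n_units):
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(len(x)):
                    weights[j][d] += lr * h * (x[d] - weights[j][d])
            step += 1
    return weights


def k_medoids(data, init_medoids, max_iter=100):
    """Alternating K-medoids: assign points, then re-pick each medoid.

    init_medoids and the returned medoids are indices into data.
    """
    medoids = list(init_medoids)
    for _ in range(max_iter):
        clusters = [[] for _ in medoids]
        for i, x in enumerate(data):
            c = min(range(len(medoids)), key=lambda k: dist(x, data[medoids[k]]))
            clusters[c].append(i)
        new_medoids = []
        for k, members in enumerate(clusters):
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[k])
                continue
            # the member with the smallest total distance to the others
            best = min(members,
                       key=lambda i: sum(dist(data[i], data[j]) for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = [min(range(len(medoids)), key=lambda k: dist(x, data[medoids[k]]))
              for x in data]
    return medoids, labels


def som_seeded_k_medoids(data, k, seed=0):
    """Assumed coupling: SOM weight vectors choose the initial medoids."""
    weights = train_som(data, k, seed=seed)
    init = []
    for w in weights:
        # nearest data point to this weight vector that is not already used
        cand = min((i for i in range(len(data)) if i not in init),
                   key=lambda i: dist(data[i], w))
        init.append(cand)
    return k_medoids(data, init)


# Toy data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
medoids, labels = som_seeded_k_medoids(points, k=3)
print(medoids, labels)
```

Calling `k_medoids` directly with different `init_medoids` on the same data is a quick way to observe the sensitivity to initial centers that the abstract describes.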
[Degree-granting institution]: Taiyuan University of Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP311.13; TP183