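Application of an Improved Genetic Algorithm to K-Means Cluster Analysis
Topic: classification  Entry point: cluster analysis  Source: North China Electric Power University, master's thesis, 2017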
[Abstract]: Many practical problems require objects to be classified before they can be studied. In geological resource exploration, samples are classified according to geophysical and drilling indicators; in archaeological biology, excavated bones are classified by shape and size; in meteorological satellite systems, the monitored data are complex and voluminous, so they must be grouped by different indicators before deeper analysis can yield accurate forecasts. Cluster analysis arose to meet this need. It is the science of dividing concrete or abstract objects into categories without any predefined classification, and it is also an important human activity. With the rapid development of computer and information technology, data volumes are growing quickly, and cluster analysis, as an important data mining technique, has attracted increasing attention.
K-Means is a partition-based algorithm. Because it is simple to operate and its principle is easy to understand, it has been widely applied and studied and is regarded as one of the ten classic data mining algorithms. It also has shortcomings: the value of k is hard to determine, the initial centers can only be chosen at random, and the search easily falls into a local optimum, all of which make the algorithm unstable. This thesis therefore introduces a genetic algorithm on top of K-Means clustering. A genetic algorithm searches for an optimal solution by imitating biological evolution and has good global search capability. Based on the characteristics of the two algorithms, the thesis proposes a hybrid algorithm that applies an improved genetic algorithm to K-Means clustering and evaluates it in simulation experiments on sample data sets; the experiments show that the proposed algorithm achieves good clustering results.
The work is divided into two parts: 1) The first part briefly introduces the basic concepts of cluster analysis, the K-Means algorithm, and the genetic algorithm. It outlines the basic ideas of K-Means and the genetic algorithm, describes their components, basic elements, and procedures, and closes with their applications. 2) The second part presents an improved K-Means clustering algorithm based on a genetic algorithm and describes it in full: chromosome encoding, choice of fitness function, design and improvement of the selection, crossover, and mutation operators, and the way K-Means is combined with the genetic algorithm. Finally, test experiments were carried out to verify the effectiveness of the proposed algorithm; a comparative analysis of the two methods based on the experimental results confirms the feasibility of the proposed method and its good clustering performance.
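The abstract names the ingredients of the hybrid (chromosome encoding, fitness function, selection, crossover, mutation, and a K-Means step) but not their concrete form. The following is a minimal Python sketch of one common way to combine them, not the thesis's actual operators. Everything specific here is an assumption made for illustration: a chromosome is a list of k real-valued centers, fitness is 1 / (1 + SSE), selection is a binary tournament, crossover is single-point, mutation adds Gaussian noise, and all function and parameter names (`ga_kmeans`, `pop_size`, `sigma`, and so on) are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def sse(points, centers):
    """Within-cluster sum of squared errors for a set of centers."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def kmeans_step(points, centers):
    """One K-Means iteration: assign points, then move centers to the means.
    An empty cluster keeps its old center."""
    labels = assign(points, centers)
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers

def fitness(points, centers):
    """Assumed fitness: larger when the clustering error is smaller."""
    return 1.0 / (1.0 + sse(points, centers))

def tournament(pop, fits, rng, size=2):
    """Assumed selection operator: best of `size` randomly drawn individuals."""
    best = max(rng.sample(range(len(pop)), size), key=lambda i: fits[i])
    return pop[best]

def crossover(a, b, rng):
    """Assumed single-point crossover on the list of centers."""
    if len(a) < 2:
        return list(a)
    cut = rng.randrange(1, len(a))
    return list(a[:cut]) + list(b[cut:])

def mutate(centers, rng, rate=0.1, sigma=0.5):
    """Assumed mutation: perturb each center with probability `rate`."""
    return [tuple(x + rng.gauss(0, sigma) for x in c) if rng.random() < rate else c
            for c in centers]

def ga_kmeans(points, k, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    # Each chromosome starts as k distinct data points used as centers.
    pop = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(points, ind) for ind in pop]
        elite = pop[max(range(pop_size), key=lambda i: fits[i])]
        # Elitism: the best individual survives, refined by one K-Means step.
        children = [kmeans_step(points, elite)]
        while len(children) < pop_size:
            child = crossover(tournament(pop, fits, rng),
                              tournament(pop, fits, rng), rng)
            child = mutate(child, rng)
            # Hybrid step: local refinement of every offspring with K-Means.
            children.append(kmeans_step(points, child))
        pop = children
    best = max(pop, key=lambda ind: fitness(points, ind))
    return best, assign(points, best)
```

Calling `ga_kmeans(points, k)` with `points` as a list of equal-length tuples returns the best set of centers found and the cluster label of each point. Because the population explores many starting configurations while the K-Means step refines each one locally, a sketch like this is less dependent on a single random initialization than plain K-Means, which is the instability the abstract describes.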
[Degree-granting institution]: North China Electric Power University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP18; TP311.13