

Research and Application of Cluster Analysis Based on the DNA Genetic Algorithm

Published: 2018-09-18 21:33
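【Abstract】: Cluster analysis is a very important branch of unsupervised classification in pattern recognition. Driven by the needs of real-world problems, research on cluster analysis has grown steadily in recent years, and the number of clustering methods has increased accordingly. Because classification in real-world problems is often fuzzy, fuzzy C-means clustering, which is based on an objective function, has become widely used. Researchers have also begun to recast clustering as a graph-theoretic problem, and spectral clustering, which clusters on a graph, is another important direction of current clustering research. Both algorithms are active research topics, but neither is universally applicable, and each currently has some shortcomings. Intelligent optimization algorithms can be used to compensate for these shortcomings and improve clustering performance. This thesis studies the fuzzy C-means clustering algorithm, the spectral clustering algorithm, and a DNA genetic algorithm that optimizes both.
Among current fuzzy clustering algorithms, fuzzy C-means clustering (FCM) is widely used because of its good local search ability and its simple, efficient execution. However, FCM has inherent shortcomings. First, the constraint that each data point's memberships sum to 1 makes the algorithm sensitive to noise and outliers. Second, the algorithm is very sensitive to the choice of initial cluster centers and easily falls into local optima. To overcome these shortcomings, this thesis adopts an improved membership calculation to reduce the influence of noise and outliers on the clustering result, adds a density calculation to mitigate FCM's sensitivity to the initial cluster centers, and uses an improved DNA genetic algorithm to help FCM escape local optima and ultimately find the global optimum.
The spectral clustering (SC) algorithm is built on spectral graph theory. In essence, it converts clustering into a graph partitioning problem and is a typical pairwise clustering algorithm. Spectral clustering can effectively reduce computational complexity while maintaining clustering quality, but it is still an emerging field with shortcomings in many places. The parts of SC that can be improved include the construction of the similarity matrix, the selection of eigenvalues and eigenvectors, and the final clustering step. The original spectral clustering builds its similarity matrix with a Gaussian kernel on Euclidean distance, so the matrix depends on the Gaussian kernel parameter σ, whose value is uncertain. This thesis therefore builds the similarity matrix from an adjusted similarity coefficient, which requires no manually set parameter and yields results closer to the real situation. In addition, the final clustering step of SC generally uses K-means, which is itself sensitive to the initial cluster centers and prone to local optima, so this thesis optimizes K-means in the final step with the improved DNA genetic algorithm in order to complete the spectral clustering process more effectively.
The DNA genetic algorithm closely resembles the standard genetic algorithm; the difference is that it applies genetic operations to population individuals encoded with a special DNA scheme to obtain the solution of the problem. Its distinctive quaternary (four-base) encoding can represent more complex information more flexibly and with higher encoding precision. The DNA genetic algorithm also has strong global search ability and implicit parallelism. This thesis uses these advantages to optimize the fuzzy C-means and spectral clustering algorithms and address the shortcomings of both. To improve the optimization effect, it also modifies several of the DNA genetic algorithm's genetic operators.
Simulations and experiments were carried out in MATLAB. Test functions and artificial datasets were first used to demonstrate the feasibility and effectiveness of the improved DNA genetic algorithm; UCI datasets were then used to validate the proposed improved fuzzy C-means algorithm and improved spectral clustering algorithm. Finally, the improved fuzzy C-means algorithm was applied to text classification on the Sogou Lab corpus, and the classification results confirm the effectiveness of the improved algorithm.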
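For reference, the baseline that the thesis modifies is the standard fuzzy C-means iteration (Bezdek's formulation), which alternates a center update v_i = Σ_j u_ij^m x_j / Σ_j u_ij^m with a membership update u_ij = 1 / Σ_k (d_ij / d_kj)^(2/(m-1)), where d_ij is the distance from point j to center i. The Python sketch below implements only that standard version, not the improved membership or density steps described above; the function name `fcm`, the random initialization, and the defaults (m = 2, tolerance 1e-6) are illustrative assumptions. Note how the membership update enforces the sum-to-one constraint that the abstract identifies as the source of noise sensitivity.

```python
import math
import random


def fcm(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy C-means. Returns (centers, u) with u[i][j] = membership of point j in cluster i."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalized so each point's memberships sum to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        total = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= total

    centers = []
    for _ in range(max_iter):
        # Center update: v_i = sum_j u_ij^m x_j / sum_j u_ij^m
        centers = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            sw = sum(w)
            centers.append(tuple(sum(w[j] * points[j][k] for j in range(n)) / sw
                                 for k in range(dim)))

        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1))
        new_u = [[0.0] * n for _ in range(c)]
        for j in range(n):
            d = [math.dist(points[j], centers[i]) for i in range(c)]
            if 0.0 in d:
                # Point sits exactly on a center: give it full membership there.
                new_u[d.index(0.0)][j] = 1.0
                continue
            for i in range(c):
                new_u[i][j] = 1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))

        # Stop when no membership changes by more than tol.
        change = max(abs(new_u[i][j] - u[i][j]) for i in range(c) for j in range(n))
        u = new_u
        if change < tol:
            break
    return centers, u
```

On two well-separated groups such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with `c = 2`, the centers should settle near (1/3, 1/3) and (31/3, 31/3); a single outlier added far from both groups would still receive memberships summing to 1 and pull the centers, which is the weakness the improved membership calculation targets.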
[English abstract]: Cluster analysis is a very important branch of unsupervised classification in pattern recognition. Driven by the needs of practical problems, research on cluster analysis has grown in recent years, and the number of clustering methods has increased accordingly. Because classification in real-world problems is often fuzzy, fuzzy C-means clustering, which is based on an objective function, has become widely used. Scholars have also begun to transform the clustering problem into a graph-theoretic problem, and graph-based spectral clustering is another important direction of current clustering research. Both clustering algorithms are active research topics, but neither is universal, and each currently has some defects. To make up for these shortcomings, intelligent optimization algorithms can be used to optimize them and improve clustering performance. This paper mainly studies the fuzzy C-means clustering algorithm, the spectral clustering algorithm, and a DNA genetic algorithm that optimizes both. Fuzzy C-means clustering (FCM) is widely used because of its good local search ability and its simple, efficient execution, but it also has inherent shortcomings. First, the constraint that the memberships of each data point sum to 1 makes the algorithm sensitive to noise and outliers. Second, the algorithm is very sensitive to the choice of initial cluster centers and easily falls into local optima. To overcome these shortcomings, an improved membership calculation is adopted to reduce the influence of noise and outliers on the clustering results, and a density calculation is added to reduce FCM's sensitivity to the initial cluster centers. In addition, an improved DNA genetic algorithm is used to help FCM jump out of local optima and find the global optimum. The spectral clustering (SC) algorithm is built on spectral graph theory.
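The abstract does not give the density formula used to choose FCM's initial centers. One common heuristic in that spirit, shown here as a hypothetical stand-in and not the thesis's method, scores each point by its local density (the number of other points within a radius r) times its distance to the nearest already-chosen center. Dense, well-separated points are preferred, and isolated outliers have density 0, so they score 0 and lose to any positive-scoring candidate. The function name `density_init_centers` and the `radius` parameter are assumptions for illustration.

```python
import math


def density_init_centers(points, c, radius):
    """Hypothetical density-based seeding: prefer dense points far from already-chosen centers."""
    n = len(points)
    # Local density: number of other points within `radius` of point j.
    density = [sum(1 for k in range(n) if k != j and math.dist(points[j], points[k]) <= radius)
               for j in range(n)]

    # First center: the densest point (max() keeps the lowest index on ties).
    chosen = [max(range(n), key=lambda j: density[j])]
    while len(chosen) < c:
        candidates = [j for j in range(n) if j not in chosen]
        # Score = density * distance to the nearest already-chosen center.
        chosen.append(max(
            candidates,
            key=lambda j: density[j] * min(math.dist(points[j], points[s]) for s in chosen),
        ))
    return [points[j] for j in chosen]
```

The returned points could seed an FCM run that starts from centers rather than from random memberships; the radius has to be tuned to the data scale, which is itself a parameter-sensitivity trade-off.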
Its essence is to transform the clustering problem into a graph partitioning problem, which makes it a typical pairwise clustering algorithm. Spectral clustering can effectively reduce computational complexity while ensuring clustering quality, but it is still an emerging field with shortcomings in many places. Within spectral clustering itself, the construction of the similarity matrix, the selection of eigenvalues and eigenvectors, and the final clustering step can all be improved. The original spectral clustering builds the similarity matrix with a Gaussian kernel function based on Euclidean distance, so the matrix is affected by the uncertainty in choosing the Gaussian kernel parameter σ. This paper therefore constructs the similarity matrix from an adjusted similarity coefficient, which requires no manually set parameter and yields results closer to the real situation. In addition, the final clustering step generally uses K-means, which is itself sensitive to the initial cluster centers and prone to local optima, so this paper uses the improved DNA genetic algorithm to optimize K-means in the final step and thereby complete the spectral clustering process better. The DNA genetic algorithm is very similar to the genetic algorithm; the difference is that it uses a special DNA encoding to perform genetic operations on individuals and obtain the solution of the problem. Its quaternary encoding can represent more complex information more flexibly and with higher encoding precision. The DNA genetic algorithm also has excellent global search ability and implicit parallelism. This paper uses these advantages to optimize the fuzzy C-means and spectral clustering algorithms and address the shortcomings of both. In addition, to improve the optimization effect, several genetic operators of the DNA genetic algorithm are improved accordingly. Simulations and experiments are carried out in MATLAB.
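The σ sensitivity described above arises in the first step of standard spectral clustering, where the affinity between points x_i and x_j is W_ij = exp(-||x_i - x_j||^2 / (2σ^2)), usually followed by the normalization D^(-1/2) W D^(-1/2) (as in the Ng-Jordan-Weiss algorithm) before eigenvectors are taken. The sketch below covers only those two steps with the Gaussian kernel that the thesis replaces; it does not implement the adjusted-similarity-coefficient method, and eigenvector extraction and the final K-means are omitted. The function names are illustrative assumptions.

```python
import math


def gaussian_affinity(points, sigma):
    """W[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) with a zero diagonal."""
    n = len(points)
    return [[0.0 if i == j else
             math.exp(-math.dist(points[i], points[j]) ** 2 / (2.0 * sigma ** 2))
             for j in range(n)]
            for i in range(n)]


def normalized_affinity(w):
    """Return D^{-1/2} W D^{-1/2}, where D is the diagonal degree matrix of W."""
    degree = [sum(row) for row in w]
    if any(d == 0.0 for d in degree):
        # With a very small sigma some rows underflow to all zeros and the graph disconnects.
        raise ValueError("a point has zero degree; sigma is too small for this data")
    n = len(w)
    return [[w[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
```

For the points (0, 0), (0, 1), (5, 5), σ = 1 gives the far point an affinity of exp(-25), about 1.4e-11, to the origin, effectively disconnecting it, while σ = 10 raises the same affinity to exp(-0.25), about 0.78. The same data can therefore yield very different graphs depending on σ, which is the motivation for a parameter-free similarity measure.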
First, test functions and artificial datasets are used to demonstrate the feasibility and effectiveness of the improved DNA genetic algorithm. UCI datasets are then used to verify the effectiveness of the improved fuzzy C-means clustering algorithm and the improved spectral clustering algorithm. Finally, the improved fuzzy C-means algorithm is applied to text classification on the Sogou Lab corpus, and the experimental results show that the improved algorithm is effective.
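The improved DNA genetic algorithm was first checked on test functions. As a rough baseline for that kind of check, the sketch below encodes each real variable as a strand of bases, decodes it as a base-4 integer scaled into [lo, hi], and applies ordinary binary tournament selection, single-point crossover, base-substitution mutation, and elitism to minimize the sphere function. The base-to-digit mapping (C=0, T=1, A=2, G=3), the operators, and all parameter values are assumptions for illustration; the thesis's improved operators are not described in the abstract and are not reproduced here.

```python
import random

BASES = "CTAG"  # Assumed digit mapping: C=0, T=1, A=2, G=3.


def decode(strand, n_vars, length, lo, hi):
    """Read each `length`-base segment as a base-4 integer and scale it into [lo, hi]."""
    values = []
    for v in range(n_vars):
        integer = 0
        for base in strand[v * length:(v + 1) * length]:
            integer = integer * 4 + BASES.index(base)
        values.append(lo + (hi - lo) * integer / (4 ** length - 1))
    return values


def dna_ga(f, n_vars, lo, hi, length=10, pop_size=40, generations=150,
           p_cross=0.8, p_mut=0.02, seed=1):
    """Minimize f over [lo, hi]^n_vars with a plain DNA-encoded GA (pop_size assumed even)."""
    rng = random.Random(seed)
    total = n_vars * length

    def cost(strand):
        return f(decode(strand, n_vars, length, lo, hi))

    pop = ["".join(rng.choice(BASES) for _ in range(total)) for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(generations):
        # Binary tournament selection.
        parents = [min(rng.sample(pop, 2), key=cost) for _ in range(pop_size)]
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < p_cross:
                # Single-point crossover on the base strings.
                cut = rng.randrange(1, total)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            children += [a, b]
        # Mutation: each base is replaced by one of the other three with probability p_mut.
        for idx, strand in enumerate(children):
            chars = list(strand)
            for k in range(total):
                if rng.random() < p_mut:
                    chars[k] = rng.choice(BASES.replace(chars[k], ""))
            children[idx] = "".join(chars)
        # Elitism: carry the best strand so far into the next generation.
        children[0] = best
        pop = children
        best = min(pop, key=cost)
    return decode(best, n_vars, length, lo, hi), cost(best)
```

Calling `dna_ga(lambda v: sum(t * t for t in v), 2, -5.0, 5.0)` should return a point near the origin. With 10 bases per variable the decoding grid step is 10 / (4**10 - 1), roughly 9.5e-6, which illustrates the encoding-precision argument: each extra base multiplies the resolution by 4.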
【Degree-granting institution】: Shandong Normal University
【Degree level】: Master's
【Year degree awarded】: 2017
【Classification number】: TP311.13

