
An Automatic Three-Way Decision Clustering Method Based on k-means

Published: 2018-05-18 13:47

  Topics: clustering + three-way decisions. Source: Chongqing University of Posts and Telecommunications, master's thesis, 2016


[Abstract]: The k-means algorithm is simple, easy to understand, and efficient, and it has been widely used in cluster analysis in the more than 50 years since it was proposed. It has a shortcoming, however: the number of clusters must be set by hand. Because cluster analysis is an unsupervised method, the number of clusters is hard to determine in advance without prior knowledge. In addition, in practical applications an object can stand in one of several relations to a cluster: it definitely belongs to the cluster; it definitely does not belong to the cluster; or it may or may not belong, meaning that the information currently available is not enough to decide. Such uncertainty is very common in fields such as social networks, bioinformatics, and e-commerce. The result produced by k-means is in effect a two-way decision clustering: only two relations hold between an object and a cluster, so an object either belongs to a cluster or does not. The traditional k-means method therefore cannot effectively handle clustering tasks that involve this kind of uncertainty. This thesis studies clustering problems with such uncertainty and gives a solution, built on the k-means framework, that determines the number of clusters automatically.

1. To address the difficulty of determining the number of clusters for k-means automatically, the thesis proposes a new validity index for evaluating clustering results. It defines a separation index that takes nearest neighbors into account and a new compactness index, proposes a clustering validity index based on difference ranking, and from these derives an automatic k-means clustering algorithm. The validity index considers two factors, the distribution of objects and their neighbors and the number of objects in each cluster, and measures clustering results well.

2. To address the limitations of traditional two-way clustering, the thesis introduces the idea of three-way decisions to extend the original clustering result. Objects in the same cluster play different roles in forming it. Some objects are typical of the cluster and definitely belong to it; some are closely related to the cluster without being typical and possibly belong to it; and some have little connection to the cluster and definitely do not belong to it. This is a typical three-way decision result: an object definitely belongs, possibly belongs, or definitely does not belong to a cluster. Combining the idea of three-way decisions with the validity index defined above, the thesis proposes an automatic three-way decision clustering method based on k-means. The algorithm determines the number of clusters automatically, and by further distinguishing among the objects in each cluster it yields a richer clustering result that is easier to analyze further.

Experiments show that the proposed validity index outperforms the validity indices it is compared against, and that the proposed three-way decision clustering algorithm significantly improves clustering accuracy over traditional two-way decision clustering algorithms.
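The abstract does not give the formulas for the thesis's validity index or its rule for splitting a cluster into "definitely belongs" and "possibly belongs" objects, so the Python sketch below only illustrates the overall pipeline with stand-ins. Everything named here is an assumption for illustration: the crisp Xie-Beni-style ratio in `validity` replaces the thesis's difference-ranking index, `farthest_first` is a deterministic seeding chosen so the example is reproducible, and the `ratio` threshold in `three_way` (nearest-centroid distance divided by second-nearest) is a hypothetical criterion for the fringe region.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(points, k):
    """Deterministic seeding (an assumption): start from points[0], then
    repeatedly add the point farthest from the seeds chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(dist(p, s) for s in seeds)))
    return seeds


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def validity(points, centroids, labels):
    """Stand-in index (not the thesis's): within-cluster squared error
    divided by n times the smallest squared centroid gap. Lower is better."""
    sse = sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    sep = min(dist(a, b) ** 2
              for i, a in enumerate(centroids) for b in centroids[i + 1:])
    return sse / (len(points) * sep)


def choose_k(points, k_range):
    """Run k-means for each candidate k and keep the best-scoring result."""
    scored = []
    for k in k_range:
        centroids, labels = kmeans(points, k)
        scored.append((validity(points, centroids, labels), k, centroids, labels))
    _, k, centroids, labels = min(scored, key=lambda s: s[0])
    return k, centroids, labels


def three_way(points, centroids, ratio=0.6):
    """Hypothetical three-way rule: a point whose nearest and second-nearest
    centroids are comparably close goes to the fringe of both clusters;
    otherwise it goes to the core of its nearest cluster."""
    core = [set() for _ in centroids]
    fringe = [set() for _ in centroids]
    for i, p in enumerate(points):
        order = sorted(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
        first, second = order[0], order[1]
        d1, d2 = dist(p, centroids[first]), dist(p, centroids[second])
        if d2 > 0 and d1 / d2 >= ratio:
            fringe[first].add(i)
            fringe[second].add(i)
        else:
            core[first].add(i)
    return core, fringe


points = [(0, 0), (0, 1), (1, 0), (1, 1),   # blob A
          (4, 0), (4, 1), (5, 0), (5, 1),   # blob B
          (2, 6), (2, 7), (3, 6), (3, 7),   # blob C
          (2.4, 0.5)]                       # lies between A and B
k, centroids, labels = choose_k(points, range(2, 6))
core, fringe = three_way(points, centroids)
print("chosen k:", k)
for j in range(k):
    print(j, "core:", sorted(core[j]), "fringe:", sorted(fringe[j]))
```

On this toy data the sketch selects three clusters, keeps the twelve blob points in core regions, and places the bridging point `(2.4, 0.5)` in the fringe of both neighboring clusters. A two-way k-means result would instead force that point into a single cluster.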
[Degree-granting institution]: Chongqing University of Posts and Telecommunications
[Degree level]: Master's
[Year conferred]: 2016
[Classification number]: TP311.13



Article ID: 1906062


Article link: http://sikaile.net/jingjilunwen/dianzishangwulunwen/1906062.html

