Research on Clustering Techniques Based on a Boundary-Degree Model
Topic: clustering boundaries + clustering algorithms; Source: Zhengzhou University, master's thesis, 2017
[Abstract]: Clustering is the technique of grouping similar data points into the same cluster and dissimilar data points into different clusters. In data analysis, clustering can be used to analyze the structure of a data set and the relationships between clusters. It plays an important role in fields such as pattern recognition, biological monitoring, drug development, and information security monitoring. However, data in high-dimensional spaces are sparse. As a result, existing clustering techniques have difficulty discovering clusters in high-dimensional data and often achieve low clustering accuracy. Unlike traditional clustering approaches, this thesis proposes new clustering algorithms that first locate cluster boundaries and then search inward toward the cluster centers to form clusters. The contributions are as follows.

(1) A new clustering algorithm suited to high-dimensional data, CASB (A Clustering Algorithm With Affine Space Based Boundary Detection). CASB first builds a cluster-boundary model from the invariance of topological structure under affine transformations of the space, and uses this model to find cluster boundaries. It then constructs a connection matrix based on the boundary points and forms clusters by searching from the cluster boundaries toward the cluster interiors. Experiments show that CASB can cluster high-dimensional data containing clusters of different densities, sizes, and shapes. It achieves higher accuracy than comparable algorithms, and its parameters are simple to choose.

(2) A clustering algorithm based on skewness-based boundary detection, C-USB (A Clustering Algorithm Using Skewness-based Boundary Detection). C-USB first proposes a skewness hypothesis: the spatial distribution of a point at the edge of a cluster, together with its nearest neighbors, is skewed. It then computes each data point's boundary degree from its degree of skewness and uses it to find the cluster boundaries. Finally, based on the boundary points, it prunes the nearest-neighbor relations between data points to construct a connection matrix and form clusters. Experiments show that C-USB can cluster complex high-dimensional data sets with high accuracy. In particular, it still achieves good clustering results on large-scale data sets.

(3) A new clustering algorithm for complex data, CUSBD (Clustering Based On Skew-based Boundary Detection). CUSBD likewise proposes a distributional hypothesis for boundary points: a point at the edge of a cluster and its nearest neighbors follow a skewed distribution in space, modeled with a gamma distribution. On this basis, CUSBD takes the degree of skewness of the distribution of a point and its nearest neighbors as that point's boundary degree and uses it to find the cluster boundaries. It then constructs a connection matrix based on the boundary points to form clusters. Experiments show that CUSBD maintains good clustering accuracy on data sets of different densities, sizes, shapes, and scales, and that it is computationally convenient.
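The C-USB description states only that a boundary point and its neighbours have a skewed spatial distribution; it does not give a formula. Under that hypothesis, one minimal sketch scores each point with a Pearson-style skewness: the distance from the point to its neighbours' centroid, divided by the neighbours' spread. This treats the point itself as the reference location of its k-nearest-neighbour set. The scoring rule, the function names `knn_indices` and `skew_boundary_degree`, and the default `k=8` are all assumptions for illustration. None of them is the thesis's own method.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbours of every row of X (brute force, O(n^2) memory)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    return np.argsort(d2, axis=1, kind="stable")[:, :k]


def skew_boundary_degree(X, k=8):
    """Hypothetical boundary degree: Pearson-style skewness of each kNN neighbourhood.

    Score = |centroid(neighbours) - x| / RMS spread of the neighbours.
    A point in the middle of a symmetric neighbourhood scores about 0;
    a point on the edge of a cluster has its neighbours on one side and scores higher.
    """
    X = np.asarray(X, dtype=float)
    nbrs = knn_indices(X, k)
    degree = np.zeros(len(X))
    for i, idx in enumerate(nbrs):
        pts = X[idx]
        centroid = pts.mean(axis=0)
        spread = np.sqrt(((pts - centroid) ** 2).sum(axis=1).mean())  # RMS distance to centroid
        if spread > 0:
            degree[i] = np.linalg.norm(centroid - X[i]) / spread
    return degree


# Usage on a 21x21 integer grid: index 0 is the corner (0,0),
# index 210 is the edge midpoint (10,0), index 220 is the centre (10,10).
grid = np.array([(x, y) for x in range(21) for y in range(21)], dtype=float)
deg = skew_boundary_degree(grid, k=8)
print(deg[0] > deg[210] > deg[220])  # True: corner > edge > centre (which scores 0)
```

You can threshold the score to get a boundary mask, for example `deg > t` for some hypothetical cut-off `t`. The brute-force neighbour search is O(n²) in time and memory, so large data sets would need a spatial index instead.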
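CUSBD models a point's neighbourhood with a gamma distribution and uses that distribution's skewness as the boundary degree. The abstract does not say which quantity is modelled or how the gamma distribution is fitted. Suppose, as an assumption, that the fit uses the method of moments on k non-negative neighbourhood values $x_1,\dots,x_k$, for example the distances from a point to its k nearest neighbours. The skewness estimate then reduces to twice the coefficient of variation:

$$
\bar{x} = \frac{1}{k}\sum_{j=1}^{k} x_j, \qquad s^2 = \frac{1}{k}\sum_{j=1}^{k} (x_j - \bar{x})^2
$$

$$
\mathrm{E}[X] = \alpha\theta, \quad \mathrm{Var}[X] = \alpha\theta^2
\;\Longrightarrow\;
\hat{\alpha} = \frac{\bar{x}^2}{s^2}, \quad \hat{\theta} = \frac{s^2}{\bar{x}}
$$

$$
\gamma_1 = \frac{2}{\sqrt{\alpha}}
\;\Longrightarrow\;
\hat{\gamma}_1 = \frac{2}{\sqrt{\hat{\alpha}}} = \frac{2s}{\bar{x}}
$$

Under the thesis's hypothesis, a larger $\hat{\gamma}_1$ marks a more asymmetric neighbourhood and hence a likelier boundary point. The estimate needs only the mean and variance of each neighbourhood, which may fit the abstract's claim that CUSBD is computationally convenient.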
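All three algorithms end the same way: boundary points are used to build a connection matrix, and clusters are formed from it. The abstract gives no details, so the sketch below makes several assumptions of its own:

- a k-nearest-neighbour edge is kept only when neither endpoint is flagged as a boundary point;
- clusters are the connected components of the remaining interior points, found by breadth-first search;
- each boundary point is then attached to the cluster of its nearest interior point.

This simplifies the thesis's boundary-to-interior search. The function `boundary_first_clusters` and its `is_boundary` input are hypothetical.

```python
from collections import deque

import numpy as np


def boundary_first_clusters(X, is_boundary, k=8):
    """Hypothetical sketch: cluster by cutting kNN edges that touch boundary points.

    Returns one integer label per row of X; labels stay -1 only if no interior point exists.
    """
    X = np.asarray(X, dtype=float)
    is_boundary = np.asarray(is_boundary, dtype=bool)
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1, kind="stable")[:, :k]

    # Connection matrix: keep a kNN edge only if both endpoints are interior points.
    conn = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in nbrs[i]:
            if not is_boundary[i] and not is_boundary[j]:
                conn[i, j] = conn[j, i] = True

    # Connected components of the interior points (breadth-first search).
    labels = np.full(n, -1)
    current = 0
    for seed in np.flatnonzero(~is_boundary):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        queue = deque([seed])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(conn[u]):
                if labels[v] == -1:
                    labels[v] = current
                    queue.append(v)
        current += 1

    # Attach every boundary point to the cluster of its nearest interior point.
    interior = np.flatnonzero(~is_boundary)
    if interior.size:
        for i in np.flatnonzero(is_boundary):
            labels[i] = labels[interior[np.argmin(d2[i, interior])]]
    return labels


# Usage: two well-separated 5x5 grids whose outer rings are flagged as boundary.
a = np.array([(x, y) for x in range(5) for y in range(5)], dtype=float)
X = np.vstack([a, a + 100.0])
ring = (a[:, 0] % 4 == 0) | (a[:, 1] % 4 == 0)
labels = boundary_first_clusters(X, np.concatenate([ring, ring]), k=8)
print(np.unique(labels))  # [0 1]
```

Dropping boundary points from the graph is what cuts thin bridges between neighbouring clusters. The cost of this design is that a cluster whose points are all flagged as boundary vanishes, and its points are absorbed by the nearest surviving cluster.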
[Degree-granting institution]: Zhengzhou University
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: TP311.13