Research on Fast Dimensionality-Reduction Clustering Algorithms for High-Dimensional Data Streams
[Abstract]: With the explosive growth of data, finding valuable information in data and turning it into organized knowledge has become increasingly difficult, and data mining emerged in response. Cluster analysis, one of the main research methods in data mining, is widely used in many fields. As information technology has developed, the data stream has become a new data type and is gradually becoming mainstream, so research on data stream clustering algorithms is both active and worthwhile. A clustering algorithm for high-dimensional data streams consists of two parts: dimensionality reduction and clustering. This thesis addresses shortcomings of existing algorithms for both parts, proposes improved algorithms, and demonstrates their advantages experimentally. Existing subspace dimensionality reduction algorithms for high-dimensional data streams cannot automatically adjust their results as the data stream changes, and they must scan the data stream multiple times. Building on previous work, this thesis proposes a structure-tree-based adaptive subspace dimensionality reduction algorithm for high-dimensional data streams. The algorithm uses an improved relative entropy to find the relevant dimensions of each region, builds the corresponding subspace, and clusters within that subspace, so that different regions correspond to different subspaces. Finding a region's relevant dimensions with relative entropy is simpler and more natural than in Sun Yufen's GSCDS algorithm. In addition, a structure tree stores information about the partitioning process and, combined with the idea of backtracking, makes the subspace clustering algorithm adaptive.
This adaptivity spares the algorithm from rerunning the subspace algorithm every time new data arrives, and an attenuation (decay) factor keeps old data from having excessive influence on the clustering results. Experimental results show that the algorithm achieves high clustering quality with low time complexity. A grid-based clustering algorithm is then applied to the dimensionality-reduced results. It retains the efficiency and strong adaptability of grid algorithms, but partitioning the space into grid cells lowers precision at cluster edges, which degrades clustering quality. To address low cluster-edge accuracy and the need to scan the grid multiple times in grid-based data stream clustering, this thesis proposes an improved data stream clustering algorithm. It improves on the original algorithm in two respects. First, in the initial clustering stage, it clusters from the inside outward and from point to surface, completing clustering in a single scan of the grid and removing the inefficiency of repeated grid scans. Second, it finds the maximal density-connected set to separate noise points from useful points in edge regions as far as possible, which addresses the loss of edge points in the original algorithm. Finally, experimental results show that the improved algorithm improves cluster-edge accuracy and adapts well to the data distribution.
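The abstract does not spell out how the improved relative entropy is defined, so the following is only a minimal sketch of the general idea, assuming plain Kullback-Leibler divergence against a uniform distribution: a dimension whose values in a region are far from uniform is treated as relevant to that region. The function names, the bin count, and the threshold are hypothetical and are not taken from the thesis.

```python
import math

def relative_entropy_to_uniform(values, lo, hi, bins=10):
    """KL divergence D(P || U) between binned values and a uniform distribution.

    Assumes every value lies in [lo, hi]. This is plain relative entropy,
    not the improved variant the thesis refers to.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp v == hi into the last bin
        counts[idx] += 1
    n = len(values)
    kl = 0.0
    for c in counts:
        if c:  # empty bins contribute 0 by convention
            p = c / n
            kl += p * math.log(p * bins)  # p * log(p / (1 / bins))
    return kl

def relevant_dimensions(points, lo, hi, threshold=0.5, bins=10):
    """Return indices of dimensions whose value distribution is far from uniform."""
    dims = len(points[0])
    return [
        d for d in range(dims)
        if relative_entropy_to_uniform([p[d] for p in points], lo, hi, bins) > threshold
    ]

# Dimension 0 is concentrated in a narrow band; dimension 1 is spread evenly.
points = [(0.5 + 0.001 * i, i / 100) for i in range(100)]
print(relevant_dimensions(points, 0.0, 1.0))
```

In this toy region only dimension 0 exceeds the threshold, so a subspace built for the region would keep that dimension and drop the evenly spread one.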
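The two grid-related ideas in the abstract, a decay factor that fades old data and clustering that grows from the inside outward, can be illustrated with a small sketch. It is an assumption-laden illustration rather than the thesis's algorithm: it works at the level of whole grid cells (the thesis separates noise from useful points inside edge regions), and the class `DecayedGrid`, the exponential decay rule, and the `dense` and `sparse` thresholds are all hypothetical.

```python
from collections import deque
from itertools import product

class DecayedGrid:
    """Grid cells whose densities fade exponentially with elapsed time."""

    def __init__(self, cell_size, decay=0.98):
        self.cell_size = cell_size
        self.decay = decay
        self.cells = {}  # cell index -> (density, time of last update)

    def density(self, cell, now):
        d, t = self.cells.get(cell, (0.0, now))
        return d * self.decay ** (now - t)

    def insert(self, point, now):
        cell = tuple(int(x // self.cell_size) for x in point)
        self.cells[cell] = (self.density(cell, now) + 1.0, now)

def neighbors(cell):
    """Yield every adjacent cell index (including diagonals)."""
    for offset in product((-1, 0, 1), repeat=len(cell)):
        if any(offset):
            yield tuple(c + o for c, o in zip(cell, offset))

def cluster(grid, now, dense, sparse):
    """Grow clusters outward from dense cells; attach edge cells that touch them."""
    labels = {}
    next_label = 0
    # Start from the densest cells so growth begins at cluster cores.
    seeds = sorted(grid.cells, key=lambda c: -grid.density(c, now))
    for seed in seeds:
        if seed in labels or grid.density(seed, now) < dense:
            continue
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            cell = queue.popleft()
            for nb in neighbors(cell):
                if nb in labels or nb not in grid.cells:
                    continue
                d = grid.density(nb, now)
                if d >= dense:
                    labels[nb] = next_label
                    queue.append(nb)  # core cell: keep expanding
                elif d >= sparse:
                    labels[nb] = next_label  # edge cell: attach, do not expand
        next_label += 1
    return labels  # cells missing from the result are treated as noise

grid = DecayedGrid(cell_size=1.0)
for point, count in [((0.5, 0.5), 5), ((1.5, 0.5), 5), ((2.5, 0.5), 2),
                     ((5.5, 5.5), 5), ((10.5, 10.5), 1)]:
    for _ in range(count):
        grid.insert(point, now=0)
labels = cluster(grid, now=0, dense=3, sparse=1.5)
```

Each cell is labeled at most once, so the grid is not rescanned per cluster. An edge cell is attached to the first cluster that reaches it, which is a simplification chosen for brevity.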
[Degree-granting institution]: Changsha University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP311.13