

Research and Implementation of Clustering Methods for High-Dimensional Categorical Data

Published: 2018-04-17 02:36

  Topics: categorical data + subspace clustering; Source: Donghua University, master's thesis, 2017


[Abstract]: Cluster analysis is an unsupervised machine learning method that partitions unorganized data into clusters according to given rules, so that each cluster consists of highly similar data. This greatly facilitates subsequent data analysis, and clustering is widely used in network services, geography, biology, trade, and other fields. However, as data generation channels and data collection techniques have developed, the dimensionality and complexity of the data to be analyzed keep growing, and traditional clustering algorithms cannot achieve good results on such data sets. Soft subspace clustering, a research hotspot in high-dimensional data clustering, has attracted increasing attention. For categorical data, however, most existing soft subspace clustering algorithms are extensions of the k-modes algorithm: both the similarity between data objects and the weights of attributes (also called features) depend on the choice of cluster centers (modes), so the quality of the chosen modes directly affects the final clustering quality. In addition, existing soft subspace clustering algorithms do not distinguish missing data from complete data during clustering, which also considerably affects the final result.

For high-dimensional, incomplete categorical data, this thesis applies the CLOPE algorithm, which clusters by the height-to-width ratio of cluster histograms, to soft subspace clustering and proposes a new soft subspace clustering algorithm. First, a missing-data handling method based on rough sets is proposed to process the missing data in the data set, and attributes are weighted according to their average mutual information. Second, because the clustering quality of CLOPE depends on the input order of the data, a "shuffling model" that orders the data completely at random is proposed to minimize the effect of input order on the final clustering quality. Finally, the algorithm is implemented in Scala on the Spark platform so that it can cluster large-scale data.

Real data from UCI are used as experimental data, and four groups of experiments verify, respectively, the effectiveness of the shuffling model and the attribute weighting method, the effectiveness of the missing-data handling method, the effectiveness of the proposed soft subspace algorithm, and its scalability with data size. The results show that the clustering quality of the proposed algorithm (the version without the missing-data handling method) is clearly better than that of CLOPE. Compared with most-frequent-value imputation and with no handling at all, the advantage of the proposed missing-data handling method grows as the missing rate increases. Compared with two other typical soft subspace clustering algorithms for categorical data, the proposed algorithm has clear advantages in both clustering quality and running time.
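To make the CLOPE criterion and the input-order issue concrete, here is a minimal Python sketch. It is not the thesis's implementation (which is in Scala on Spark). It assumes the standard CLOPE profit, in which a cluster with N records, S total item occurrences and W distinct items contributes S * N / W^r for a repulsion parameter r. It uses a plain uniform random shuffle as a stand-in for the shuffling model, whose details the abstract does not give. Attribute weighting, missing-data handling and CLOPE's iterative refinement phase are all omitted, and the names Cluster, delta_profit and clope_pass are hypothetical.

```python
import random
from collections import Counter


class Cluster:
    """Histogram of (attribute index, value) items for one cluster."""

    def __init__(self):
        self.n = 0             # N: number of records in the cluster
        self.size = 0          # S: total item occurrences (histogram area)
        self.hist = Counter()  # item -> occurrence count

    @property
    def width(self):
        return len(self.hist)  # W: number of distinct items

    def add(self, items):
        self.n += 1
        self.size += len(items)
        self.hist.update(items)


def delta_profit(cluster, items, r):
    """Change in S * N / W**r if a record with these items joined the cluster."""
    new_width = cluster.width + sum(1 for it in items if it not in cluster.hist)
    new_value = (cluster.size + len(items)) * (cluster.n + 1) / new_width ** r
    old_value = cluster.size * cluster.n / cluster.width ** r if cluster.n else 0.0
    return new_value - old_value


def clope_pass(records, r=2.0, seed=0):
    """One CLOPE allocation pass over a randomly shuffled input order."""
    order = list(range(len(records)))
    random.Random(seed).shuffle(order)  # assumed stand-in for the shuffling model
    clusters = []
    labels = [None] * len(records)
    for i in order:
        # Tag each value with its attribute index so items are distinct per record.
        items = [(j, v) for j, v in enumerate(records[i])]
        # Candidates: every existing cluster plus one new, empty cluster.
        candidates = clusters + [Cluster()]
        best = max(range(len(candidates)),
                   key=lambda k: delta_profit(candidates[k], items, r))
        if best == len(clusters):
            clusters.append(candidates[best])
        clusters[best].add(items)
        labels[i] = best
    return labels
```

Calling clope_pass(rows, r=2.0, seed=s) with different seeds changes the order in which records are visited. On well-separated data the grouping stays the same, while on noisier data the result can vary with the order, which is the sensitivity the shuffling model is meant to reduce.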
[Degree-granting institution]: Donghua University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP181; TP311.13




