Semi-Supervised Clustering Algorithms for Stream and Multi-Density Data

Published: 2020-09-02 10:35
Clustering is one of the most common data mining tasks, used frequently for data categorization and analysis in both industry and academia. In many domains where clustering is applied, some prior knowledge is available, either as labeled data (specifying the category to which an instance belongs) or as pairwise constraints on some of the instances (specifying whether two instances should be in the same cluster or in different clusters). Our research focuses on semi-supervised clustering, where we study how prior knowledge can be incorporated into clustering algorithms.

Semi-supervised clustering aims to improve clustering performance by taking user supervision into account in the form of pairwise constraints. However, most current algorithms are passive, in the sense that the pairwise constraints are provided beforehand and selected randomly. This may lead to the use of constraints that are redundant, unnecessary, or even harmful to the clustering results. For these reasons, we would like to optimize the selection of constraints for semi-supervised clustering. Moreover, semi-supervised clustering algorithms pose several challenges that must be addressed: dealing with multi-density data, handling the evolving patterns that characterize streaming data with dynamic distributions, processing data objects quickly and incrementally, and working within time and memory limits.

This thesis makes three main contributions. In the first contribution, we consider batch-mode active learning for semi-supervised clustering algorithms in an iterative manner. First, we select a batch of informative query instances such that the distribution represented by the selected query set together with the available labeled data is closest to the distribution represented by the unlabeled data. Then, we query these instances against the existing neighborhoods to determine which neighborhood each of them belongs to.
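The neighborhood query step described above can be sketched as follows. This is a minimal illustration under assumptions, not the thesis's implementation: a neighborhood is taken to be a list of instances known to share a cluster, the user is modeled as a hypothetical `must_link` oracle that answers one pairwise question at a time, and an instance that matches no existing neighborhood starts a new one. The function name `assign_to_neighborhood` and the toy labels are invented for the example.

```python
from typing import Callable, List


def assign_to_neighborhood(
    x: int,
    neighborhoods: List[List[int]],
    must_link: Callable[[int, int], bool],
) -> int:
    """Place instance x by querying one representative per neighborhood.

    Returns the number of pairwise queries that were needed.
    (Hypothetical helper; the representative is simply the first member.)
    """
    queries = 0
    for hood in neighborhoods:
        queries += 1
        if must_link(x, hood[0]):
            # A must-link answer settles the membership of x.
            hood.append(x)
            return queries
    # Cannot-link with every neighborhood: x starts a new one.
    neighborhoods.append([x])
    return queries


# Toy oracle backed by hidden ground-truth labels (assumed data).
true_labels = [0, 0, 1, 1, 2]


def oracle(i: int, j: int) -> bool:
    return true_labels[i] == true_labels[j]


neighborhoods = [[0], [2]]
used = [assign_to_neighborhood(i, neighborhoods, oracle) for i in (1, 3, 4)]
print(neighborhoods, used)
```

Querying one representative per neighborhood bounds the cost of placing an instance by the current number of neighborhoods, which is why selecting informative instances matters more than asking about random pairs.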
Experimental comparisons with state-of-the-art methods on different real-world datasets demonstrate the effectiveness and efficiency of the proposed method.

In the second contribution, we address the problem of streaming data. Data stream mining is an active research area that has emerged to discover knowledge from large amounts of continuously generated data. We propose an algorithm that extends Affinity Propagation (AP) to handle evolving data streams with dynamic distributions. We present a semi-supervised clustering technique (SSAPStream) that incorporates labeled exemplars into the AP algorithm to deal with changes in the data distribution, which require the stream model to be updated as soon as possible. Experimental results on synthetic and real datasets validate the effectiveness of our algorithm in handling dynamically evolving data streams. We also study the execution time and memory usage of SSAPStream, which are important efficiency factors for streaming algorithms.

The third contribution addresses the problem of clustering multi-density data with arbitrary shapes. Density-based clustering methods are important because of their ability to detect arbitrarily shaped clusters. Existing methods are based on DBSCAN, a typical density-based clustering algorithm whose performance depends on two user-specified parameters (Eps and Minpts) that define a single density. Most existing methods are unsupervised and cannot make use of the small amount of prior knowledge that is available. We propose a semi-supervised clustering algorithm (called Semi Den) that discovers clusters of different densities and arbitrary shapes. The idea of the proposed algorithm is to partition the dataset into different density levels and compute the density parameters for each density level set. The pairwise constraints are then used to expand the clusters based on the computed density parameters.
Evaluation of the Semi Den algorithm on both synthetic and real datasets confirms that it gives better results than other semi-supervised and unsupervised density-based approaches.
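To illustrate why a single (Eps, Minpts) pair struggles with multi-density data, the sketch below measures the distance from each point to its k-th nearest neighbor, a common heuristic for choosing Eps. It is an assumption for illustration only: the abstract does not say how Semi Den computes its per-level density parameters, and the helper `kth_neighbor_distance` and the two toy grids are invented for the example.

```python
import math


def kth_neighbor_distance(points, k):
    """Distance from each point to its k-th nearest neighbor (itself excluded)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


# Two well-separated toy clusters with a tenfold difference in spacing (assumed data).
dense = [(x * 0.1, y * 0.1) for x in range(4) for y in range(4)]
sparse = [(10 + x * 1.0, 10 + y * 1.0) for x in range(4) for y in range(4)]
points = dense + sparse

kd = kth_neighbor_distance(points, k=2)
dense_eps = max(kd[: len(dense)])   # smallest Eps that keeps the dense grid connected
sparse_eps = max(kd[len(dense):])   # smallest Eps that keeps the sparse grid connected
print(dense_eps, sparse_eps)
```

An Eps near the dense value would label every sparse point as noise, while an Eps near the sparse value can merge dense clusters that lie close together. Computing separate parameters per density level, as the abstract describes, avoids that trade-off.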
[Degree-granting institution]: Beijing Institute of Technology
[Degree level]: Doctorate
[Degree year]: 2015
[Chinese Library Classification]: TP311.13

Walid Said Abdelhamid Atwa; Semi-Supervised Clustering Algorithms for Stream and Multi-Density Data [D]; Beijing Institute of Technology; 2015
