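Research on Supervised (Transfer) Clustering Techniques for Large-Scale Data Scenarios
Published: 2018-03-08 20:40
Topic: clustering algorithms. Entry point: fuzzy C-means. Source: Jiangnan University, 2017 doctoral dissertation. Document type: degree thesis.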
[Abstract]: After more than 60 years of development, artificial intelligence has made enormous progress, and machine learning, one of its most active branches, has developed rapidly alongside it. Clustering, an effective method and tool for data analysis, has long received wide attention and application in both academia and industry. However, with the continuing advance of science and technology and the widespread use of computers, new problems and challenges keep emerging. Two prominent ones are clustering in transfer scenarios and clustering in large-scale data scenarios, and this research focuses on both. In studying traditional clustering methods, we found that applying them directly to data in these two scenarios often fails to deliver satisfactory clustering performance, and sometimes the algorithms cannot run at all. The common challenges are: 1) In transfer scenarios, a newly established industry often has no accumulated data, or the collected sample is too small, or the collected samples are contaminated by factors such as unstable acquisition equipment. In these cases, directly applying traditional clustering algorithms often leads to unstable clustering performance or outright failure. 2) In large-scale data scenarios, the number of samples is large while the memory of the processing machine is limited, so the data cannot all be loaded at once and traditional clustering algorithms cannot be applied to it directly.
To address the problems that traditional clustering algorithms face in these two emerging scenarios, this research takes classical fuzzy clustering algorithms as its basis and the transfer and large-scale data scenarios as its entry points, modifying and restructuring the relevant algorithms so that they meet the needs of the new scenarios. The main content is organized as follows:
(1) Chapters 2 to 4 focus on modifying and applying fuzzy clustering algorithms in transfer scenarios. Chapters 2 and 3 modify and restructure classical fuzzy clustering algorithms; Chapter 4 discusses the use of knowledge transfer in a concrete image segmentation application. Specifically, Chapter 2 modifies the objective function of the fuzzy C-means (FCM) clustering algorithm and proposes a new clustering algorithm, PPKTFCM. The algorithm satisfies two rules simultaneously: one minimizing the sum of distances between sample points and the historical cluster centers, and one minimizing the change in membership degrees. These two rules give the new algorithm a knowledge transfer capability, which in turn improves its clustering performance. Chapter 3 builds on the maximum entropy clustering algorithm (MECA) and adds two new constraint rules, one constraining the importance of membership degrees and one minimizing the change in cluster centers, yielding a new maximum-entropy-based knowledge transfer fuzzy clustering algorithm, MEKTFCA. Knowledge transfer improves its clustering performance when samples are insufficient or contaminated. Chapter 4 modifies the objective function of classical FCM by adding a regularization term that can absorb knowledge from spatial neighbors. This regularization term improves the robustness of the new algorithm in image segmentation.
(2) Chapters 5 and 6 focus on modifying and restructuring fuzzy clustering algorithms for large-scale data scenarios. Chapter 5 draws on the operating principles of two classical incremental algorithms, the history-based online fuzzy C-representative-point (medoid) clustering algorithm (HOFCMD) and the online fuzzy C-representative-point clustering algorithm (OFCMD), but remedies their shortcoming of representing each cluster with only a single representative point. It proposes MMFCA, an incremental multi-representative-point fuzzy clustering algorithm for large-scale data. Multiple representative points enrich the information held for each cluster, and the clustering process also takes into account the constraint relations between pairs of historical cluster points, which improves the clustering performance of MMFCA. Chapter 6, inspired by the ideas of OFCMD and FC-QR, proposes LS-FMMdC, a multi-representative-point fuzzy clustering algorithm for large-scale data with a triple optimization mechanism: weighted representativeness, quadratic regularization, and pairwise constraints. The multiple optimization mechanisms and the use of multiple representative points both contribute to the improved clustering performance of LS-FMMdC. Note that although Chapters 5 and 6 focus on clustering in large-scale data scenarios, they handle large datasets by data chunking, and the processing of each chunk includes a mechanism that transfers knowledge obtained from earlier chunks to later ones. These two chapters are therefore a combined study of the large-scale data and transfer scenarios.
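The chunking idea described for Chapters 5 and 6, where knowledge from earlier data chunks is carried into later ones, can be illustrated with a minimal Python sketch. It is not the thesis's MMFCA or LS-FMMdC algorithm: it substitutes plain weighted fuzzy C-means with one mean-based center per cluster, and it omits multiple representative points, pairwise constraints, and quadratic regularization. The function names (`memberships`, `maximin_init`, `weighted_fcm`, `chunked_fcm`), the fuzzifier `m = 2`, and the deterministic seeding are assumptions made for illustration only. After each chunk, the cluster centers are passed to the next chunk as weighted points, so only one chunk plus `c` summary points needs to be in memory at a time.

```python
import math
import random


def memberships(points, centers, m=2.0):
    """Standard FCM membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))."""
    power = 2.0 / (m - 1.0)
    table = []
    for p in points:
        # Clamp distances so a point lying exactly on a center cannot divide by zero.
        d = [max(math.dist(p, v), 1e-12) for v in centers]
        table.append([1.0 / sum((dk / dj) ** power for dj in d) for dk in d])
    return table


def maximin_init(points, c):
    """Deterministic seeding (an assumption of this sketch): start from the first
    point, then repeatedly add the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < c:
        centers.append(max(points, key=lambda p: min(math.dist(p, v) for v in centers)))
    return centers


def weighted_fcm(points, weights, c, m=2.0, init=None, iters=100, tol=1e-6):
    """Fuzzy C-means in which each point carries a weight in the center update."""
    centers = [tuple(v) for v in (init or maximin_init(points, c))]
    dims = range(len(points[0]))
    for _ in range(iters):
        u = memberships(points, centers, m)
        updated = []
        for k in range(c):
            coef = [w * row[k] ** m for w, row in zip(weights, u)]
            total = sum(coef)
            updated.append(tuple(sum(cf * p[i] for cf, p in zip(coef, points)) / total
                                 for i in dims))
        shift = max(math.dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift < tol:  # stop once no center moves noticeably
            break
    return centers


def chunked_fcm(points, c, chunk_size, m=2.0):
    """Process the data chunk by chunk, carrying the previous centers forward
    as weighted summary points (the transferred 'knowledge' in this sketch)."""
    centers, carried, carried_w = None, [], []
    for start in range(0, len(points), chunk_size):
        chunk = list(points[start:start + chunk_size])
        data = carried + chunk
        weights = carried_w + [1.0] * len(chunk)
        centers = weighted_fcm(data, weights, c, m, init=centers)
        u = memberships(data, centers, m)
        # Each center's new weight is the total weighted membership it absorbed.
        carried = list(centers)
        carried_w = [sum(w * row[k] for w, row in zip(weights, u)) for k in range(c)]
    return centers, carried_w


if __name__ == "__main__":
    rng = random.Random(0)
    demo = []
    for _ in range(100):  # two interleaved synthetic blobs (hypothetical data)
        demo.append((rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)))
        demo.append((rng.gauss(10.0, 0.5), rng.gauss(10.0, 0.5)))
    print(chunked_fcm(demo, c=2, chunk_size=40))
```

Because the memberships of every point sum to 1, the returned weights add up to the number of points processed, which gives a quick sanity check on the bookkeeping across chunks.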
[Degree-granting institution]: Jiangnan University
[Degree level]: Doctorate
[Year degree granted]: 2017
[Classification number]: TP311.13