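Research and Application of Co-Clustering Algorithms
Published: 2018-03-18 18:34
Topic: co-clustering. Entry point: non-negative matrix factorization. Source: Zhejiang University, master's thesis, 2012. Document type: degree thesis.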
[Abstract]: Cluster analysis groups objects with similar patterns into distinct classes on the basis of the similarity between them. Researchers in China and abroad have studied it in depth for many years and have proposed many effective methods, so this data mining technique has developed considerably. In recent years, as computing and the Internet industry have grown rapidly, data has become richer and far larger in scale. Traditional clustering techniques built for a single data type scale poorly and cannot handle multiple data types, so they increasingly fail to meet users' needs. Co-clustering techniques for two-type and multi-type data emerged against this background.
Multi-type co-clustering has attracted growing attention and has applications in gene analysis, search engines, e-commerce, and other fields, but it remains limited and immature. This thesis studies the problem and makes four contributions. (1) It briefly introduces the historical background and significance of cluster analysis and the state of research in China and abroad, and analyzes the development, strengths, and weaknesses of existing clustering techniques. (2) Building on that analysis, it establishes a non-negative matrix tri-factorization (Tri-NMF) model based on EM-style iterative updates. The model combines the strengths of complex spectral graph partitioning and of criterion-based partitioning, and adds a weight adjustment factor so that it can be tuned flexibly for different data. (3) On this theoretical basis, it builds a family of co-clustering algorithms based on the Tri-NMF model, covering both hard and soft analysis methods for two-type and multi-type data. (4) To verify effectiveness and practicality, it runs detailed experiments on two standard data sets.
The results show that, on accuracy (AC) and normalized mutual information (NMI), two widely used measures of clustering quality, the proposed family of co-clustering methods outperforms several existing clustering techniques. This supports the effectiveness and correctness of the proposed Tri-NMF-based co-clustering algorithms and their good scalability, and suggests practical value and application prospects.
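The abstract does not state the Tri-NMF objective or its update rules. As a rough illustration only, a generic non-negative matrix tri-factorization X ≈ F S Gᵀ, fitted with the standard multiplicative updates for the squared Frobenius loss, might be sketched as below. The function name `tri_nmf`, its parameters, and the toy matrix are assumptions made for this example; the thesis's EM-style updates and weight adjustment factor are not reproduced here.

```python
import numpy as np

def tri_nmf(X, k_rows, k_cols, n_iter=200, eps=1e-9, seed=0):
    """Generic tri-factorization X ~ F @ S @ G.T with non-negative factors.

    Hypothetical sketch: plain multiplicative updates for ||X - F S G^T||_F^2,
    not the model described in the thesis.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k_rows)) + eps       # row-cluster indicators (soft)
    S = rng.random((k_rows, k_cols)) + eps  # block (cluster-to-cluster) weights
    G = rng.random((m, k_cols)) + eps       # column-cluster indicators (soft)
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by (negative gradient part) / (positive part),
        # which keeps every entry non-negative.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors

# Toy two-type data: an 8x6 matrix with two diagonal blocks plus small noise.
rng = np.random.default_rng(1)
X = np.kron(np.array([[5.0, 0.0], [0.0, 5.0]]), np.ones((4, 3)))
X += 0.1 * rng.random(X.shape)

F, S, G, errors = tri_nmf(X, k_rows=2, k_cols=2)
row_labels = F.argmax(axis=1)  # hard row clusters from the soft factor F
col_labels = G.argmax(axis=1)  # hard column clusters from the soft factor G
print(row_labels, col_labels, round(errors[-1], 4))
```

Taking the arg-max of each row of F and G gives a hard assignment, while the normalized rows themselves can be read as soft memberships, which loosely mirrors the hard and soft variants the abstract mentions.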
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP311.13
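[Citing literature]
Related master's theses (1 entry)
1 Zhang Xiuxiu; Design and Implementation of an Image-Based Clothing Retrieval System [D]; University of Electronic Science and Technology of China; 2013
Article ID: 1630800
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1630800.html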