

Supervised Dirichlet Process Mixture Principal Component Analysis Based on Variational Inference

Published: 2018-07-07 23:33

  Topics: Dirichlet process + mixture model; Source: Master's thesis, Sun Yat-sen University, 2015


【Abstract】: Compared with traditional finite mixture models, the Dirichlet process mixture model (DPM) does not require the number of clusters to be known in advance and can adapt the number of clusters as the amount of data grows, which is why it has been widely used in recent years. The supervised Dirichlet process mixture model (SDPM) combines the DPM with a supervised learning model: the joint distribution of the covariates and the response is modeled nonparametrically through a Dirichlet process, and a local expert model is learned in each cluster. When there is more than one cluster, models that are linear within each cluster combine into a globally nonlinear model, which extends the learning ability of linear models and makes them more flexible. However, because such a model is trained directly on the covariates, it suffers from the curse of dimensionality when the feature dimension is high. To address this, this thesis introduces probabilistic principal component analysis (PPCA) into SDPM, yielding the supervised Dirichlet process mixture principal component analysis model (SDPM-PCA). PPCA is a widely used dimensionality reduction method; by projecting high-dimensional data onto a low-dimensional latent space, it can speed up training and help prevent overfitting. SDPM-PCA assumes that the covariates and the response are generated from the same low-dimensional latent variable of PPCA, independently of each other given that variable, and models the joint distribution nonparametrically with a Dirichlet process. By learning clustering, supervised learning, and dimensionality reduction jointly, SDPM-PCA performs local dimensionality reduction within each cluster and then trains a local supervised model in the low-dimensional latent space.
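The generative assumption described above can be illustrated with a small sampler. This is a minimal sketch under our own assumptions, not the thesis's exact model: the Dirichlet process is represented by a truncated stick-breaking construction (sticks drawn from Beta(1, alpha), cut off at `truncation` components), each cluster has hypothetical PPCA parameters `W` and `mu` that map a latent z ~ N(0, I) to the covariates, and a hypothetical weight vector `beta` maps the same z to the response y. The function names, truncation level, noise scales, and the standard-normal draws for the cluster parameters are illustrative choices only; a full Bayesian treatment would place priors on them.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    v_k ~ Beta(1, alpha) and pi_k = v_k * prod_{j<k} (1 - v_j); the last
    component takes all remaining mass so the weights sum to 1.
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def sample_sdpm_pca(n, d_x, d_z, alpha=1.0, truncation=10,
                    sigma_x=0.1, sigma_y=0.1, seed=0):
    """Hypothetical generative sketch: cluster k -> latent z -> (x, y) given z."""
    rng = random.Random(seed)
    pi = stick_breaking_weights(alpha, truncation, rng)

    # Illustrative per-cluster parameters (standard-normal draws, not priors
    # taken from the thesis): loadings W (d_x x d_z), mean mu, response weights beta.
    clusters = []
    for _ in range(truncation):
        W = [[rng.gauss(0.0, 1.0) for _ in range(d_z)] for _ in range(d_x)]
        mu = [rng.gauss(0.0, 1.0) for _ in range(d_x)]
        beta = [rng.gauss(0.0, 1.0) for _ in range(d_z)]
        clusters.append((W, mu, beta))

    data = []
    for _ in range(n):
        k = rng.choices(range(truncation), weights=pi)[0]  # cluster label
        W, mu, beta = clusters[k]
        z = [rng.gauss(0.0, 1.0) for _ in range(d_z)]      # z ~ N(0, I)
        # PPCA part: x = W z + mu + isotropic Gaussian noise.
        x = [sum(W[i][j] * z[j] for j in range(d_z)) + mu[i] + rng.gauss(0.0, sigma_x)
             for i in range(d_x)]
        # Supervised part: y is generated from z (and the cluster), not from x.
        y = sum(b * zj for b, zj in zip(beta, z)) + rng.gauss(0.0, sigma_y)
        data.append((k, x, y))
    return pi, data


pi, data = sample_sdpm_pca(n=200, d_x=50, d_z=3)
print(len(pi), len(data))  # prints: 10 200
```

Because y is computed from z alone, x and y are independent given z and the cluster label. Note that truncation caps the number of components at `truncation`; this is an approximation, since the untruncated Dirichlet process has infinitely many components.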
Learning these three processes jointly avoids the curse of dimensionality while improving both the quality of the dimensionality reduction and the model's predictive performance in the low-dimensional space. SDPM-PCA is solved approximately with variational inference. Compared with sampling algorithms based on Monte Carlo simulation, this gives faster training and a deterministic approximate solution, which makes the model practical for high-dimensional data. Finally, SDPM-PCA is instantiated for regression using Bayesian linear regression as the local model, evaluated on several real-world datasets, and compared with SDPM and other commonly used regression algorithms. The experimental results show that, with an appropriately chosen latent dimension, SDPM-PCA achieves better dimensionality reduction and usually delivers better and more stable predictive performance on high-dimensional regression problems.
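In the regression instantiation, each cluster's local expert is a Bayesian linear regression on the latent coordinates. The sketch below computes the standard conjugate Gaussian posterior for one such expert under simplifying assumptions that are ours, not the thesis's update equations: a zero-mean isotropic Gaussian prior on the weights with fixed precision `prior_precision`, a fixed noise precision `noise_precision`, latent coordinates treated as known, and each point weighted by a responsibility r_n for the cluster (the kind of soft assignment a variational treatment of a mixture produces). The posterior precision is A = prior_precision * I + noise_precision * sum_n r_n z_n z_n^T and the mean is m = A^{-1} (noise_precision * sum_n r_n z_n y_n). The helper names `solve` and `weighted_blr_posterior` are hypothetical; plain Python is used so no third-party packages are needed.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Assumes A is square and nonsingular (true here because A is positive definite).
    """
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def weighted_blr_posterior(Z, y, r, prior_precision=1.0, noise_precision=25.0):
    """Gaussian posterior N(m, S) over one cluster's regression weights.

    Z: latent coordinates (n x d), y: responses, r: responsibilities in [0, 1].
    Precision A = prior_precision * I + noise_precision * sum_n r_n z_n z_n^T
    Mean      m = A^{-1} (noise_precision * sum_n r_n z_n y_n),  S = A^{-1}
    """
    d = len(Z[0])
    A = [[prior_precision if i == j else 0.0 for j in range(d)] for i in range(d)]
    rhs = [0.0] * d
    for z_n, y_n, r_n in zip(Z, y, r):
        for i in range(d):
            rhs[i] += noise_precision * r_n * z_n[i] * y_n
            for j in range(d):
                A[i][j] += noise_precision * r_n * z_n[i] * z_n[j]
    m = solve(A, rhs)
    # S = A^{-1}, assembled column by column from unit right-hand sides.
    cols = [solve(A, [1.0 if i == j else 0.0 for i in range(d)]) for j in range(d)]
    S = [[cols[j][i] for j in range(d)] for i in range(d)]
    return m, S


# Noise-free toy data with y = 2*z1 - z2, every point fully assigned to the cluster.
Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -1.0, 1.0, 3.0]
r = [1.0, 1.0, 1.0, 1.0]
m, S = weighted_blr_posterior(Z, y, r)
print([round(v, 2) for v in m])  # prints: [1.96, -0.95] (shrunk slightly toward 0 by the prior)
```

In a mean-field variational scheme the latent z_n would itself be uncertain, so z_n z_n^T and z_n would be replaced by their expectations under the approximate posterior; this sketch omits that step.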
【Author】: Li Kang
【Degree-granting institution】: Sun Yat-sen University
【Degree level】: Master's
【Year conferred】: 2015
【Classification numbers】: TP301.6; O212.1






