Research on Dimensionality Reduction and Visualization of Normalized High-Dimensional Data

Published: 2018-04-13 19:37

  Topics: data embedding + data visualization; Source: Beijing University of Posts and Telecommunications, master's thesis, 2016


[Abstract]: This thesis studies dimensionality reduction algorithms that are designed specifically for normalized high-dimensional data and that map such data into a low-dimensional space where it can be visualized. In industry and in research, visualizing high-dimensional data directly is an effective and convenient way to analyze and present its cluster structure: in a scatter plot, each point corresponds to one high-dimensional data point, so the plot shows the distribution of the data, and even its clusters, at a glance. However, data can generally be visualized directly only if it has at most 3 dimensions, so dimensionality reduction is an effective route to visualizing high-dimensional data.

The essence of dimensionality reduction is to keep the structure of the data in the low-dimensional target space as close as possible to its structure in the original high-dimensional space. A dimensionality reduction algorithm must therefore take the distribution structure of the data into account. Normalized data, which is very common, lies either on a hyperplane or on a hypersphere, so an algorithm for normalized high-dimensional data has to be tailored to that structure if it is to achieve better results.

Many methods already exist for embedding high-dimensional data in a low-dimensional space, and some of them are effective for visualization, such as the t-SNE algorithm. A basic assumption of t-SNE is that the data lies in an unconstrained Euclidean space and that the data within a neighborhood follows a Gaussian distribution. In practice, however, data often lies in a constrained space, where its distribution is hard to model with a Gaussian. For hyperspherical data (L2-normalized data), the von Mises-Fisher (vMF) distribution is a better description than the Gaussian distribution; for hyperplane data (L1-normalized data), the Dirichlet distribution is a better description. On this basis, the thesis proposes two data embedding methods, one based on the vMF distribution and one based on the Dirichlet distribution.

Any data with at most 3 dimensions can be plotted as described above, so the plotting step itself has little research value. What does have research value, and what this thesis focuses on, is the dimensionality reduction algorithm that maps normalized high-dimensional data to a space of at most 3 dimensions. The thesis therefore does not discuss visualization methods separately and instead studies dimensionality reduction algorithms suited to visualizing normalized high-dimensional data.

The main work of the thesis is as follows:
1. Analyze traditional algorithms for low-dimensional embedding of high-dimensional data, in particular t-SNE, which currently performs well, examining in detail its advantages over other traditional methods and its shortcomings on data distributed in a "constrained space".
2. For data distributed on a hypersphere, propose a new embedding algorithm that describes the data with the vMF distribution: the vMF-SNE algorithm. Analyze how the algorithm executes and compare it experimentally with t-SNE.
3. For data distributed on a hyperplane, propose a new embedding algorithm that describes the data with the Dirichlet distribution: the Dirichlet-SNE algorithm. Likewise analyze how it executes and compare it experimentally with t-SNE.

For these two kinds of normalized high-dimensional data, the thesis studies two new dimensionality reduction algorithms suited to visualization, compares them experimentally with t-SNE, a strong current method, and identifies their advantages; the results are of value for both theory and application.
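The geometry behind the two cases can be checked numerically. The sketch below is only an illustration and is not taken from the thesis: the function names (`l1_normalize`, `l2_normalize`, `vmf_style_affinities`) and the concentration value `kappa` are assumptions made for the example. It shows that L1-normalized non-negative rows sum to 1 (they lie on a hyperplane), that L2-normalized rows have unit length (they lie on a hypersphere), and how pairwise neighbor affinities could be built from a vMF-style kernel `exp(kappa * cosine similarity)` instead of a Gaussian kernel on Euclidean distance. The actual vMF-SNE and Dirichlet-SNE formulations are not given in the abstract.

```python
import numpy as np


def l1_normalize(X):
    """Scale each row so that its absolute values sum to 1."""
    return X / np.abs(X).sum(axis=1, keepdims=True)


def l2_normalize(X):
    """Scale each row to unit Euclidean length."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def vmf_style_affinities(X_unit, kappa=10.0):
    """Row-normalized affinities from a vMF-style kernel exp(kappa * x_i . x_j).

    Assumption for illustration only: a single fixed kappa for all points.
    The abstract does not say how the thesis chooses this parameter.
    """
    logits = kappa * (X_unit @ X_unit.T)
    np.fill_diagonal(logits, -np.inf)            # a point is not its own neighbor
    logits -= logits.max(axis=1, keepdims=True)  # shift rows for numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)


# Hypothetical non-negative data: 5 points in 8 dimensions.
rng = np.random.default_rng(0)
X = rng.random((5, 8))

S = l1_normalize(X)          # rows lie on the hyperplane sum(x) = 1
U = l2_normalize(X)          # rows lie on the unit hypersphere
P = vmf_style_affinities(U)  # each row is a distribution over the other points

print(S.sum(axis=1))
print(np.linalg.norm(U, axis=1))
print(P.sum(axis=1))
```

Each row of `P` plays the role that the conditional neighbor probabilities play in SNE-style methods; an embedding method would then look for low-dimensional points whose own affinities match these rows.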
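[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP311.13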
