Research on Data Mining Algorithms Based on Low-Rank Sparse Subspaces
[Abstract]: High-dimensional data not only has many attributes but also contains substantial redundancy, noise, and outliers, which makes its spatial structure complex and prevents data mining algorithms from exploiting the true association structure of the data to build better models. An important step is to capture the strength of association between samples or between attributes by learning a coefficient matrix; however, this learning process is sensitive to noise and outliers. Sparse learning makes the coefficient matrix sparse: coefficients between related samples or attributes are large, while coefficients between unrelated samples or attributes are very small or even zero. A sparse coefficient matrix therefore effectively reflects the true relationships in the data, allowing a data mining algorithm to remove redundancy, noise, and outliers and thus achieve good robustness. In addition, high-dimensional data can be represented by a set of low-dimensional subspaces, so using subspace learning to transform a complex high-dimensional data space into simple low-dimensional subspaces helps the algorithm discover the global and local structures hidden in the data and obtain more effective mining results. Because redundancy and noise inflate the rank of the data matrix, data mining algorithms cannot capture the true low-rank structure of high-dimensional data; imposing low-rank constraints during the learning of the coefficient matrix explicitly reduces its rank. Most existing models, however, use only global structure information or only local structure information. A few algorithms build their models from more comprehensive structure information, but they do not combine sparse learning with low-rank constraints and subspace learning to obtain complementary structure information from the data and thereby produce a more effective data mining model.
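The interplay of low rank and sparsity described above can be shown with a minimal numpy sketch (illustrative only, not the thesis's algorithms): a noisy data matrix built from a low-rank signal is recovered by truncating its SVD, and the l2,1-norm used for row-sparse selection is computed explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Low-rank signal (rank 2) buried in noise: X = U V^T + E.
U = rng.standard_normal((100, 2))
V = rng.standard_normal((50, 2))
X = U @ V.T + 0.05 * rng.standard_normal((100, 50))

# Noise inflates the rank; truncating the SVD recovers the low-rank structure.
u, s, vt = np.linalg.svd(X, full_matrices=False)
X_lowrank = u[:, :2] @ np.diag(s[:2]) @ vt[:2, :]
rel_err = np.linalg.norm(X - X_lowrank) / np.linalg.norm(X)

# l2,1-norm of a coefficient matrix: sum of the l2-norms of its rows.
# Penalizing it drives whole rows to zero, i.e. discards entire features.
def l21_norm(W):
    return np.sqrt((W ** 2).sum(axis=1)).sum()

W = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])
print(round(rel_err, 3))   # small: the rank-2 truncation explains almost all of X
print(l21_norm(W))         # row norms 5 + 0 + 1 = 6.0
```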
Secondly, existing methods divide the data mining task into several independent steps; even though each step can reach its own optimal solution in its own optimization process, this cannot guarantee that the final solution is globally optimal. To overcome the shortcomings of existing data mining algorithms, this thesis proposes a novel multi-output regression algorithm and a novel subspace clustering algorithm to mine high-dimensional data more effectively. The main research results can be summarized as follows. 1) A Low-rank Feature Reduction algorithm for multi-output regression (LFR for short), based on low-rank constraints and feature selection, is proposed to address the fact that existing multi-output regression algorithms do not fully exploit the inherent associations in high-dimensional data: the correlations between attributes, between output variables, and between training samples can all improve the predictive ability of a multi-output regression model. The regression coefficient matrix is represented as the product of two new matrices under a low-rank constraint, so that the correlations among output variables are captured indirectly through the low rank of the coefficient matrix. In addition, sample selection is performed by combining the l2,1-norm with the loss term to remove the influence of outliers on the learning of the regression model; the LFR algorithm thus achieves very good multi-output regression prediction. 2) A subspace clustering algorithm based on low-rank constraints and sparse learning (LSS) is proposed, addressing the facts that existing spectral clustering cannot guarantee that its final solution is optimal and that it does not learn the similarity matrix from the low-dimensional structure of the original data.
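The factorization idea behind LFR can be sketched in a few lines of numpy (a simplified stand-in: the thesis's full objective also includes l2,1-norm feature and sample selection terms, which are omitted here). Writing the coefficient matrix as W = A @ B with a small inner dimension r enforces rank(W) <= r, so the outputs share a low-dimensional subspace; the sketch fits A and B by alternating least squares.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m, r = 200, 30, 10, 3   # samples, features, outputs, shared rank

# Synthetic multi-output data whose true coefficient matrix has rank r.
A_true = rng.standard_normal((d, r))
B_true = rng.standard_normal((r, m))
X = rng.standard_normal((n, d))
Y = X @ A_true @ B_true + 0.01 * rng.standard_normal((n, m))

# Factorized coefficient matrix W = A @ B; rank(W) <= r by construction.
A = rng.standard_normal((d, r))
B = rng.standard_normal((r, m))
for _ in range(50):  # alternating least squares
    # Fix A, solve min ||Y - (X A) B||_F over B.
    B = np.linalg.lstsq(X @ A, Y, rcond=None)[0]
    # Fix B, solve min ||Y - X A B||_F over A (closed form via pinv(B)).
    A = np.linalg.lstsq(X, Y @ np.linalg.pinv(B), rcond=None)[0]

W = A @ B
mse = np.mean((Y - X @ W) ** 2)
print(np.linalg.matrix_rank(W), round(mse, 4))  # rank stays <= r, small error
```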
In the fourth chapter, the LSS algorithm is proposed; it combines sparse learning, low-rank constraints, sample self-expression, and subspace learning to achieve better clustering results. Specifically, LSS uses sparse learning to select features of the coefficient matrix and remove redundant features and noise, learns similarity matrices from both the original data space and its low-dimensional subspace, and jointly optimizes the two matrices during the iterative optimization, so that the similarity matrix better reflects the true relationships in the data. In addition, a low-rank constraint is imposed on the Laplacian matrix of the similarity matrix, so that the best similarity matrix and the best clustering result are obtained simultaneously during the iterative optimization. In summary, this thesis studies sparse learning, low-rank constraints, and subspace learning and proposes two new data mining algorithms that address the shortcomings of existing multi-output regression and subspace clustering algorithms, adding new ideas and applications to research on data mining algorithms. Experiments on real public datasets show that the two proposed algorithms achieve very good mining results under various evaluation metrics.
【Degree-granting institution】: 廣西師范大學(xué) (Guangxi Normal University)
【Degree level】: Master's
【Year conferred】: 2017
【CLC number】: TP311.13