
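Research on Robust Non-negative Matrix Factorization Algorithms

Published: 2018-04-02 10:40

  Topic: data mining. Focus: non-negative matrix factorization. Source: Beijing Jiaotong University, master's thesis, 2017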


[Abstract]: With the growth of the Internet, the era of big data has arrived quietly. Users' everyday activities generate hundreds of millions of records, including social, shopping, and browsing information. These data contain many patterns of user behavior that are not normally visible, and such patterns can often bring better economic returns or higher working efficiency. Finding valuable information in massive data has therefore become a central concern of the big-data era, and data mining emerged in response to this need. Matrix factorization is an important research area within data mining and is widely used in image and text mining. In practice, however, it runs into the fact that image pixel values cannot be negative and that negative values in document statistics have no meaning; if negative values are not handled well, the interpretability of the algorithm drops sharply. Non-negative matrix factorization (NMF) was adopted to improve interpretability. NMF imposes non-negativity constraints on the basis matrix and the coefficient matrix produced by the factorization, which matches application scenarios in which negative values are meaningless. It also converges quickly and needs little storage, which makes it well suited to processing big data. The classical NMF algorithm, however, handles noisy data poorly: because it squares the error, it amplifies the influence of noisy data on the result, which limits its use in real scenarios.
Later improvements no longer square the residual of each data point but simply sum the residuals, which reduces the influence of noisy data to some extent. These methods nevertheless cannot adapt well to changes in the proportion of noisy data in a data set, so they fail to give ideal results on some data sets. To address this problem, this thesis proposes two NMF algorithms: a truncated robust NMF algorithm and a doubly truncated robust NMF algorithm. The truncated robust NMF algorithm builds on robust NMF based on the L_(2,1) norm and introduces a truncation parameter on the number of data points. The residual computed for each data point is compared against this truncation criterion; data points whose residual exceeds it are truncated to zero, and the rest continue to take part in the computation. This removes noisy data points with large errors and reduces their influence on the result, and the truncation parameter lets the algorithm adapt to changes in the proportion of noisy data, which improves robustness. The doubly truncated robust NMF algorithm goes a step further. It takes better account of the intrinsic structure of the data by introducing the Ridge Leverage Score to improve the criterion used to identify noisy data, and it adds handling of noisy attributes through a second truncation parameter that controls the number of noisy attributes. These improvements raise the accuracy of the results and strengthen the robustness of the algorithm, so that it can adapt to complex real-world application scenarios and be widely applied.
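The truncation idea in the abstract can be illustrated with a short NumPy sketch. This is not the thesis's algorithm, whose update rules the abstract does not give. It assumes the commonly used multiplicative updates for L_(2,1)-norm NMF, in which each data point (column) is weighted by the inverse of its residual norm, and it models truncation by giving zero weight to the `n_outliers` columns with the largest residuals. The names `truncated_l21_nmf`, `n_outliers`, and `ridge_leverage_scores` are hypothetical. The second function only shows the standard definition of ridge leverage scores; how the thesis combines them with the residuals is not stated in the abstract.

```python
import numpy as np


def truncated_l21_nmf(X, rank, n_outliers, n_iter=200, eps=1e-10, seed=0):
    """Sketch: L2,1-norm NMF with hard truncation of the worst-fit columns.

    X is a nonnegative (features x samples) matrix; columns are data points.
    n_outliers is a hypothetical name for the truncation parameter.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    F = rng.random((d, rank)) + eps  # basis matrix
    G = rng.random((rank, n)) + eps  # coefficient matrix
    for _ in range(n_iter):
        # Per-column residual norms ||x_i - F g_i||_2 (not squared).
        res = np.linalg.norm(X - F @ G, axis=0)
        # Reweighting implied by the L2,1 objective: w_i = 1 / residual_i.
        w = 1.0 / np.maximum(res, eps)
        # Truncation (assumed form): the columns with the largest residuals
        # get zero weight, so they do not influence the basis update.
        if n_outliers > 0:
            w[np.argsort(res)[-n_outliers:]] = 0.0
        XD = X * w  # same as X @ diag(w)
        GD = G * w  # same as G @ diag(w)
        F *= (XD @ G.T) / (F @ (GD @ G.T) + eps)
        # In the G update the weight w_i cancels column by column, so the
        # plain NMF rule is used; truncated columns keep being fitted and
        # can re-enter later if their residual drops.
        G *= (F.T @ X) / (F.T @ F @ G + eps)
    res = np.linalg.norm(X - F @ G, axis=0)
    return F, G, res


def ridge_leverage_scores(X, lam=1.0):
    """Ridge leverage score of each column: x_i^T (X X^T + lam I)^{-1} x_i."""
    d = X.shape[0]
    K = X @ X.T + lam * np.eye(d)
    # Solve K Z = X instead of forming the inverse explicitly.
    return np.einsum("ij,ij->j", X, np.linalg.solve(K, X))
```

Calling `truncated_l21_nmf(X, rank=3, n_outliers=3)` returns the two factors and the final per-column residuals, which can then be inspected to see which data points were treated as noise. Choosing `n_outliers` plays the role of matching the expected proportion of noisy points in the data set.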
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.41





