稀疏優(yōu)化在機器學(xué)習中的若干應(yīng)用
發(fā)布時間:2018-05-09 11:57
本文選題:非光滑優(yōu)化 + 稀疏優(yōu)化 ; 參考:《大連理工大學(xué)》2013年博士論文
【摘要】:近年來,利用解的稀疏性和其他內(nèi)在結(jié)構(gòu)成為眾多計算和工程領(lǐng)域中共同關(guān)注的問題.稀疏的內(nèi)含不僅是指“只有很少的非零分量”,它蘊含著“具有一種簡單結(jié)構(gòu)”.本文對機器學(xué)習中不同問題的稀疏結(jié)構(gòu)進行建模,并在必要時改進經(jīng)典的稀疏優(yōu)化算法進行求解.論文的主要工作可概括如下: 1.第2章給出了本文在解決不同的機器學(xué)習問題中所提出的稀疏優(yōu)化模型及算法.所提出的稀疏優(yōu)化模型有同樣的抽象結(jié)構(gòu),即在一個具有某種簡單或特定結(jié)構(gòu)的假設(shè)空間上極小化某個損失泛函.本文中給出的盒子約束的Lasso模型及塊PCA模型均具有這一結(jié)構(gòu).該章給出了求解盒子約束的Lasso模型的同倫算法及求解塊PCA模型的Splitting算法. 2.第3章研究了求解盒子約束的Lasso模型的同倫算法的收斂性并檢驗了該算法的數(shù)值性能.該章的工作指出同倫算法收斂性不是顯然成立.在無退化指標假設(shè)和其它較弱的條件下,該章證明了同倫算法具有有限終止性.另外,該章討論了退化和循環(huán)的問題.當前已有眾多算法可求解該模型,但數(shù)值實驗證明同倫算法具有特別的優(yōu)勢:適于最優(yōu)解非常稀疏的問題及需要計算整條正則化路徑的情形.這是第4章協(xié)同過濾數(shù)據(jù)可預(yù)測性問題的計算中所采用的關(guān)鍵技術(shù). 3.第4章研究了協(xié)同過濾問題中評分數(shù)據(jù)的可預(yù)測性問題.當前協(xié)同過濾方面的大部分工作主要研究算法性能的改進.該章指出,受評分數(shù)據(jù)自身的限制,評分矩陣中有一部分未知評分是難于給出準確預(yù)測的.第4章提出了一個新的度量——相關(guān)性,以度量用戶在某個商品上的評分能被準確預(yù)測的可能性.一個用戶一商品對的相關(guān)性由相關(guān)的用戶和商品構(gòu)成的社區(qū)所確定.作為相關(guān)性度量的應(yīng)用,提出了基于數(shù)據(jù)的組合方法(DOC)以應(yīng)用于推薦系統(tǒng). 4.第5章研究從時間序列基因表達數(shù)據(jù)中推斷基因正則化網(wǎng)絡(luò)(GRN).由于計算復(fù)雜度較大,大部分GRN重建方法僅限于推斷較低連通性的單個網(wǎng)絡(luò).該章提出了網(wǎng)絡(luò)和社區(qū)識別方法,結(jié)合社區(qū)結(jié)構(gòu)信息,從基因表達數(shù)據(jù)中推斷多個子網(wǎng)絡(luò).其中的塊PCA模型,通過第2章給出的Splitting算法,可有效求解網(wǎng)絡(luò)中的社區(qū)結(jié)構(gòu). 5.第6章研究了作為蛋白質(zhì)鑒別關(guān)鍵步驟的肽段識別問題.序列數(shù)據(jù)庫搜索是當前肽段識別的主流方法.但搜索引擎給出的大量的匹配是不正確的.現(xiàn)有方法大多基于半監(jiān)督或監(jiān)督學(xué)習框架,充分利用了誘騙PSM的樣本及標簽信息,但目標PSM樣本點自身信息沒有被充分利用.該章提出了一個稱為FC-Ranker的新的評分方法,給每個目標PSM賦予一個非負權(quán)重,反映其匹配正確的可能性.特別地,FC-Ranker通過模糊支持向量機分類模型和所提出的模糊Silhouette指標迭代更新該權(quán)重.FC-Ranker在ROC指標、相同F(xiàn)DR水平下鑒別目標PSM的數(shù)目等方面的性能表現(xiàn)超過了主流后驗數(shù)據(jù)庫搜索方法.
[Abstract]:In recent years, the use of sparse solutions and other internal structures has become a common concern in many fields of computation and engineering. Sparse inclusion means not only that there are few non-zero components, but also that there is a simple structure. In this paper, the sparse structure of different problems in machine learning is modeled, and the classical sparse optimization algorithm is improved to solve the problem when necessary. The main work of the thesis can be summarized as follows: 1. In chapter 2, the sparse optimization model and algorithm for solving different machine learning problems are given. The proposed sparse optimization model has the same abstract structure, that is, minimizing a loss functional in a hypothetical space with some simple or specific structure. Both the box constrained Lasso model and the block PCA model presented in this paper have this structure. In this chapter, the homotopy algorithm for solving box constrained Lasso model and the Splitting algorithm for solving block PCA model are given. 2. In chapter 3, the convergence of homotopy algorithm for Lasso model with box constraints is studied and its numerical performance is tested. The work of this chapter points out that the convergence of homotopy algorithm is not obvious. Under the assumption of no degenerate index and other weaker conditions, the homotopy algorithm is proved to be finitely terminated in this chapter. In addition, the problem of degradation and cycle is discussed in this chapter. At present, there are many algorithms to solve the model, but the numerical experiments show that the homotopy algorithm has a special advantage: it is suitable for the problem where the optimal solution is very sparse and the entire regularization path needs to be calculated. This is the key technique used in the calculation of the predictability of cooperative filtering data in Chapter 4. 3. Chapter 4 studies the predictability of scoring data in collaborative filtering. Most of the current collaborative filtering efforts are focused on improving the performance of the algorithm. It is pointed out in this chapter that due to the limitation of the scoring data, it is difficult to predict accurately some unknown scores in the scoring matrix. In Chapter 4, a new metric, correlation, is proposed to measure the probability that a user's score on a product can be accurately predicted. The relevance of a user-to-commodity pair is determined by the associated user and the community of goods. As an application of correlation measurement, a data-based combination method (DOC) is proposed to be used in recommendation systems. 4. Chapter 5 studies the inference of gene regularization network from time series gene expression data. Because of the high computational complexity, most of the GRN reconstruction methods are limited to inferring a single network with low connectivity. In this chapter, network and community identification methods are proposed, and multiple subnetworks are inferred from gene expression data combined with community structure information. The block PCA model can effectively solve the community structure in the network through the Splitting algorithm given in Chapter 2. 5. In chapter 6, the recognition of peptide segment as the key step of protein identification is studied. Sequence database search is the main method of peptide recognition. But the search engine gives a large number of matches is incorrect. Most of the existing methods are based on semi-supervised or supervised learning framework, which make full use of the sample and label information of decoy PSM, but the information of target PSM sample points is not fully utilized. In this chapter, a new scoring method called FC-Ranker is proposed, which assigns a non-negative weight to each target PSM, reflecting the possibility of a correct match. In particular, FC-Ranker iteratively updates the weight through the fuzzy support vector machine classification model and the proposed fuzzy Silhouette index. The performance of FC-Ranker in terms of ROC index and the number of target PSM at the same FDR level is better than that of the mainstream posteriori database search method.
【學(xué)位授予單位】:大連理工大學(xué)
【學(xué)位級別】:博士
【學(xué)位授予年份】:2013
【分類號】:TP181
【參考文獻】
相關(guān)博士學(xué)位論文 前1條
1 安百國;關(guān)于模型稀疏性的研究[D];東北師范大學(xué);2012年
,本文編號:1865933
本文鏈接:http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1865933.html
最近更新
教材專著