
Research on Multi-Label Learning Algorithms Based on Label Dependencies

Published: 2018-06-05 14:24

  Topic: data mining + classification learning; Source: Beijing Jiaotong University, doctoral dissertation, 2016


[Abstract]: Multi-label classification is an important research problem in machine learning and data mining; its goal is to predict the multiple labels that an instance carries simultaneously. In most practical applications there are latent dependencies among an instance's labels, and exploiting the useful information they contain can often improve the learning performance of a classification model. How to learn and exploit label dependencies has therefore become one of the key problems in multi-label classification. This dissertation first summarizes the state of research and analyzes the strengths and weaknesses of existing methods. It then explores several ways to learn and exploit label dependencies of different types and in different application scenarios, and proposes several more effective multi-label classification models and algorithms. The main results are as follows.

(1) Models such as classifier chains often decide at random which other labels each label depends on, and so may produce results that do not match reality. To address this, the dissertation proposes a method that represents label dependencies with a tree-structured Bayesian network. The method explicitly measures the degree of dependency between labels and builds a network whose nodes are labels and whose edge weights are the degrees of dependency, so that the dependencies among labels can be determined in a principled way. Ensemble learning is further used to build multiple possible label dependency structures, so that the mutual dependencies among labels are considered more fully. Experimental results confirm the effectiveness of the algorithm, which indicates that measuring the degree of dependency between labels and fully accounting for their mutual dependencies can further improve classification performance.

(2) A multi-label learning algorithm is proposed that uses a graph structure to represent the degree of dependency between labels and models the iterative propagation of label dependencies as a random walk on the graph. The method first constructs a graph over the labels and uses a random walk with restart model to simulate the iterative propagation of label dependencies through the graph. For a given test instance, the method first provides, for each label, an initial probability that it is a true label of the instance, and then iteratively updates the value of each label with a PageRank-like procedure until convergence. Through this repeated updating, each label takes into account not only the influence of the labels on which it directly depends but also indirect dependencies. Experimental results show that the algorithm clearly outperforms the compared algorithms under multiple evaluation metrics, especially when the dataset has many labels. This indicates that considering the iterative propagation of label dependencies makes it possible to discover and exploit the latent useful information more effectively.

(3) Building on the previous method, a multi-label learning algorithm is further proposed that can consider multiple latent factors and learns the optimal degrees of dependency among labels by optimizing a given objective function. Drawing on the idea of multiple kernel learning, the method first produces several measures of the degree of label dependency from different perspectives, based on different definitions of dependency, and then feeds these measures into a linear model that learns the final degree of dependency between labels. Its advantages are twofold: first, it combines measures of label dependency taken from different perspectives; second, it estimates the parameters of the linear model by minimizing the loss function used by the classification model, and can therefore learn the degrees of dependency that are optimal for the current classification task. Experimental results show that, compared with the previous method and other baselines, the dependencies and dependency degrees learned by optimizing the objective function clearly improve the performance of the corresponding classification model.

(4) For problems with weak labels and a large number of labels, the dissertation proposes a method based on a matrix factorization model that learns an optimal label ranking. The method maps the original label space to a low-dimensional space, which significantly reduces the number of labels and hence the computational cost. For each training instance two label sets are available: the labels that are explicitly given, and the remaining labels that are not explicitly given. Most existing methods assume that a label that is not explicitly given is an irrelevant label of the instance (either 1 or 0). To avoid the erroneous information this assumption may introduce, the proposed method assumes only that, for each instance, explicitly given labels are more likely to be relevant than labels that are not explicitly given. Accordingly, the method designs an AUC-like loss function and optimizes it so that, in the label ranking predicted for an instance, explicitly given labels are ranked ahead of labels that are not explicitly given as far as possible. The method can thus make full use of label dependencies to produce a more reasonable label ranking in the presence of weak labels. Experimental results confirm that the method performs better on particular datasets.

Taking the exploitation of different types of label dependency as their starting point, these results propose corresponding learning methods and models, verify their effectiveness experimentally, and lay a good foundation for practical application and further research.
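
The label-propagation idea in contribution (2) can be illustrated with a minimal sketch of a random walk with restart over a label graph. This is not the dissertation's implementation: the abstract does not say how edge weights are measured or which restart probability is used, so the co-occurrence weighting, the `restart` value of 0.3, and the function names `build_transition_matrix` and `random_walk_with_restart` are all assumptions made for illustration. The initial scores `p0` stand in for the per-label probabilities produced by any base classifier.

```python
import numpy as np


def build_transition_matrix(Y):
    """Build a column-normalized transition matrix over labels.

    Y is a binary (n_samples x n_labels) training label matrix.
    Edge weights are raw label co-occurrence counts, which is an
    assumed dependency measure chosen only for this illustration.
    """
    Y = np.asarray(Y, dtype=float)
    W = Y.T @ Y                    # W[i, j] = number of instances with both labels
    np.fill_diagonal(W, 0.0)       # drop self-loops
    col_sums = W.sum(axis=0)
    # Normalize each column; labels with no neighbours keep an all-zero column.
    return np.divide(W, col_sums, out=np.zeros_like(W), where=col_sums > 0)


def random_walk_with_restart(P, p0, restart=0.3, tol=1e-8, max_iter=1000):
    """Iterate p <- (1 - restart) * P @ p + restart * p0 until the update is tiny.

    p0 holds the initial per-label scores for one test instance.
    The restart probability is a hypothetical default, not a value from the source.
    """
    p0 = np.asarray(p0, dtype=float)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (P @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy data: labels 0 and 1 often co-occur, label 2 is isolated.
Y_train = np.array([[1, 1, 0],
                    [1, 1, 0],
                    [0, 0, 1],
                    [1, 0, 0]])
P = build_transition_matrix(Y_train)
initial_scores = np.array([0.9, 0.2, 0.1])   # e.g. from independent per-label classifiers
final_scores = random_walk_with_restart(P, initial_scores)
print(final_scores)
```

In this toy example the score of label 1 rises above its initial value because it is strongly tied to the confidently predicted label 0, which is the kind of direct and indirect influence the abstract describes. Final scores would still need a threshold or ranking step to yield a predicted label set.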
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Doctorate
[Year degree granted]: 2016
[Classification number]: TP181

[Similar literature]


Related doctoral dissertations (top 1)

1 付彬; Research on Multi-Label Learning Algorithms Based on Label Dependencies [D]; Beijing Jiaotong University; 2016



Article ID: 1982291


Article link: http://sikaile.net/shoufeilunwen/xxkjbs/1982291.html

