
Research on Machine Learning Methods That Exploit Unlabeled Data

Published: 2018-04-23 10:26

  Topics: machine learning + semi-supervised learning; Source: Nanjing University, master's thesis, 2017


[Abstract]: Machine learning needs labeled data to train predictive models, and because labeling usually requires human effort, labeled data is expensive to obtain. In many practical applications, unlabeled data can be collected easily and in large quantities, so how to exploit cheap unlabeled data has long been an active research topic in machine learning. Two approaches to using unlabeled data have emerged. The first is semi-supervised learning, which automatically uses unlabeled data alongside labeled data to improve learning performance; most such methods do improve performance, but they all rest on underlying model assumptions and may degrade performance when those assumptions deviate from the data distribution. The second is crowdsourcing, which labels data at low cost so that formerly unlabeled data can be used accurately, reducing learning risk. This thesis studies semi-supervised learning and crowdsourcing and makes the following contributions. First, co-training, an important paradigm in semi-supervised learning, is easily affected by insufficient views; to address this, a new weighted co-training algorithm is proposed. When the views are insufficient, samples that are inconsistent with the optimal classifier appear during co-training; the algorithm detects potentially inconsistent samples and lowers their weights to reduce their influence on training. Experimental results show that the algorithm generalizes better and is more robust than standard co-training. Second, because the labels obtained in crowdsourcing depend on task difficulty, a new task assignment algorithm is proposed. The algorithm estimates the difficulty of a subset of tasks to build a training set, learns a model that predicts difficulty, and divides tasks into easy and hard. Easy tasks are labeled through crowdsourcing, while hard tasks are given to hired experts who provide high-quality labels. Experimental results show that the algorithm improves label quality while reducing labeling cost. In addition, the thesis studies model reuse with unlabeled data, a setting in which the user must combine several pre-trained models that cannot be modified; for this problem, a new multi-view model reuse algorithm is proposed. The algorithm estimates the reliability of the pre-trained models through belief propagation, guides this estimation with multi-view consistency on unlabeled data, and then combines the pre-trained models in an ensemble weighted by the estimated reliabilities. Experimental results show that the method significantly improves classification accuracy.
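
The task assignment idea in the abstract (estimate difficulty for some tasks, learn a difficulty predictor, then send easy tasks to the crowd and hard tasks to experts) can be illustrated with a minimal sketch. The abstract does not say how difficulty is estimated or which model is learned, so everything concrete here is an assumption made for illustration: difficulty is approximated by crowd-vote disagreement, the predictor is a nearest-centroid classifier, and the function names, the 0.25 threshold, and the toy data are all hypothetical. This is not the thesis's algorithm.

```python
from collections import Counter

def vote_disagreement(votes):
    """Fraction of crowd votes that differ from the majority label.
    Assumption: disagreement is used as a stand-in for task difficulty."""
    majority_count = Counter(votes).most_common(1)[0][1]
    return 1.0 - majority_count / len(votes)

def fit_centroids(features, hard_flags):
    """Nearest-centroid difficulty model: one mean feature vector per class.
    Assumes the seed set contains at least one easy and one hard task."""
    centroids = {}
    for flag in (False, True):
        rows = [f for f, h in zip(features, hard_flags) if h == flag]
        centroids[flag] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_hard(centroids, x):
    """Return True if x is closer to the 'hard' centroid than to the 'easy' one."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return sq_dist(centroids[True]) < sq_dist(centroids[False])

def route(centroids, tasks):
    """Split task ids into (crowd, expert) lists by predicted difficulty."""
    crowd, expert = [], []
    for task_id, x in tasks.items():
        (expert if predict_hard(centroids, x) else crowd).append(task_id)
    return crowd, expert

# Toy seed set (hypothetical): features plus redundant crowd votes per task.
seed_features = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
seed_votes = [["a", "a", "a"], ["b", "b", "b"], ["a", "b", "a"], ["b", "a", "b"]]

# Label a seed task as hard when disagreement exceeds a hypothetical threshold.
HARD_THRESHOLD = 0.25
hard_flags = [vote_disagreement(v) > HARD_THRESHOLD for v in seed_votes]

model = fit_centroids(seed_features, hard_flags)

# Route tasks that have not been labeled yet.
new_tasks = {"t1": [0.0, 0.3], "t2": [1.0, 0.7]}
crowd, expert = route(model, new_tasks)
print("crowd:", crowd)
print("expert:", expert)
```

In this toy run, the task whose features resemble the high-disagreement seed tasks is routed to experts and the other to the crowd. A real system would need a better difficulty estimate and a budget-aware routing rule; the sketch only shows the shape of the pipeline.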
[Degree-granting institution]: Nanjing University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP181


Article ID: 1791566



Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1791566.html

