Research on Active Learning Based on ELM
Published: 2018-05-13 05:09
Topics: active learning + extreme learning machine; Source: Hebei University, master's thesis, 2017
[Abstract]: The real world contains large amounts of unlabeled data, such as medical images, web pages, and video, and the problem is even more pronounced in the big-data era. Labeling such data is very costly. Active learning is an effective way to address this problem and has been an active research topic in machine learning and data mining in recent years. Within the classification framework, this thesis studies active learning based on the online sequential extreme learning machine (OS-ELM). Its contributions are twofold. (1) It studies how the distribution of the random weights affects the performance of the extreme learning machine (ELM) and draws two conclusions: (a1) for different problems or data sets, random weights drawn uniformly from the interval [-1, 1] are not necessarily the best choice; (a2) initializing the input-layer weights and hidden-node biases with uniformly distributed or Gaussian-distributed random numbers yields no essential difference in test accuracy. (2) It proposes an active learning algorithm based on OS-ELM. The algorithm has three advantages: (b1) it exploits the incremental learning of OS-ELM, which significantly improves the efficiency of the learning system; (b2) it uses sample entropy as the heuristic for measuring the importance of unlabeled samples, a measure that fully captures the amount of information a sample contributes to classification; (b3) it uses a K-nearest-neighbor classifier as the Oracle that labels the selected unlabeled samples, so the Oracle is independent of the classifier used to evaluate sample importance. Experimental results show that the proposed algorithm learns quickly and labels accurately.
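The abstract names three ingredients: incremental OS-ELM training, an entropy score for unlabeled samples, and a K-nearest-neighbor Oracle. The Python sketch below shows one way these pieces could fit together; it is not the thesis's implementation. Everything the abstract does not state is an assumption made for illustration: the sigmoid activation, the small ridge term, the softmax used to turn ELM outputs into probabilities before computing entropy, the batch sizes, and all function names (`hidden`, `oselm_init`, `oselm_update`, `entropy`, `knn_oracle`, `active_learn`).

```python
import numpy as np

def hidden(X, W, b):
    """Sigmoid hidden-layer output H (activation choice is an assumption)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def oselm_init(H0, T0, ridge=1e-3):
    """Initial phase: ridge-regularized least squares on the first labeled chunk.
    The ridge term (an assumption) keeps H0^T H0 invertible for small chunks."""
    P = np.linalg.inv(H0.T @ H0 + ridge * np.eye(H0.shape[1]))
    return P, P @ H0.T @ T0

def oselm_update(P, beta, H, T):
    """Sequential phase: recursive least-squares update with a new chunk."""
    K = np.linalg.inv(np.eye(H.shape[0]) + H @ P @ H.T)
    P = P - P @ H.T @ K @ H @ P
    beta = beta + P @ H.T @ (T - H @ beta)
    return P, beta

def entropy(scores):
    """Shannon entropy per row of softmax-normalized scores (assumed scoring)."""
    z = scores - scores.max(axis=1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=1)

def knn_oracle(X_lab, y_lab, X_query, k=3):
    """Label each query by majority vote among its k nearest labeled points."""
    d = np.linalg.norm(X_query[:, None, :] - X_lab[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(y_lab[idx]).argmax() for idx in nearest])

def active_learn(X_lab, y_lab, X_pool, n_classes, rng,
                 n_hidden=20, rounds=5, batch=5):
    """Pool-based loop: score the pool, label the top-entropy samples, update."""
    # Random weights from U[-1, 1]; a Gaussian draw could be swapped in here.
    W = rng.uniform(-1.0, 1.0, (X_lab.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, n_hidden)
    onehot = np.eye(n_classes)
    P, beta = oselm_init(hidden(X_lab, W, b), onehot[y_lab])
    for _ in range(rounds):
        if len(X_pool) == 0:
            break
        H_pool = hidden(X_pool, W, b)
        pick = np.argsort(entropy(H_pool @ beta))[-batch:]  # most uncertain
        y_new = knn_oracle(X_lab, y_lab, X_pool[pick])
        # Only the new chunk is used; earlier data is never revisited.
        P, beta = oselm_update(P, beta, H_pool[pick], onehot[y_new])
        X_lab = np.vstack([X_lab, X_pool[pick]])
        y_lab = np.concatenate([y_lab, y_new])
        X_pool = np.delete(X_pool, pick, axis=0)
    return W, b, beta, X_lab, y_lab
```

With the ridge term in place, the sequential update reproduces the batch ridge solution over all data seen so far, which is why each round costs only as much as the newly labeled chunk. Note that in this sketch the Oracle's own labels are added back to its labeled set, so any labeling errors can propagate; whether the thesis does the same is not stated in the abstract.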
[Degree-granting institution]: Hebei University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP181
Article ID: 1881784
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1881784.html