Research and Applications of Online Learning Algorithms

Posted: 2018-03-05 13:14

Topic: online learning; Focus: time series; Source: Zhejiang University, doctoral dissertation, 2017; Type: degree thesis

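[Abstract]: With the rapid development of information technology and the growing popularity of Internet applications, data are being generated ever faster. Traditional offline learning algorithms, which process data in batches, cannot cope with the streaming data typical of big-data scenarios. Online learning algorithms receive data continuously and update the model dynamically in real time. Because they suit large-scale and streaming data, they have attracted considerable attention from researchers and are currently one of the active topics in machine learning. Research on online learning algorithms covers three main aspects: (1) their theoretical analysis; (2) their application to different machine learning tasks; and (3) their convergence rates. This thesis studies these questions systematically, from theoretical analysis to concrete applications. It improves on the shortcomings of existing algorithms and proposes new solutions to several open problems. Specifically, the contributions of this thesis are as follows:

1. ADMM (Alternating Direction Method of Multipliers) is a general optimization framework that is widely used in distributed machine learning tasks. To accelerate online ADMM, the thesis extends the regret analysis of traditional online ADMM algorithms from bounds based on the number of rounds to bounds based on gradient variation. For two types of online ADMM learning algorithms (FTRL-ADMM and PGD-ADMM), it proposes improved online ADMM algorithms, gives a gradient-variation-based regret analysis for each, and proves that the proposed algorithms have tighter regret upper bounds than existing ones.

2. The ARIMA (Autoregressive Integrated Moving Average) model is a linear model widely used in time series prediction. However, existing learning algorithms for ARIMA models are all offline, and they require the noise terms to satisfy strict assumptions. This severely limits the generality of ARIMA models and their use on massive time series prediction problems. The thesis therefore relaxes the assumptions on the ARIMA noise terms and proposes online learning algorithms for ARIMA models. Theoretical analysis shows that the proposed online algorithms can approach the optimal offline ARIMA learning algorithm. A series of experiments on synthetic and real-world datasets demonstrates the efficiency and effectiveness of the proposed algorithms.

3. In recent years, the NN-PA algorithm, which solves non-negative matrix factorization by online learning, has been highly successful in recommender systems. To speed up its convergence, the thesis proposes the NN-APA algorithm, which uses second-order gradient information in each round of updates and uses learning with expert advice ("expert learning") to tune the parameters of the online learning task automatically. The thesis gives a theoretical analysis of the new algorithm and proves that it converges faster than NN-PA. In-depth experiments on a series of recommender-system datasets further confirm the efficiency and effectiveness of the new algorithm.

4. The Collaborative Topic Regression (CTR) model combines probabilistic matrix factorization (PMF) with topic modeling (for example, LDA), using textual information to improve recommendation accuracy. Although the model has been very successful in recommendation, the existing CTR inference algorithm, bdi-CTR, has serious shortcomings. First, bdi-CTR is an offline algorithm and cannot handle streaming data or real-world big-data scenarios. Second, bdi-CTR first uses LDA to compute the topic representations of items and then feeds the result into the PMF solver. It ignores the effect of PMF on LDA: it does not consider how recommendation prediction information could inform LDA's topic inference. The thesis therefore proposes an online joint inference algorithm, obi-CTR. The proposed algorithm processes streaming data and also uses the results of the PMF model to strengthen LDA inference, so that the two models reinforce each other and are optimized jointly. Experimental results show that obi-CTR handles streaming and massive data efficiently. It also improves both the topic representations of the topic model and the predictive performance of the recommender system.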
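The abstract does not give the update rule for online ARIMA. The following is a minimal sketch of one plausible approach, assuming ARIMA(k, d, q) is approximated by a longer pure autoregressive model of order m on the d-times differenced series, with the coefficients fitted by projected online gradient descent on squared loss. Whether the thesis uses exactly this approximation, step size, or projection is not stated here. The update uses only the observed prediction error and makes no distributional assumption about the noise. The names `OnlineARIMA` and `difference`, and the parameters `d`, `m`, `lr` and `radius`, are hypothetical. The forecast is mapped back to the original scale with the identity x_t = ∇^d x_t + Σ_{i=0}^{d-1} ∇^i x_{t-1}.

```python
import math


def difference(xs, order):
    """Apply the differencing operator `order` times: (x1, x2, ...) -> (x2 - x1, ...)."""
    for _ in range(order):
        xs = [b - a for a, b in zip(xs, xs[1:])]
    return list(xs)


class OnlineARIMA:
    """Hypothetical sketch: AR(m) on the d-th differenced series, fitted by projected OGD."""

    def __init__(self, d=1, m=3, lr=0.01, radius=10.0):
        self.d, self.m, self.lr, self.radius = d, m, lr, radius
        self.gamma = [0.0] * m  # AR coefficients on the differenced series
        self.history = []       # raw observations seen so far

    def _ready(self):
        # m values of the d-th difference require at least m + d raw points.
        return len(self.history) >= self.m + self.d

    def _features(self):
        # Most recent first: [diff^d x_{t-1}, ..., diff^d x_{t-m}]
        return difference(self.history, self.d)[-self.m:][::-1]

    def predict(self):
        """Forecast the next raw value x_t."""
        if not self._ready():
            return self.history[-1] if self.history else 0.0  # naive warm-up forecast
        diff_pred = sum(g * f for g, f in zip(self.gamma, self._features()))
        # Undo differencing: x_t = diff^d x_t + sum_{i=0}^{d-1} diff^i x_{t-1}
        level = sum(difference(self.history, i)[-1] for i in range(self.d))
        return diff_pred + level

    def update(self, x):
        """Reveal the true x_t, take one projected gradient step, then store x_t."""
        if self._ready():
            feats = self._features()
            err = x - self.predict()
            # Gradient of (x_t - forecast)^2 with respect to gamma is -2 * err * feats.
            self.gamma = [g + 2 * self.lr * err * f for g, f in zip(self.gamma, feats)]
            norm = math.sqrt(sum(g * g for g in self.gamma))
            if norm > self.radius:  # project back onto the L2 ball of the given radius
                self.gamma = [g * self.radius / norm for g in self.gamma]
        self.history.append(x)


if __name__ == "__main__":
    model = OnlineARIMA(d=1, m=1, lr=0.05)
    for t in range(40):
        model.update(2 * t + 1)  # linear trend: first differences are constant
    print(model.predict())       # close to the next value, 81
```

For clarity, the sketch recomputes differences from the full history at every step. A streaming implementation would keep only the last m + d raw values, so each round would cost O(m + d) regardless of how long the series runs.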
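For the online non-negative matrix factorization setting behind NN-PA and NN-APA, the abstract does not spell out either update rule. NN-PA uses passive-aggressive steps, and NN-APA adds second-order information and expert-based parameter tuning. The sketch below swaps in a simpler technique, plain projected stochastic gradient descent, to show the streaming structure only. Each arriving rating r for (user i, item j) triggers one gradient step on the squared error of u_i · v_j plus an L2 penalty. Negative entries are then clipped to zero so that both factor matrices stay non-negative. This is not NN-PA or NN-APA. The names `online_nmf_step` and `rmse`, the parameters `lr` and `reg`, and the toy ratings are all hypothetical.

```python
import random


def online_nmf_step(U, V, i, j, r, lr=0.02, reg=0.0):
    """One projected-SGD update of U[i] and V[j] after observing rating r for (i, j).

    Stand-in technique (projected SGD), not the NN-PA/NN-APA update.
    Modifies U and V in place and returns the prediction error before the step.
    """
    u, v = U[i], V[j]
    err = r - sum(a * b for a, b in zip(u, v))
    # Gradient step on 0.5*err^2 + 0.5*reg*(|u|^2 + |v|^2), using the old u and v
    # for both updates, then projection onto the non-negative orthant (clip at 0).
    new_u = [max(0.0, a + lr * (err * b - reg * a)) for a, b in zip(u, v)]
    new_v = [max(0.0, b + lr * (err * a - reg * b)) for a, b in zip(u, v)]
    U[i], V[j] = new_u, new_v
    return err


def rmse(U, V, ratings):
    """Root-mean-square error of U[i] . V[j] against the observed ratings."""
    sq = [(r - sum(a * b for a, b in zip(U[i], V[j]))) ** 2 for i, j, r in ratings]
    return (sum(sq) / len(sq)) ** 0.5


# Toy stream: 3 users x 2 items, all entries observed (hypothetical data).
ratings = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 1.0), (2, 0, 3.0), (2, 1, 3.0)]
rng = random.Random(0)
k = 2  # latent dimension
U = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(3)]
V = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(2)]

before = rmse(U, V, ratings)
for epoch in range(500):
    for i, j, r in ratings:  # ratings arrive one at a time, as in a stream
        online_nmf_step(U, V, i, j, r)
after = rmse(U, V, ratings)
print(f"RMSE before: {before:.3f}, after: {after:.3f}")
```

Clipping after each step keeps the factors feasible at every round, which the online setting needs because a prediction may be requested at any time. The cost of this simplicity is a fixed step size. Automatically tuning that kind of parameter is the problem the abstract attributes to the expert-learning component of NN-APA.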

[Degree-granting institution]: Zhejiang University
[Degree level]: Doctorate
[Year conferred]: 2017
[Classification number]: TP181

[Author]: 劉成昊 (Liu Chenghao)
