Research on CPU-GPU-Based Parallelization of Conditional Random Fields
Published: 2018-04-14 12:37
Topics: conditional random field + cooperative parallel model. Source: Huazhong University of Science and Technology, master's thesis, 2013
【Abstract】: As a machine learning algorithm, the conditional random field (CRF) is widely used in part-of-speech tagging, information extraction, image segmentation, and related fields. In practical applications, a CRF can flexibly generate a large number of descriptive features to improve learning, but the feature dimension easily exceeds one million, so training spends a great deal of time computing the gradient and the likelihood function, which to some extent limits the CRF's ability to solve practical problems. Existing solutions parallelize CRFs on multi-core CPUs or on GPUs alone, but the architectural limitations of each platform make the resulting speedups unsatisfactory.

To address this problem, a new method is proposed that accelerates the CRF model through CPU-GPU cooperation. In the cooperative CPU-GPU parallel architecture, the CPU handles the training-optimization steps that have high memory complexity and heavy branching, avoiding the poor overall parallel performance that GPU-only parallelization suffers from limited device memory and weak branch handling. For the GPU side, a two-level parallel scheme is proposed to match the differing characteristics of the compute-intensive steps and maximize parallel efficiency: all elements of the state matrix are computed in parallel at the independent Node Level, while all possible paths through the state matrix are computed in parallel at the sequence-dependent Sentence Level. In addition, the data's memory layout is optimized for the GPU's memory-access characteristics, reducing unnecessary memory traffic and further improving parallel efficiency.

Experimental results show that, with no loss of model accuracy, the CPU-GPU cooperative parallel CRF achieves speedups of more than 10x in training and more than 15x in prediction over single-threaded CPU processing; compared with GPU-only parallelization, cooperative parallelization improves acceleration performance by 50%.
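The two-level decomposition described above can be illustrated with a minimal NumPy sketch of the linear-chain CRF forward algorithm (illustrative only; the function names, shapes, and vectorization strategy here are assumptions for exposition, not the thesis's actual CUDA implementation). The per-position state scores are fully independent (the "Node Level"), so on a GPU all of them can be computed in one parallel launch; the recursion over positions is inherently sequential (the "Sentence Level"), but the reduction over all state-to-state paths at each step is itself parallel.

```python
import numpy as np

def forward_log_alpha(emissions, transitions):
    """Forward pass of a linear-chain CRF in log space.

    emissions:   (T, K) per-position state scores. Node Level: every
                 entry is independent, so all T*K scores can be computed
                 by a single vectorized/parallel operation.
    transitions: (K, K) score for moving from state i to state j.
    Returns log-alpha of shape (T, K).
    """
    T, K = emissions.shape
    alpha = np.empty((T, K))
    alpha[0] = emissions[0]
    # Sentence Level: the loop over t is sequential (each step depends
    # on the previous one), but the K*K path scores inside each step
    # are formed and reduced in parallel.
    for t in range(1, T):
        scores = alpha[t - 1][:, None] + transitions + emissions[t][None, :]
        m = scores.max(axis=0)                       # stable log-sum-exp
        alpha[t] = m + np.log(np.exp(scores - m).sum(axis=0))
    return alpha

# Toy example: 3 positions, 2 states; logsumexp of the last row is log Z.
emissions = np.array([[0.5, 1.0], [0.2, 0.3], [1.5, 0.1]])
transitions = np.array([[0.1, 0.4], [0.2, 0.3]])
alpha = forward_log_alpha(emissions, transitions)
m = alpha[-1].max()
logZ = m + np.log(np.exp(alpha[-1] - m).sum())
```

The gradient and likelihood computations that dominate CRF training are built from exactly this kind of forward (and the symmetric backward) pass, which is why splitting the work into an embarrassingly parallel node layer and a sequence-recurrent layer maps well onto a GPU.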
【Degree-granting institution】: Huazhong University of Science and Technology
【Degree level】: Master's
【Year conferred】: 2013
【Classification number】: TP181;TP332
Article ID: 1749367
Link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1749367.html