Design and Implementation of a Convolutional Neural Network Processor

Published: 2018-06-16 03:54

Topic: convolutional neural network + custom instructions; Reference: Xi'an University of Technology, 2017 master's thesis

[Abstract]: The convolutional neural network (CNN) is an advanced deep learning architecture that is widely used in image recognition, speech recognition, natural language recognition, and other fields. CNNs are both data-intensive and computation-intensive. Traditional CPU platforms cannot fully exploit the parallelism of a CNN, so computation takes a long time and implementation is costly. Dedicated CNN chips have advantages in speed and cost, but they are poorly configurable and cannot adapt flexibly to the changing number of feature maps in different CNN layers. Based on an analysis of the characteristics and problems of the CNN algorithm, this work starts from the traditional general-purpose ZION processor and, by designing dedicated instructions and improving the architecture, produces a new CNN processor that balances parallel computing capability with flexibility. The main research contents are as follows:

1. Dedicated instruction design. First, a statistical analysis of the operation types in the CNN algorithm shows that convolution, downsampling, and activation functions occur most frequently. Corresponding functional instructions are therefore designed, so that a single instruction completes an operation that originally required multiple instructions. Second, vector memory-access instructions are designed to read or write multiple data items at once, which reduces the number of memory-access instructions and improves memory-access efficiency. Finally, the CNN-specific instruction system is completed following the rules of the RISC-V32 instruction set and its extension instructions.

2. Processor architecture design. Building on the general-purpose seven-stage pipelined ZION processor designed by the authors' research group, pipelined functional units that support the CNN-specific instructions are designed. Because the same convolution kernel is applied at different positions of the input feature map, the input data can be reused; a reuse structure exploits this to reduce the number of feature-map reads and lower the memory-access demand. In addition, to reduce the impact of memory-access latency on parallel computation, a double-buffer scheme caches data from different feature maps in alternation, which reduces the idle time of the computation units and improves parallel efficiency.

On the basis of the instruction and architecture design, the pipelined functional units for the dedicated instructions are implemented in Verilog HDL, and the overall system design of a seven-stage pipelined CNN processor is completed and passes functional simulation. The CNN processor can run general-purpose algorithms and also accelerates the CNN algorithm significantly. The processor was tested on the CNN algorithm using the MNIST handwritten digit database as the sample set. Compared with the general-purpose ZION processor, the processing speed improves by 6.955 times and the speed-to-area ratio improves by 3.398 times.
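The data-reuse idea described in the abstract can be illustrated with a small software model. The abstract does not describe the actual reuse structure, so the sliding-window register below is only an assumed scheme, and the function names `conv2d_naive` and `conv2d_reuse` are hypothetical. The sketch counts feature-map reads for a plain convolution and for one that keeps the current window and loads only one new column per step.

```python
def conv2d_naive(fmap, kernel):
    """Valid 2-D convolution (no padding, stride 1) that re-reads every window element."""
    k = len(kernel)
    rows, cols = len(fmap), len(fmap[0])
    reads = 0
    out = []
    for r in range(rows - k + 1):
        out_row = []
        for c in range(cols - k + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += fmap[r + i][c + j] * kernel[i][j]
                    reads += 1  # one feature-map read per multiply
            out_row.append(acc)
        out.append(out_row)
    return out, reads


def conv2d_reuse(fmap, kernel):
    """Same result, but a k x k window register is kept between steps (assumed scheme)."""
    k = len(kernel)
    rows, cols = len(fmap), len(fmap[0])
    reads = 0
    out = []
    for r in range(rows - k + 1):
        # Fill the window once at the start of each output row: k*k reads.
        window = [[fmap[r + i][j] for j in range(k)] for i in range(k)]
        reads += k * k
        out_row = []
        for c in range(cols - k + 1):
            if c > 0:
                # Shift the window left and load only the new right-hand column: k reads.
                for i in range(k):
                    window[i] = window[i][1:] + [fmap[r + i][c + k - 1]]
                reads += k
            acc = sum(window[i][j] * kernel[i][j] for i in range(k) for j in range(k))
            out_row.append(acc)
        out.append(out_row)
    return out, reads


# Illustrative data only: a 4x4 feature map and a 2x2 kernel.
fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, 1]]

naive_out, naive_reads = conv2d_naive(fmap, kernel)
reuse_out, reuse_reads = conv2d_reuse(fmap, kernel)
print(naive_out == reuse_out, naive_reads, reuse_reads)
```

On this toy input both versions produce the same output, while the reuse version performs 24 feature-map reads instead of 36. A hardware reuse structure could also share rows between consecutive output rows, which this sketch does not model.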
[Degree-granting institution]: Xi'an University of Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP183;TP332


Article No.: 2025178


Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2025178.html
