Design and Implementation of a Convolutional Neural Network Processor
Published: 2018-06-16 03:54
Topic: convolutional neural network + custom instructions. Source: Xi'an University of Technology, master's thesis, 2017
[Abstract]: The convolutional neural network (CNN) is an advanced deep learning architecture that is widely used in image recognition, speech recognition, natural language recognition, and other fields. CNNs are both data-intensive and computation-intensive. A conventional CPU platform cannot fully exploit the parallelism of a CNN, so computation takes a long time and the implementation cost is high. Dedicated CNN chips have advantages in speed and cost, but they are poorly configurable and cannot flexibly adapt to the varying number of feature maps across CNN layers. Based on an analysis of the characteristics and problems of CNN algorithms, this thesis designs a new CNN processor that balances parallel computing capability with flexibility, by adding dedicated instructions to the conventional general-purpose ZION processor and improving its architecture. The main research contents are as follows:

1. Dedicated instruction design. First, a statistical analysis of the operation types in CNN algorithms shows that convolution, downsampling, and activation functions occur with high frequency. Accordingly, corresponding function instructions are designed so that a single instruction completes an operation that would otherwise require multiple instructions. Second, vector memory-access instructions are designed to read or write multiple data items at once, which reduces the number of memory-access instructions and improves memory-access efficiency. Finally, the CNN-specific instruction set is completed following the rules of the RISC-V32 instruction set and its extension instructions.

2. Processor architecture design. Building on the general-purpose seven-stage pipelined ZION processor designed by the authors' research group, pipelined functional units that support the CNN-specific instructions are designed. Because the same convolution kernel is applied at different positions of the input feature map, neighbouring positions share input data. A reuse structure exploits this, reducing the number of feature-map reads and the memory-access demand. In addition, to reduce the impact of memory-access latency on parallel computation, a double-buffer mode caches the data of different feature maps in a time-shared manner, which reduces the idle time of the compute units and improves parallel efficiency.

On the basis of the instruction and architecture design, the pipelined functional units for the dedicated instructions are implemented in Verilog HDL, and the complete system design of a seven-stage pipelined CNN processor is finished and passes functional simulation. The CNN processor not only runs general-purpose algorithms but also significantly accelerates CNN algorithms. The processor was tested on a CNN algorithm with the MNIST handwritten digit database as the sample set. Compared with the general-purpose ZION processor, processing speed improves by a factor of 6.955 and the speed-to-area ratio by a factor of 3.398.
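The data-reuse idea described in the abstract can be illustrated with a small software model. The sketch below is not the thesis's hardware design. It assumes a stride-1 "valid" convolution and a hypothetical k x k window register that shifts by one column per output, and it simply counts feature-map reads with and without that reuse. The function names conv2d_naive and conv2d_window_reuse are invented for this illustration.

```python
# Illustrative software model only; names and structure are assumptions,
# not taken from the thesis.

def conv2d_naive(fmap, kernel):
    """Stride-1 'valid' convolution (correlation form) that re-reads
    every input value for every output position."""
    h, w, k = len(fmap), len(fmap[0]), len(kernel)
    out, reads = [], 0
    for r in range(h - k + 1):
        row = []
        for c in range(w - k + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += fmap[r + i][c + j] * kernel[i][j]
                    reads += 1  # one feature-map read per multiply
            row.append(acc)
        out.append(row)
    return out, reads


def conv2d_window_reuse(fmap, kernel):
    """Same result, but a k x k window is kept locally and shifted by one
    column, so only k new values are read per step within an output row."""
    h, w, k = len(fmap), len(fmap[0]), len(kernel)
    out, reads = [], 0
    for r in range(h - k + 1):
        # Fill the window once at the start of each output row.
        window = [[fmap[r + i][j] for j in range(k)] for i in range(k)]
        reads += k * k
        row = []
        for c in range(w - k + 1):
            if c > 0:
                # Drop the oldest column and read one new column.
                for i in range(k):
                    window[i] = window[i][1:] + [fmap[r + i][c + k - 1]]
                reads += k
            row.append(sum(window[i][j] * kernel[i][j]
                           for i in range(k) for j in range(k)))
        out.append(row)
    return out, reads


if __name__ == "__main__":
    fmap = [[r * 5 + c for c in range(5)] for r in range(5)]
    kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
    naive_out, naive_reads = conv2d_naive(fmap, kernel)
    reuse_out, reuse_reads = conv2d_window_reuse(fmap, kernel)
    print(naive_out == reuse_out, naive_reads, reuse_reads)
```

For a 5 x 5 input and a 3 x 3 kernel, this model performs 81 reads without reuse and 45 with column reuse. Reusing rows across output rows as well, for example with line buffers, would reduce reads further. Whether the thesis does so is not stated in the abstract.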
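The benefit of the double-buffer mode can be sketched with a simple timing model. This is an assumption-laden illustration, not a description of the actual ZION-based hardware: it assumes one memory port, one compute unit, and per-tile load and compute times given in arbitrary cycles. The function names are hypothetical.

```python
# Illustrative timing model only; cycle counts and names are assumptions.

def single_buffer_cycles(load, compute):
    """One buffer: each tile is fully loaded, then computed, with no overlap."""
    return sum(l + c for l, c in zip(load, compute))


def double_buffer_cycles(load, compute):
    """Two buffers: while tile i is computed from one buffer, tile i+1 is
    loaded into the other. Each step costs the slower of the two."""
    if not load:
        return 0
    total = load[0]  # the first load cannot be hidden behind any compute
    for i in range(len(load)):
        next_load = load[i + 1] if i + 1 < len(load) else 0
        total += max(compute[i], next_load)
    return total


if __name__ == "__main__":
    load = [4, 4, 4, 4]
    compute = [6, 6, 6, 6]
    print(single_buffer_cycles(load, compute), double_buffer_cycles(load, compute))
```

With four tiles that each take 4 cycles to load and 6 cycles to compute, the model gives 40 cycles for a single buffer and 28 cycles for two buffers, because every load after the first is hidden behind computation. When loading is slower than computing, the compute unit still stalls, and double buffering alone cannot remove that stall.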
[Degree-granting institution]: Xi'an University of Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification numbers]: TP183; TP332
Article ID: 2025178
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2025178.html