Research on the Coordination of Instruction Issue in Data-Parallel Processors

Published: 2019-01-04 12:03

[Abstract]: Over the past 20 years, advances in semiconductor process technology and computer architecture have improved microprocessor performance by more than a thousandfold. Even so, the gap between the performance that applications demand and the performance that processors actually deliver keeps widening. As further progress in semiconductor processes becomes increasingly difficult and the negative effects of chip power consumption become more pronounced, narrowing this gap has become a difficult and urgent task. Data-parallel processors, which combine multi-core, SIMD (Single Instruction Multiple Data) and VLIW (Very Long Instruction Word) techniques, exploit data parallelism efficiently and offer a promising way to keep improving processor performance. Data-parallel processors nevertheless still suffer from a coordination problem in instruction issue. This thesis addresses that problem. Taking instruction issue techniques as its focus, it strengthens the coordination of instruction issue in data-parallel processors in two respects: the efficient integration of multiple instruction issue modes, and the cooperation among hardware resources achieved by overcoming performance bottlenecks. The main research results are as follows:

1). An efficient integration model for multi-core, SIMD and VLIW in data-parallel processors is analyzed and derived with power overhead taken into account. By adding a characterization of SIMD and VLIW techniques to Amdahl's law, the thesis applies Amdahl's law to data-parallel processors and gives design guidance on the number of cores, the SIMD width and the VLIW length. It also identifies the key bottlenecks that limit the performance of data-parallel processors: serial processing, branch structures, and support for simultaneous multi-width SIMD.

2). A dual-core framework is proposed that accelerates serial-processing applications and provides efficient cooperation between control and processing. It comprises three key techniques: kernel-level software pipelining, a dynamic decoupling and coupling mechanism, and unified branching with fast data sharing. Kernel-level software pipelining exposes a large amount of parallelism between the kernels of serial and parallel applications, and the dynamic decoupling and coupling mechanism exploits that parallelism efficiently, which removes the bottleneck caused by serial-processing applications. In addition, unified branching and fast data sharing further improve the performance of the dual-core framework in the tightly coupled state.

3). An instruction shuffling mechanism is proposed to overcome the bottleneck caused by branch structures. The mechanism retains the efficiency of the SIMD structure while gaining the flexibility of the MIMD structure in handling branches, so that different SIMD lanes can fetch the instructions that correspond to their own branch outcomes and different branch paths execute in parallel. Because SIMD lanes that follow the same branch path still execute in SIMD fashion, the efficiency of the SIMD structure itself is preserved. The instruction shuffling mechanism builds a bridge between the SIMD and MIMD structures and greatly improves the execution efficiency of data-parallel processors.

4). The instruction shuffling mechanism is extended into a multi-SIMD multi-data-stream (MSMD) structure that supports both dynamic and static grouping of SIMD lanes. While supporting branches efficiently, the structure also meets applications' demand for simultaneous multi-width SIMD, allowing several application kernels with different SIMD width requirements to execute in parallel. The MSMD structure also improves the instruction buffer mapping algorithm of the instruction shuffling mechanism, which further improves the performance of the SIMD structure on branches.

5). The dual-core framework and the MSMD structure are combined into a cooperative instruction issue technique, which jointly addresses the serial-processing, branch and simultaneous multi-width SIMD problems in data-parallel processors and achieves cooperation among hardware resources. The structure was designed and implemented in a full-chip RTL-level environment. The implementation results show that the cooperative instruction issue technique achieves efficient cooperation among the hardware resources of a data-parallel processor at a reasonable cost.

Data-parallel processor architecture remains an active research topic, and many key problems still call for more systematic and more practically meaningful study. Through its study of an integration model for multiple instruction issue modes, this thesis provides systematic guidance for the design of data-parallel processors, and it proposes efficient solutions to the key bottlenecks that limit their performance. Verification and evaluation results show that the proposed solutions are effective and can be applied to the design and implementation of future data-parallel processors.
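The idea behind the instruction shuffling mechanism, that lanes following the same branch path stay grouped in SIMD fashion while different paths run concurrently, can be illustrated with a toy cycle-count model. The sketch below is not the thesis's design. The function names, the two-way branch, and the assumption that each lane group has its own instruction stream with no shuffling overhead are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_lanes_by_branch(taken):
    """Group SIMD lane indices by their branch outcome (True = taken)."""
    groups = defaultdict(list)
    for lane, outcome in enumerate(taken):
        groups[outcome].append(lane)
    return dict(groups)


def cycles_serialized(taken, then_len, else_len):
    """Conventional SIMD (assumed baseline): every branch path that at
    least one lane follows is executed one after the other."""
    groups = group_lanes_by_branch(taken)
    cycles = 0
    if True in groups:
        cycles += then_len
    if False in groups:
        cycles += else_len
    return cycles


def cycles_overlapped(taken, then_len, else_len):
    """Idealized per-group issue (hypothetical, zero overhead): each lane
    group receives its own path's instructions, so the paths overlap and
    the longest path determines the total."""
    groups = group_lanes_by_branch(taken)
    return max((then_len if outcome else else_len for outcome in groups),
               default=0)


# Eight lanes with divergent branch outcomes; path lengths are in cycles.
taken = [True, False, True, True, False, False, True, False]
groups = group_lanes_by_branch(taken)
serialized = cycles_serialized(taken, then_len=10, else_len=6)
overlapped = cycles_overlapped(taken, then_len=10, else_len=6)
print(groups, serialized, overlapped)
```

With these made-up inputs the lanes split into two groups of four, the serialized baseline takes 16 cycles and the idealized overlapped issue takes 10. When every lane takes the same path, both models give the same count, which mirrors the abstract's point that lanes on a shared path keep their SIMD efficiency.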
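[Degree-granting institution]: National University of Defense Technology
[Degree level]: Doctoral
[Year degree conferred]: 2013
[Classification number]: TP332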
