
Research on Breadth-First Search Algorithms for Many-Core Architectures

Published: 2018-09-15 19:42

[Abstract]: Breadth-first search (BFS) is a fundamental graph algorithm and a core component of many other algorithms. It is widely used in fields such as network security, medical informatics, data mining, social networks, and the semantic web. BFS is also a typical data-intensive application: Graph 500 uses it to rank supercomputers by their ability to handle data-intensive workloads. In recent years, thanks to their low power consumption and good price-performance ratio, many-core accelerators have been widely adopted in high-performance computing. MIC (Many Integrated Core), the latest many-core coprocessor architecture, has the advantage over other accelerators of being compatible with traditional parallel programming models. Since "Tianhe-2", which combines CPUs with MIC coprocessors, reached the top of the Top 500 list, MIC has drawn wide attention in the high-performance computing community.

Parallel implementations of BFS suffer from heavy data contention, an irregular workload, and poor memory-access locality. MIC, in turn, offers a large number of threads and wide vector units, but only a small average cache per thread. An efficient BFS implementation that exploits the MIC hardware therefore has to cope with high contention overhead between threads, low vector-unit utilization, and poor cache utilization. This thesis addresses these problems and studies how to implement BFS efficiently on MIC. The main contributions are as follows:

1) A hybrid BFS algorithm that combines top-down and bottom-up search is designed and implemented. Depending on the characteristics of the graph, the algorithm uses a different search strategy at different search levels and so combines the strengths of both strategies. Its performance is 3.21 times that of the pure top-down strategy and 2.15 times that of the pure bottom-up strategy.

2) A multi-threaded BFS algorithm optimized for MIC is proposed. Building on the hybrid BFS algorithm, it reduces data contention, eliminates atomic operations, and uses a thread scheduling scheme that combines static and dynamic scheduling, which lets it make good use of the large number of threads MIC provides.

3) A method that uses the wide vector units to accelerate BFS further is proposed. In the top-down phase, SIMD instructions scan the neighbors of a vertex simultaneously; in the bottom-up phase, SIMD instructions search for unvisited vertices in parallel. The maximum speedup from vectorization is 1.85.

4) A heterogeneous hybrid BFS algorithm in which the CPU and MIC compute cooperatively is designed and implemented. Building on the hybrid BFS algorithm, it uses the CPU and MIC together at search levels that contain many tasks. A task partitioning method with an adjustable ratio and a communication design that overlaps communication with computation address the load imbalance and high communication overhead of cooperative computing. The speedup relative to the CPU alone is about 1.4.

Experimental results show that the performance of the BFS algorithm implemented on MIC is about 5.31 times that of the GPU, and that the CPU+MIC heterogeneous hybrid algorithm reaches a maximum speedup of 1.46.
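The hybrid top-down/bottom-up strategy described in the abstract can be illustrated with a minimal, single-threaded Python sketch. This is not the thesis implementation: the function name `hybrid_bfs`, the adjacency-dictionary graph format, and the switching rule (go bottom-up when the frontier holds more than a fraction `alpha` of all vertices) are assumptions made for illustration, since the abstract does not state the actual switching criterion. The sketch also assumes an undirected graph, so that a vertex's neighbor list can serve as its list of candidate parents.

```python
from typing import Dict, List, Set


def hybrid_bfs(adj: Dict[int, List[int]], source: int,
               alpha: float = 0.25) -> Dict[int, int]:
    """Return the BFS level of every vertex reachable from `source`.

    `adj` maps each vertex to its neighbor list (undirected graph assumed).
    `alpha` is a hypothetical tuning knob: a level is processed bottom-up
    when the frontier is larger than alpha * (number of vertices).
    """
    n = len(adj)
    level: Dict[int, int] = {source: 0}
    frontier: Set[int] = {source}
    depth = 0

    while frontier:
        depth += 1
        next_frontier: Set[int] = set()

        if len(frontier) > alpha * n:
            # Bottom-up step: every unvisited vertex looks for a parent
            # in the current frontier and stops at the first one found.
            for v in adj:
                if v in level:
                    continue
                for u in adj[v]:
                    if u in frontier:
                        next_frontier.add(v)
                        break
        else:
            # Top-down step: every frontier vertex scans its neighbors
            # and claims the ones that have not been visited yet.
            for u in frontier:
                for v in adj[u]:
                    if v not in level:
                        next_frontier.add(v)

        # Assign levels only after the whole step, so both directions
        # see the same "visited" state while processing a level.
        for v in next_frontier:
            level[v] = depth
        frontier = next_frontier

    return level
```

Calling `hybrid_bfs(graph, 0)` on a small undirected graph returns a dictionary from vertex to BFS level. Setting `alpha=1.0` forces pure top-down search and `alpha=0.0` forces pure bottom-up search, which makes it easy to check that all three variants agree. In the bottom-up step each unvisited vertex writes only its own entry, which hints at why this direction lends itself to reducing contention in a multi-threaded version.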
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP332;TP301.6
