
Research on Breadth-First Search Algorithms for Many-Core Architectures

Published: 2018-09-15 19:42
[Abstract]: Breadth-first search (BFS) is a fundamental graph algorithm and a core component of many other algorithms. It is widely used in fields such as network security, medical informatics, data mining, social networks, and the semantic web. BFS is also a typical data-intensive workload: Graph 500 uses it to rank supercomputers by their ability to handle data-intensive applications. In recent years, thanks to their lower power consumption and better price-performance ratio, many-core accelerators have been widely adopted in high-performance computing. As the latest many-core coprocessor architecture, MIC (Many Integrated Core) has the advantage over other accelerators of being compatible with traditional parallel programming models. With the CPU+MIC based "Tianhe-2" reaching the top of the Top 500 list, MIC has attracted wide attention in high-performance computing.

Parallel implementations of BFS suffer from heavy data contention, irregular workloads, and poor memory-access locality. MIC provides a large number of threads and wide vector units, but the average cache available per thread is small. An efficient BFS on MIC therefore has to deal with high contention overhead between threads, low vector-unit utilization, and poor cache utilization. This thesis addresses these problems and studies how to implement BFS efficiently on MIC. The main results are as follows:

1) A hybrid BFS algorithm that combines top-down and bottom-up search is designed and implemented. Based on the characteristics of the graph, the algorithm uses a different search strategy at different search levels and so combines the strengths of both. Its performance is 3.21 times that of the pure top-down strategy and 2.15 times that of the pure bottom-up strategy.
2) A multi-threaded BFS algorithm optimized for MIC is proposed. Building on the hybrid BFS algorithm, it reduces data contention, eliminates atomic operations, and combines static and dynamic thread scheduling, which lets it make good use of the large number of threads MIC provides.
3) A method that uses the wide vector units to accelerate BFS further is proposed. In the top-down phase, SIMD instructions scan the neighbors of a vertex simultaneously; in the bottom-up phase, SIMD instructions search for unvisited vertices in parallel. The maximum speedup from this method is 1.85.
4) A heterogeneous hybrid BFS algorithm in which the CPU and MIC compute cooperatively is designed and implemented. Building on the hybrid BFS algorithm, it uses the CPU and MIC together at search levels that contain a lot of work. A task-partitioning scheme with an adjustable ratio and a communication design that overlaps communication with computation address the load imbalance and high communication overhead of cooperative computing. The speedup relative to the CPU is about 1.4.

Experimental results show that the BFS algorithm implemented on MIC in this thesis performs about 5.31 times as well as the GPU, and that the CPU+MIC heterogeneous hybrid algorithm reaches a maximum speedup of 1.46.
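The hybrid top-down/bottom-up idea in contribution 1 can be illustrated with a minimal single-threaded sketch. This is not the thesis's implementation: the function name `hybrid_bfs`, the adjacency-list input, and the switching rule (go bottom-up once the frontier exceeds a fraction `alpha` of all vertices) are assumptions chosen for illustration, and the graph is assumed to be undirected.

```python
from typing import List


def hybrid_bfs(adj: List[List[int]], source: int, alpha: float = 0.05) -> List[int]:
    """Return the BFS level of every vertex (-1 if unreachable).

    adj is an adjacency list of an undirected graph. alpha is a
    hypothetical switching threshold, not a value from the thesis.
    """
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        if len(frontier) > alpha * n:
            # Bottom-up step: every unvisited vertex looks for a parent
            # in the current frontier and stops at the first one found.
            in_frontier = [False] * n
            for u in frontier:
                in_frontier[u] = True
            for v in range(n):
                if level[v] != -1:
                    continue
                for u in adj[v]:
                    if in_frontier[u]:
                        level[v] = depth
                        next_frontier.append(v)
                        break
        else:
            # Top-down step: every frontier vertex claims its
            # unvisited neighbors.
            for u in frontier:
                for v in adj[u]:
                    if level[v] == -1:
                        level[v] = depth
                        next_frontier.append(v)
        frontier = next_frontier
    return level


if __name__ == "__main__":
    graph = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3], []]
    print(hybrid_bfs(graph, 0))
```

The bottom-up step pays off when the frontier is large, because most unvisited vertices find a parent after checking only a few neighbors. The top-down step is cheaper when the frontier is small. Both steps produce the same levels, so the threshold affects only the amount of work done.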
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP332;TP301.6

Article ID: 2244312


Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2244312.html

