Parallel Optimization and Design of a Crowd Counting Algorithm for GPGPU-Based Embedded Platforms
Published: 2018-07-31 06:02
[Abstract]: The rapid growth of China's urban population has sharply increased the likelihood of crowd-gathering incidents in public places. Stampedes, disorder, and other abnormal crowd events caused by such gatherings have led to heavy loss of life and property. Effectively monitoring and managing crowd dynamics in public places such as subways, shopping malls, and squares has therefore become an urgent practical problem. Crowd size is a key feature of abnormal crowd events: if the number of people in a monitored area is known before an incident occurs, managers can disperse the crowd in time and prevent the incident. In recent years, rapid gains in GPU hardware performance have made general-purpose computing on GPUs (GPGPU) a new route to accelerating digital image algorithms. To meet the need for early warning of abnormal crowd events, this thesis proposes a crowd counting algorithm for surveillance video and uses GPGPU to accelerate its bottleneck modules. First, based on the characteristics of surveillance video from public places such as squares and passageways, the crowd counting algorithm is designed and implemented using foreground extraction, edge detection, and target recognition and tracking. A timing analysis of each module identifies ViBe foreground extraction and Canny edge detection as the bottlenecks. Then both are parallelized with the cross-platform OpenCL heterogeneous development framework. For ViBe foreground extraction, NDRange index space optimization and asynchronous execution are used to accelerate model initialization and model update on the GPU.
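To illustrate why ViBe lends itself to an NDRange index space, the sequential Python sketch below shows the two steps the abstract names, model initialization and model update, as per-pixel operations. It is not the thesis's OpenCL code. The function names are hypothetical, and the parameter values (20 samples, radius 20, 2 matches, subsampling factor 16) are the commonly cited ViBe defaults rather than values confirmed by the thesis.

```python
import random

# Assumed parameters: commonly cited ViBe defaults, not taken from the thesis.
N_SAMPLES = 20     # background samples stored per pixel
RADIUS = 20        # intensity distance that counts as a match
MIN_MATCHES = 2    # matches required to label a pixel as background
SUBSAMPLING = 16   # a background pixel updates the model with probability 1/16


def clamp(value, low, high):
    return min(max(value, low), high)


def init_model(frame, rng):
    """Fill each pixel's sample set from its 3x3 neighbourhood in the first frame."""
    h, w = len(frame), len(frame[0])
    model = []
    for y in range(h):
        row = []
        for x in range(w):
            samples = []
            for _ in range(N_SAMPLES):
                ny = clamp(y + rng.randint(-1, 1), 0, h - 1)
                nx = clamp(x + rng.randint(-1, 1), 0, w - 1)
                samples.append(frame[ny][nx])
            row.append(samples)
        model.append(row)
    return model


def classify_and_update(frame, model, rng):
    """Return a mask (1 = foreground) and update the model in place."""
    h, w = len(frame), len(frame[0])
    mask = [[0] * w for _ in range(h)]
    # In an OpenCL port, each (x, y) pair would be one work-item of a 2D NDRange.
    for y in range(h):
        for x in range(w):
            pixel = frame[y][x]
            matches = sum(1 for s in model[y][x] if abs(s - pixel) < RADIUS)
            if matches < MIN_MATCHES:
                mask[y][x] = 1
                continue
            # Background pixel: occasionally overwrite one of its own samples.
            if rng.randrange(SUBSAMPLING) == 0:
                model[y][x][rng.randrange(N_SAMPLES)] = pixel
            # Occasionally propagate the value into a random neighbour's model.
            # This write crosses pixel boundaries, so a parallel version has to
            # tolerate or otherwise handle concurrent writes.
            if rng.randrange(SUBSAMPLING) == 0:
                ny = clamp(y + rng.randint(-1, 1), 0, h - 1)
                nx = clamp(x + rng.randint(-1, 1), 0, w - 1)
                model[ny][nx][rng.randrange(N_SAMPLES)] = pixel
    return mask
```

Classification reads only the pixel's own sample set, which is what makes a one-work-item-per-pixel mapping natural. The neighbour update is the only step that touches another pixel's data.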
For Canny edge detection, memory access optimization, separable convolution, reduced memory access counts, and a bounded number of iterations are applied to Gaussian filtering, gradient magnitude and direction calculation, non-maximum suppression, and double-threshold edge linking, respectively. Performance tests of the ViBe and Canny algorithms before and after optimization show that the optimized versions run faster without affecting the processing results. Finally, the parallelized crowd counting algorithm is integrated into a monitoring system, which is implemented and tested on an embedded platform. Functional comparison and performance testing of the whole system show that OpenCL parallelization markedly improves the running efficiency of the most time-consuming bottleneck modules. With GPU hardware acceleration, overall system performance improves by 45% to 60% without affecting system operation or monitoring quality.
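The separable convolution mentioned for the Canny filtering stage can be sketched as follows. A 2D Gaussian kernel is the outer product of two 1D kernels, so one horizontal and one vertical pass give the same result as a full 2D convolution, while each output pixel reads 2k inputs rather than k² for a kernel of width k. This is a plain Python illustration under assumed settings (radius 2, sigma 1.4, replicated borders), with hypothetical function names, and is not the thesis's kernel code.

```python
import math


def gaussian_kernel_1d(radius, sigma):
    """Normalised 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_rows(image, kernel):
    """Horizontal 1D pass; borders are handled by clamping the column index."""
    radius = len(kernel) // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for k, weight in enumerate(kernel):
                sx = min(max(x + k - radius, 0), w - 1)
                acc += weight * image[y][sx]
            out[y][x] = acc
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur_separable(image, radius=2, sigma=1.4):
    """Blur with two 1D passes (illustrative radius and sigma)."""
    kernel = gaussian_kernel_1d(radius, sigma)
    horizontal = convolve_rows(image, kernel)
    # The vertical pass is a horizontal pass over the transposed image.
    return transpose(convolve_rows(transpose(horizontal), kernel))
```

For a unit impulse away from the borders, the blurred centre value equals the square of the 1D kernel's centre weight, which is a quick way to check that the two passes reproduce the 2D kernel.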
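[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: X924; TP391.41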
Article ID: 2154652
Article link: http://sikaile.net/kejilunwen/anquangongcheng/2154652.html