A GPU-Based Method for Fast Summary Generation
Published: 2018-11-11 11:07
[Abstract]: Query-based summaries are an important part of how a search engine presents its final results, and they are the approach most commonly used by modern search engines. They show the user the fragments of each result document that are most relevant to the query terms, which makes the results more intuitive and better targeted. Computing the summary of a single document for a given query is a lightweight task, but today's search engines face a very large volume of query requests, and every result document on every result page needs a summary generated for that query. Query-based summary computation therefore consumes a substantial share of the computing resources of a modern search engine system. To improve the performance and cost-effectiveness of summary generation under heavy load, this thesis proposes a high-performance parallel processing method based on a hybrid CPU-GPU (Graphics Processing Unit) system.
A summary generation algorithm suited to GPU processing is proposed. It segments documents with a sliding window so that highly relevant fragments are not cut apart, which is a problem with traditional truncation-based segmentation. The algorithm also uses a new quantitative formula to evaluate the relevance of a fragment to the query terms.
Based on an analysis of the runtime characteristics of the CPU-GPU hybrid system, the algorithm is then improved. In addition to parallelizing the work inside a single summary generation task, parallelism across tasks is implemented, and a three-stage pipeline system is designed to support it. To implement this pipeline, an asynchronous execution framework named JobFlow is designed. The framework adopts a service-based programming model and supports highly modular, parallel program design.
Several experiments were carried out to tune the system's performance parameters and to evaluate its performance and cost-effectiveness. The results show that, compared with the baseline summary generation algorithm (the Highlighter component of Lucene), the GPU pipeline system achieves a relatively high speedup while also reducing system cost.
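The difference between truncation-based and sliding-window segmentation described in the abstract can be illustrated with a small CPU-only sketch. This is not the thesis's implementation: the function names are hypothetical, the tokenizer is a plain whitespace split, and the score (a count of query-term hits per window) is only a stand-in, since the abstract does not give the thesis's relevance formula.

```python
from typing import List, Set, Tuple

Window = Tuple[int, List[str]]  # (start offset, tokens in the window)


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenization; a stand-in for a real analyzer."""
    return text.lower().split()


def score(window: List[str], query_terms: Set[str]) -> int:
    """Placeholder relevance: number of tokens that are query terms.

    Assumption for illustration only; NOT the formula from the thesis.
    """
    return sum(1 for tok in window if tok in query_terms)


def truncating_windows(tokens: List[str], size: int) -> List[Window]:
    """Fixed, non-overlapping segments (truncation-style split)."""
    return [(i, tokens[i:i + size]) for i in range(0, len(tokens), size)]


def sliding_windows(tokens: List[str], size: int, step: int = 1) -> List[Window]:
    """Overlapping segments: every `step`-th start offset yields a candidate."""
    last_start = max(len(tokens) - size, 0)
    return [(i, tokens[i:i + size]) for i in range(0, last_start + 1, step)]


def best_fragment(windows: List[Window], query_terms: Set[str]) -> Tuple[int, List[str], int]:
    """Return (start, tokens, score) of the best window; ties go to the earliest."""
    start, toks = max(windows, key=lambda w: score(w[1], query_terms))
    return start, toks, score(toks, query_terms)


if __name__ == "__main__":
    tokens = tokenize("This thesis presents fast GPU summary generation")
    query = {"fast", "gpu", "summary"}
    # The fixed split separates "fast" from "gpu summary".
    print(best_fragment(truncating_windows(tokens, 4), query))
    # The sliding window finds a fragment that contains all three terms.
    print(best_fragment(sliding_windows(tokens, 4), query))
```

Each window is scored independently of the others, which is the property that makes this kind of work a candidate for data-parallel execution. How the thesis actually maps windows to GPU threads is not stated in the abstract.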
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.3
Article ID: 2324649
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2324649.html