A Fast Summary Generation Method Based on GPU
Published: 2018-11-11 11:07
[Abstract]: Query-based summaries are an important part of how a search engine presents its final results and are the most common summarization method in modern search engines. They show the user the fragments of each result document that are most relevant to the query terms, which makes the results more intuitive and better targeted. Computing the summary of a single document for a given query is a lightweight task, but today's search engines face a massive volume of query requests, and every result document on every result page must have its summary generated according to the query terms. Query-based summary computation is therefore one of the more resource-intensive parts of a modern search engine system. To improve the performance and cost-effectiveness of summary generation under heavy load, a high-performance parallel processing method based on a hybrid CPU-GPU (Graphic Processing Unit) system is proposed.
A summary generation algorithm suited to GPU processing is proposed. The algorithm segments documents with a sliding window in order to avoid the problem, caused by traditional truncation-based segmentation, of highly relevant fragments being cut apart. The algorithm also adopts a new quantitative formula to evaluate the relevance of a fragment to the query terms.
Based on an analysis of the runtime characteristics of the hybrid CPU-GPU system, the summary generation algorithm is then improved. In addition to parallelizing the work inside a single summary generation task, parallelism across tasks is implemented, and a three-stage pipeline system is designed to support this parallel processing. To implement the three-stage pipeline, an asynchronous execution framework, JobFlow, is designed; it adopts a service-based programming model and supports highly modular and parallel program design.
Several experiments were carried out to tune the system's performance metrics and to evaluate its performance and cost-effectiveness. The results show that, compared with the baseline summary generation algorithm, the Highlighter component of Lucene, the GPU pipeline system achieves a relatively high speedup while also reducing system cost.
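The difference between truncation-based and sliding-window segmentation can be illustrated with a small sketch. This is not the thesis's algorithm: the abstract does not give the relevance formula, so the `score` function below is a placeholder that simply counts query-term hits, and all function names, the sample document, and the window size are assumptions made for illustration. Each window is scored independently of the others, which is the kind of work that could plausibly be distributed across GPU threads.

```python
from typing import List, Set, Tuple

Window = Tuple[int, int]  # half-open token range [start, end)


def truncation_windows(n_tokens: int, size: int) -> List[Window]:
    """Fixed, non-overlapping fragments: [0, size), [size, 2*size), ..."""
    return [(s, min(s + size, n_tokens)) for s in range(0, n_tokens, size)]


def sliding_windows(n_tokens: int, size: int, step: int = 1) -> List[Window]:
    """Overlapping fragments whose start advances by `step` tokens."""
    last_start = max(n_tokens - size, 0)
    return [(s, min(s + size, n_tokens)) for s in range(0, last_start + 1, step)]


def score(tokens: List[str], window: Window, query_terms: Set[str]) -> int:
    """Placeholder relevance (an assumption): count of query-term tokens in the window."""
    start, end = window
    return sum(1 for t in tokens[start:end] if t in query_terms)


def best_fragment(tokens: List[str], windows: List[Window],
                  query_terms: Set[str]) -> Tuple[int, str]:
    """Return (score, text) of the highest-scoring window; the earliest wins on ties."""
    best = max(windows, key=lambda w: score(tokens, w, query_terms))
    return score(tokens, best, query_terms), " ".join(tokens[best[0]:best[1]])


# Hypothetical sample document and query.
doc = "modern engines use gpu summary generation to answer queries".split()
query = {"gpu", "summary", "generation"}

trunc = best_fragment(doc, truncation_windows(len(doc), 4), query)
slide = best_fragment(doc, sliding_windows(len(doc), 4), query)

print(trunc)  # (2, 'summary generation to answer')
print(slide)  # (3, 'use gpu summary generation')
```

With a window of four tokens, truncation places a boundary between "gpu" and "summary", so no fixed fragment contains all three query terms, while the sliding window finds a fragment that does. The cost is more windows to score (six instead of three here), which is one reason a parallel scoring step is attractive.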
Degree-granting institution: Huazhong University of Science and Technology
Degree level: Master's
Year degree conferred: 2012
Classification number: TP391.3
Article ID: 2324651
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2324651.html