基于多核并行的全文檢索動態(tài)后繼樹模型相關算法研究
本文選題:多核處理器 + 并行編程; 參考:《廣西大學》2013年碩士論文
【摘要】:隨著多核處理器的快速普及,廣大的程序開發(fā)者們面臨著極大的機遇與挑戰(zhàn)。在過去,由于缺乏并行開發(fā)的環(huán)境,大多數(shù)的開發(fā)者們通常只能在單核單線程的平臺上進行程序開發(fā),所以在大多數(shù)情況下是不會考慮程序的并發(fā)執(zhí)行問題。但是,隨著時間的推移,可以預見單核處理器在不久即將淡出人們的生活,多核處理器在近年來取得極大發(fā)展和普及,大多數(shù)開發(fā)者們都擁有了多核開發(fā)環(huán)境,在進行程序開發(fā)的時候,如果再不考慮程序的并行執(zhí)行相關問題,將不能發(fā)揮多核平臺擁有多線程并行執(zhí)行能力的特點,不能進一步的提高程序的運行效率。 除此之外,伴隨著互聯(lián)網(wǎng)普及速度的進一步加快,搜索引擎成為了互聯(lián)網(wǎng)上最受歡迎的應用之一,而搜索引擎就是全文檢索系統(tǒng)的一個常見應用。伴隨著多核平臺的普及以及硬件價格的下降,建立適用于行業(yè)特色的檢索系統(tǒng)將不再是一件遙不可及的事情。而過去對于全文檢索索引模型的研究主要是通過對于索引結構的修改來提高索引的性能,而本文則希望通過另一種思路,通過對全文檢索系統(tǒng)中動態(tài)后繼樹索引模型相關算法進行基于多核平臺的并行化改進,從而提高索引生成以及檢索的效率,本文主要從事了以下工作: 1、研究多核處理器的主要特點以及在進行多核編程的時候會遇到的問題;比較了多核平臺并行運算與傳統(tǒng)的多機分布式并行運算的不同,以此來作為進行算法并行化研究的指導。 2、分析研究全文檢索系統(tǒng)中不同的索引結構及特點,并與動態(tài)后繼樹索引進行比較,從生成速度、檢索效率以及更新的復雜度等角度出發(fā),選擇了動態(tài)后繼樹索引進行算法的相關研究。 3、結合多核編程的特點,研究了動態(tài)后繼樹索引創(chuàng)建算法的任務分配方法,在每個線程擁有各自私有內存空間的基礎上,通過任務分配管理分配不同的對象,讓每個并行線程都能以最合理的工作方式工作,最大程度發(fā)揮程序的并行性能。通過在原始索引結構創(chuàng)建算法的基礎上,加入OpenMP指導語句以實現(xiàn)程序的并行化 4、分析了動態(tài)后繼樹的檢索算法的特點,在多分詞查找算法中加入OpenMP指導語句,研究了更適合多核平臺的檢索算法。 5、通過對動態(tài)后繼樹原始創(chuàng)建算法與修改后的并行創(chuàng)建算法以及動態(tài)后繼樹多分詞查找原始算法與多分詞并行查找算法的實驗結果進行比較,證明經(jīng)過了并行化處理的算法,在多核平臺上的運行效率有顯著提升。
[Abstract]:With the rapid popularity of multi-core processors, the vast number of program developers are facing great opportunities and challenges. In the past, because of the lack of parallel development environment, most developers can only develop programs on the platform of single core and single thread, so in most cases, the concurrent execution of programs is not considered. However, with the passage of time, it can be predicted that single-core processors will soon fade out of people's lives, and multicore processors have made great progress and popularity in recent years, and most developers have a multi-core development environment. In the process of program development, if we do not consider the parallel execution of the program, we will not be able to play the multi-core platform has the ability of multi-thread parallel execution, and can not further improve the efficiency of the program. In addition, with the further acceleration of the speed of Internet popularization, search engine has become one of the most popular applications on the Internet, and search engine is a common application of full-text retrieval system. With the popularization of multi-core platform and the decline of hardware price, it is no longer a distant thing to build a retrieval system suitable for industry characteristics. In the past, the research on index model of full-text retrieval is mainly to improve the performance of index by modifying the index structure, but this paper hopes to use another way of thinking. By improving the parallel algorithm of dynamic successor tree index model in the full-text retrieval system based on multi-core platform, this paper improves the efficiency of index generation and retrieval. The main work of this paper is as follows: 1. The main characteristics of multi-core processor and the problems encountered in the process of multi-core programming are studied, and the differences between parallel operation of multi-core platform and traditional distributed parallel computing of multi-computer are compared. This is used as a guide for the research of parallelization of algorithms. 2. The different index structures and characteristics in the full-text retrieval system are analyzed and compared with the dynamic successor tree index. From the point of view of generation speed, retrieval efficiency and update complexity, etc. The dynamic successor tree index is selected to study the algorithm. 3. According to the characteristics of multi-core programming, the task allocation method of dynamic successor tree index creation algorithm is studied. On the basis of each thread having its own private memory space, different objects are allocated through task allocation management. Each parallel thread can work in the most reasonable way to maximize the parallel performance of the program. In order to realize the program parallelization by adding OpenMP instruction on the basis of the original index structure creation algorithm. 4. The characteristics of the search algorithm of dynamic successor tree are analyzed, and the OpenMP instruction sentence is added to the search algorithm of multi-participle, and the retrieval algorithm which is more suitable for multi-core platform is studied. 5. By comparing the experimental results of the original creation algorithm of dynamic successor tree with the modified parallel creation algorithm, the original algorithm of multi-word search of dynamic successor tree and the parallel search algorithm of multi-participle, it is proved that the algorithm has been parallelized. The running efficiency on multi-core platform is improved significantly.
【學位授予單位】:廣西大學
【學位級別】:碩士
【學位授予年份】:2013
【分類號】:TP391.3
【參考文獻】
相關期刊論文 前10條
1 Herb Sutter;羅小平;;免費午餐已經(jīng)結束——軟件歷史性地向并發(fā)靠攏[J];程序員;2006年11期
2 馬科,胡運發(fā);一個改進的互關聯(lián)后繼樹數(shù)據(jù)模型[J];計算機工程;2003年21期
3 申展,江寶林,唐磊,胡運發(fā);基于互關聯(lián)后繼樹的頻繁模式挖掘研究[J];計算機工程;2004年21期
4 王政華;胡運發(fā);;基于后繼區(qū)間的互關聯(lián)后繼樹搜索算法[J];計算機工程;2007年09期
5 龔磊;武友新;;Lucene全文檢索系統(tǒng)的研究與實現(xiàn)[J];計算機與數(shù)字工程;2010年05期
6 沈羽佳;韓松峰;刁海南;劉勇;;基于數(shù)據(jù)依賴圖的主域變量識別方法[J];計算機應用研究;2009年01期
7 霍林;黃俊文;潘英花;王力;;大規(guī)模分布式資源搜索技術研究進展[J];計算機應用研究;2010年11期
8 申展,江寶林,張謐,唐磊,胡運發(fā);互關聯(lián)后繼樹模型及其實現(xiàn)[J];計算機應用與軟件;2005年03期
9 郁衛(wèi)江,朱根江,謝立;一個過程間數(shù)據(jù)流分析的框架[J];軟件學報;1997年09期
10 陽雪林,于勐,陳道蓄,謝立;自動并行編譯新技術[J];軟件學報;2000年09期
相關博士學位論文 前1條
1 閆昭;程序并行識別方法及應用研究[D];吉林大學;2009年
,本文編號:1819177
本文鏈接:http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1819177.html