基于多核并行的全文檢索動(dòng)態(tài)后繼樹(shù)模型相關(guān)算法研究
本文選題:多核處理器 + 并行編程 ; 參考:《廣西大學(xué)》2013年碩士論文
【摘要】:隨著多核處理器的快速普及,廣大的程序開(kāi)發(fā)者們面臨著極大的機(jī)遇與挑戰(zhàn)。在過(guò)去,由于缺乏并行開(kāi)發(fā)的環(huán)境,大多數(shù)的開(kāi)發(fā)者們通常只能在單核單線程的平臺(tái)上進(jìn)行程序開(kāi)發(fā),所以在大多數(shù)情況下是不會(huì)考慮程序的并發(fā)執(zhí)行問(wèn)題。但是,隨著時(shí)間的推移,可以預(yù)見(jiàn)單核處理器在不久即將淡出人們的生活,多核處理器在近年來(lái)取得極大發(fā)展和普及,大多數(shù)開(kāi)發(fā)者們都擁有了多核開(kāi)發(fā)環(huán)境,在進(jìn)行程序開(kāi)發(fā)的時(shí)候,如果再不考慮程序的并行執(zhí)行相關(guān)問(wèn)題,將不能發(fā)揮多核平臺(tái)擁有多線程并行執(zhí)行能力的特點(diǎn),不能進(jìn)一步的提高程序的運(yùn)行效率。 除此之外,伴隨著互聯(lián)網(wǎng)普及速度的進(jìn)一步加快,搜索引擎成為了互聯(lián)網(wǎng)上最受歡迎的應(yīng)用之一,而搜索引擎就是全文檢索系統(tǒng)的一個(gè)常見(jiàn)應(yīng)用。伴隨著多核平臺(tái)的普及以及硬件價(jià)格的下降,建立適用于行業(yè)特色的檢索系統(tǒng)將不再是一件遙不可及的事情。而過(guò)去對(duì)于全文檢索索引模型的研究主要是通過(guò)對(duì)于索引結(jié)構(gòu)的修改來(lái)提高索引的性能,而本文則希望通過(guò)另一種思路,通過(guò)對(duì)全文檢索系統(tǒng)中動(dòng)態(tài)后繼樹(shù)索引模型相關(guān)算法進(jìn)行基于多核平臺(tái)的并行化改進(jìn),從而提高索引生成以及檢索的效率,本文主要從事了以下工作: 1、研究多核處理器的主要特點(diǎn)以及在進(jìn)行多核編程的時(shí)候會(huì)遇到的問(wèn)題;比較了多核平臺(tái)并行運(yùn)算與傳統(tǒng)的多機(jī)分布式并行運(yùn)算的不同,以此來(lái)作為進(jìn)行算法并行化研究的指導(dǎo)。 2、分析研究全文檢索系統(tǒng)中不同的索引結(jié)構(gòu)及特點(diǎn),并與動(dòng)態(tài)后繼樹(shù)索引進(jìn)行比較,從生成速度、檢索效率以及更新的復(fù)雜度等角度出發(fā),選擇了動(dòng)態(tài)后繼樹(shù)索引進(jìn)行算法的相關(guān)研究。 3、結(jié)合多核編程的特點(diǎn),研究了動(dòng)態(tài)后繼樹(shù)索引創(chuàng)建算法的任務(wù)分配方法,在每個(gè)線程擁有各自私有內(nèi)存空間的基礎(chǔ)上,通過(guò)任務(wù)分配管理分配不同的對(duì)象,讓每個(gè)并行線程都能以最合理的工作方式工作,最大程度發(fā)揮程序的并行性能。通過(guò)在原始索引結(jié)構(gòu)創(chuàng)建算法的基礎(chǔ)上,加入OpenMP指導(dǎo)語(yǔ)句以實(shí)現(xiàn)程序的并行化 4、分析了動(dòng)態(tài)后繼樹(shù)的檢索算法的特點(diǎn),在多分詞查找算法中加入OpenMP指導(dǎo)語(yǔ)句,研究了更適合多核平臺(tái)的檢索算法。 5、通過(guò)對(duì)動(dòng)態(tài)后繼樹(shù)原始創(chuàng)建算法與修改后的并行創(chuàng)建算法以及動(dòng)態(tài)后繼樹(shù)多分詞查找原始算法與多分詞并行查找算法的實(shí)驗(yàn)結(jié)果進(jìn)行比較,證明經(jīng)過(guò)了并行化處理的算法,在多核平臺(tái)上的運(yùn)行效率有顯著提升。
[Abstract]:With the rapid popularity of multi-core processors, the vast number of program developers are facing great opportunities and challenges. In the past, because of the lack of parallel development environment, most developers can only develop programs on the platform of single core and single thread, so in most cases, the concurrent execution of programs is not considered. However, with the passage of time, it can be predicted that single-core processors will soon fade out of people's lives, and multicore processors have made great progress and popularity in recent years, and most developers have a multi-core development environment. In the process of program development, if we do not consider the parallel execution of the program, we will not be able to play the multi-core platform has the ability of multi-thread parallel execution, and can not further improve the efficiency of the program. In addition, with the further acceleration of the speed of Internet popularization, search engine has become one of the most popular applications on the Internet, and search engine is a common application of full-text retrieval system. With the popularization of multi-core platform and the decline of hardware price, it is no longer a distant thing to build a retrieval system suitable for industry characteristics. In the past, the research on index model of full-text retrieval is mainly to improve the performance of index by modifying the index structure, but this paper hopes to use another way of thinking. By improving the parallel algorithm of dynamic successor tree index model in the full-text retrieval system based on multi-core platform, this paper improves the efficiency of index generation and retrieval. The main work of this paper is as follows: 1. The main characteristics of multi-core processor and the problems encountered in the process of multi-core programming are studied, and the differences between parallel operation of multi-core platform and traditional distributed parallel computing of multi-computer are compared. This is used as a guide for the research of parallelization of algorithms. 2. The different index structures and characteristics in the full-text retrieval system are analyzed and compared with the dynamic successor tree index. From the point of view of generation speed, retrieval efficiency and update complexity, etc. The dynamic successor tree index is selected to study the algorithm. 3. According to the characteristics of multi-core programming, the task allocation method of dynamic successor tree index creation algorithm is studied. On the basis of each thread having its own private memory space, different objects are allocated through task allocation management. Each parallel thread can work in the most reasonable way to maximize the parallel performance of the program. In order to realize the program parallelization by adding OpenMP instruction on the basis of the original index structure creation algorithm. 4. The characteristics of the search algorithm of dynamic successor tree are analyzed, and the OpenMP instruction sentence is added to the search algorithm of multi-participle, and the retrieval algorithm which is more suitable for multi-core platform is studied. 5. By comparing the experimental results of the original creation algorithm of dynamic successor tree with the modified parallel creation algorithm, the original algorithm of multi-word search of dynamic successor tree and the parallel search algorithm of multi-participle, it is proved that the algorithm has been parallelized. The running efficiency on multi-core platform is improved significantly.
【學(xué)位授予單位】:廣西大學(xué)
【學(xué)位級(jí)別】:碩士
【學(xué)位授予年份】:2013
【分類(lèi)號(hào)】:TP391.3
【參考文獻(xiàn)】
相關(guān)期刊論文 前10條
1 Herb Sutter;羅小平;;免費(fèi)午餐已經(jīng)結(jié)束——軟件歷史性地向并發(fā)靠攏[J];程序員;2006年11期
2 馬科,胡運(yùn)發(fā);一個(gè)改進(jìn)的互關(guān)聯(lián)后繼樹(shù)數(shù)據(jù)模型[J];計(jì)算機(jī)工程;2003年21期
3 申展,江寶林,唐磊,胡運(yùn)發(fā);基于互關(guān)聯(lián)后繼樹(shù)的頻繁模式挖掘研究[J];計(jì)算機(jī)工程;2004年21期
4 王政華;胡運(yùn)發(fā);;基于后繼區(qū)間的互關(guān)聯(lián)后繼樹(shù)搜索算法[J];計(jì)算機(jī)工程;2007年09期
5 龔磊;武友新;;Lucene全文檢索系統(tǒng)的研究與實(shí)現(xiàn)[J];計(jì)算機(jī)與數(shù)字工程;2010年05期
6 沈羽佳;韓松峰;刁海南;劉勇;;基于數(shù)據(jù)依賴(lài)圖的主域變量識(shí)別方法[J];計(jì)算機(jī)應(yīng)用研究;2009年01期
7 霍林;黃俊文;潘英花;王力;;大規(guī)模分布式資源搜索技術(shù)研究進(jìn)展[J];計(jì)算機(jī)應(yīng)用研究;2010年11期
8 申展,江寶林,張謐,唐磊,胡運(yùn)發(fā);互關(guān)聯(lián)后繼樹(shù)模型及其實(shí)現(xiàn)[J];計(jì)算機(jī)應(yīng)用與軟件;2005年03期
9 郁衛(wèi)江,朱根江,謝立;一個(gè)過(guò)程間數(shù)據(jù)流分析的框架[J];軟件學(xué)報(bào);1997年09期
10 陽(yáng)雪林,于勐,陳道蓄,謝立;自動(dòng)并行編譯新技術(shù)[J];軟件學(xué)報(bào);2000年09期
相關(guān)博士學(xué)位論文 前1條
1 閆昭;程序并行識(shí)別方法及應(yīng)用研究[D];吉林大學(xué);2009年
,本文編號(hào):1819177
本文鏈接:http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1819177.html