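Research on GPU-Based Parallel Learning-to-Rank Algorithms
Published: 2018-03-26 11:44
Topic: learning to rank. Entry point: ordered pairs. Source: Harbin Institute of Technology, master's thesis, 2012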
[Abstract]: Search engines help users find relevant information in the vast and disorderly mass of information on the Internet, so the ranking of retrieval results is critical. Learning to rank, an emerging information retrieval technique, offers a new solution to the Internet information retrieval problem. Traditional learning-to-rank algorithms were developed on small-scale text collections. As the total volume of Internet information grows rapidly, large-scale data becomes a bottleneck for these algorithms, and their performance becomes one direction for future learning-to-rank research. This thesis therefore proposes a new learning-to-rank algorithm, combines it with graphics processing unit (GPU) parallel computing, and verifies its effectiveness. The main contents of the thesis are as follows:
(1) The theory behind learning-to-rank algorithms and GPU parallel computing is summarized and explained, existing learning-to-rank algorithms are surveyed, and the evaluation metrics for learning to rank and the parallel programming model are described.
(2) Based on an in-depth analysis of the characteristics of information retrieval, and on the observation that more relevant information matters more, the thesis adopts the pairwise (ordered-pair) approach to learning to rank. The data input space is repartitioned so that document pairs satisfying the "greater than" partial-order relation form the input space.
(3) A learning-to-rank algorithm based on the Bayesian personalized ranking framework is proposed: the Linear Scoring Learning to Rank Model (LSLRM). The model is built by estimating the correct ordering of input document pairs and is used to solve the query ranking problem. Training the ranking model is converted into a binary classification problem, and the features are analyzed to identify the important ones that are decisive for distinguishing relevance.
(4) The algorithm exploits the GPU parallel programming model and memory model to address the performance bottleneck of learning-to-rank algorithms on large-scale data.
(5) Experiments demonstrate the advantages of the GPU-based parallel learning-to-rank algorithm. The proposed algorithm is compared with RankSVM-Struct and other algorithms using the benchmark results published by Microsoft Research Asia. The conclusion is that the proposed algorithm outperforms the other algorithms overall and achieves a speedup of 10 to 11 times over the CPU on large-scale datasets.
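The abstract gives no formulas for LSLRM, so the following is only a minimal sketch of the general idea in items (2) and (3), not the thesis's implementation. It assumes a linear score s(x) = w·x and the standard Bayesian personalized ranking criterion, which maximizes the sum of ln σ(s(x_i) − s(x_j)) over ordered pairs with an L2 penalty. All function names, hyperparameters, and the toy data are hypothetical, and the code runs on the CPU in plain Python rather than on a GPU.

```python
import math
import random


def make_pairs(labels, query_ids):
    """Build ordered pairs (i, j) within each query where doc i is strictly more relevant than doc j."""
    by_query = {}
    for idx, q in enumerate(query_ids):
        by_query.setdefault(q, []).append(idx)
    pairs = []
    for docs in by_query.values():
        for i in docs:
            for j in docs:
                if labels[i] > labels[j]:
                    pairs.append((i, j))
    return pairs


def score(w, x):
    """Linear scoring function s(x) = w . x (assumed form of the model)."""
    return sum(wk * xk for wk, xk in zip(w, x))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(features, pairs, dim, lr=0.1, reg=0.01, epochs=200, seed=0):
    """Stochastic gradient ascent on ln sigmoid(w . (x_i - x_j)) - (reg / 2) * ||w||^2 per pair."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(epochs):
        order = pairs[:]
        rng.shuffle(order)
        for i, j in order:
            diff = [a - b for a, b in zip(features[i], features[j])]
            # d/dw ln sigmoid(w . diff) = (1 - sigmoid(w . diff)) * diff
            g = 1.0 - sigmoid(score(w, diff))
            w = [wk + lr * (g * dk - reg * wk) for wk, dk in zip(w, diff)]
    return w


# Hypothetical toy data: five documents across two queries, two features each.
features = [[0.9, 0.2], [0.5, 0.8], [0.1, 0.5], [0.8, 0.7], [0.3, 0.1]]
labels = [2, 1, 0, 1, 0]        # graded relevance judgments
query_ids = [1, 1, 1, 2, 2]

pairs = make_pairs(labels, query_ids)
w = train(features, pairs, dim=2)
correct = sum(score(w, features[i]) > score(w, features[j]) for i, j in pairs)
print("learned weights:", w)
print("correctly ordered pairs:", correct, "of", len(pairs))
```

Each pair update depends only on the current weights and one feature difference, which is the kind of independent per-pair work a GPU implementation could distribute across threads. How the thesis actually maps the computation onto the GPU is not stated in the abstract.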
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.3