Research and Implementation of a Distributed Vector Computing Framework Based on MPI and MapReduce
Keywords: distributed computing framework; machine learning; vector; MPI; MapReduce    Source: Zhejiang University, master's thesis, 2013    Document type: degree thesis
【Abstract】: Machine learning is a multidisciplinary field that has emerged over the past twenty years, drawing on probability theory, statistics, approximation theory, convex analysis, and other subjects. Machine learning algorithms are now widely applied in data mining, natural language processing, search engines, and many other areas. Open-source single-machine implementations of most algorithms already exist, but with the rapid growth of the Internet, user data volumes have increased sharply and single-machine implementations can no longer meet industrial demand. To obtain high-performance implementations, developers turn to parallel computing frameworks such as MPI and Hadoop/MapReduce.

MPI is efficient, flexible to program, and scales well, which makes it well suited to high-performance computing, but it also has drawbacks: its large interface imposes a high learning cost; writing high-performance MPI programs requires handling data partitioning, network communication, and similar concerns by hand, and the lack of a MapReduce-like computing model adds to the programmer's burden; algorithm implementations tend to be one-off, which hinders code reuse, and there is no unified abstract distributed data structure; fault tolerance is poor.

To address these shortcomings, this thesis surveys MPI fault-tolerance schemes and the applications and improvements of MapReduce and, combined with the design of an abstract vector interface, proposes a distributed computing framework on MPI based on vectors and MapReduce. The framework abstracts the matrix operations found in machine learning algorithms into operations on distributed vectors, uses asynchronous send/receive to improve network transmission efficiency, and overlaps CPU computation with network communication as much as possible. On this basis, a checkpoint mechanism is introduced to improve the fault tolerance of multi-round iterative algorithms in the MPI environment.

To verify the efficiency and correctness of the framework, the PageRank algorithm was chosen for comparative experiments. The experiments show that the proposed framework is well suited to, and can effectively solve, the distributed implementation of machine learning algorithms that fit the MapReduce model.
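To make the framework idea concrete, the sketch below shows what a distributed-vector interface with map/reduce semantics could look like on top of MPI. This is a minimal, hypothetical illustration, not the thesis's actual API: the class name DistVector, the block partitioning, and the map/reduce_sum methods are assumptions introduced here.

```cpp
// Hypothetical sketch of a distributed vector on MPI: each rank owns one
// contiguous block of a global vector, "map" runs locally on that block,
// and "reduce" combines partial results across ranks with MPI_Allreduce.
#include <mpi.h>
#include <algorithm>
#include <cstdio>
#include <functional>
#include <vector>

class DistVector {
public:
    DistVector(long global_size, MPI_Comm comm) : comm_(comm) {
        int rank, nprocs;
        MPI_Comm_rank(comm_, &rank);
        MPI_Comm_size(comm_, &nprocs);
        long chunk = (global_size + nprocs - 1) / nprocs;   // block partitioning
        long begin = (long)rank * chunk;
        long end   = std::min(global_size, begin + chunk);
        local_.assign(end > begin ? (size_t)(end - begin) : 0, 0.0);
    }

    // Map: apply a user function to every locally owned element.
    void map(const std::function<double(double)>& f) {
        for (double& x : local_) x = f(x);
    }

    // Reduce: sum the local partition, then combine the sums across ranks.
    double reduce_sum() const {
        double local_sum = 0.0, global_sum = 0.0;
        for (double x : local_) local_sum += x;
        MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, comm_);
        return global_sum;
    }

private:
    std::vector<double> local_;
    MPI_Comm comm_;
};

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    DistVector v(1000000, MPI_COMM_WORLD);
    v.map([](double) { return 1.0; });   // set every element so there is something to aggregate
    double total = v.reduce_sum();        // MapReduce-style global aggregate
    int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) std::printf("global sum = %f\n", total);
    MPI_Finalize();
    return 0;
}
```

A PageRank-style iteration could then be expressed, for example, as a map over each rank's owned partition plus reduce calls for global quantities such as the dangling-node mass or the convergence norm.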
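The abstract also mentions overlapping CPU computation with asynchronous send/receive and checkpointing multi-round iterative algorithms. The sketch below illustrates both ideas under stated assumptions (a ring exchange between neighbouring ranks, a placeholder local update, and per-rank binary checkpoint files every ten iterations); it is not the thesis's implementation.

```cpp
// Illustrative only: overlap non-blocking MPI communication with local work,
// and periodically checkpoint each rank's partition so an iterative job
// (e.g. PageRank) can be restarted after a failure.
#include <mpi.h>
#include <fstream>
#include <string>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    std::vector<double> local(1000, 1.0 / 1000);   // this rank's partition
    std::vector<double> send(64, 0.0), recv(64, 0.0);
    int next = (rank + 1) % nprocs;                // send to the next rank in a ring
    int prev = (rank - 1 + nprocs) % nprocs;       // receive from the previous one

    for (int iter = 1; iter <= 30; ++iter) {
        MPI_Request reqs[2];
        // Post the boundary exchange first...
        MPI_Irecv(recv.data(), (int)recv.size(), MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(send.data(), (int)send.size(), MPI_DOUBLE, next, 0, MPI_COMM_WORLD, &reqs[1]);

        // ...then do communication-independent work while the messages are in flight.
        for (double& x : local) x *= 0.85;         // placeholder for the local update

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE); // data in `recv` is now safe to use

        // Simple checkpoint: every 10 iterations each rank dumps its partition,
        // so a restarted job can resume instead of recomputing from iteration 1.
        if (iter % 10 == 0) {
            std::ofstream out("ckpt_rank" + std::to_string(rank) + ".bin", std::ios::binary);
            out.write(reinterpret_cast<const char*>(local.data()),
                      (std::streamsize)(local.size() * sizeof(double)));
        }
    }
    MPI_Finalize();
    return 0;
}
```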
【Degree-granting institution】: Zhejiang University
【Degree level】: Master's
【Year degree conferred】: 2013
【CLC classification number】: TP181