

Research on the Parallelization of Decision Tree Algorithms on the Hadoop Platform

Published: 2018-04-26 07:03

Topics: cloud computing + Hadoop; Source: East China Normal University, master's thesis, 2012


[Abstract]: The concept of cloud computing was first put forward by Google's chief executive at a search engine conference in 2006. In the six years since, the concept has spread widely, and well-known companies such as Microsoft, Google, and IBM have each launched research on cloud computing. The emergence of more and more cloud computing platforms has also made scalable, inexpensive, and efficient computing models attainable.

Information in modern society is growing rapidly, and more than one third of digital information is expected to reside on cloud computing platforms or be processed with their help. This brings demand from all sectors of society for diversified data mining services, and makes efficient, trustworthy mining of massive data on cloud computing platforms a challenging problem.

This thesis first surveys cloud computing platforms, including those of Google and IBM as well as Hadoop, with a focus on the key technologies of the Hadoop platform: the MapReduce programming model and the Hadoop Distributed File System. It then studies decision tree classification in some depth and analyzes several commonly used decision tree classification algorithms. On this basis, for two typical decision tree classification algorithms, C4.5 and SPRINT, it proposes improvements and parallelization strategies on the Hadoop platform. Experimental results show that, on massive data, both improved algorithms achieve a high speedup on the Hadoop platform, and that to some extent they resolve the heavy computation and long tree-construction time that C4.5 and SPRINT incur when processing massive data.
[Degree-granting institution]: East China Normal University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP311.13

