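Distributed Storage and Algorithmic Analysis of Massive Futures Data Based on Hadoop
Published: 2018-01-20 04:55
Keywords: Hadoop, futures, massive data, storage, data mining, distributed. Source: Tianjin University, 2012 master's thesis. Document type: degree thesis.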
[Abstract]: Futures trading, an important instrument for investment and hedging, has grown rapidly in recent years. The data it generates grows daily, and integrating and exploiting the information in futures data is becoming increasingly important. Data mining and statistical tools can extract valuable information from this data. Traditional data mining approaches can do so, but as data volumes keep rising, several factors have come to constrain them. The first is storage: for data at the terabyte and petabyte scale, traditional commercial single-machine storage no longer suffices. The second is running time: on data of this size, traditional single-machine algorithms take intolerably long.
In this thesis we propose a solution for massive data in the futures industry that uses a cluster of commercial computers for distributed data storage and parallel data mining. The solution is built on Hadoop, developed by Doug Cutting, an open-source distributed computing framework implemented in Java whose foundations are HDFS and MapReduce. Distributed applications built on it are highly scalable, extensible, and fault-tolerant. The solution has two parts: overall design and concrete implementation. First, we propose an architecture suited to storing and mining massive data. It follows the well-known layered model from software architecture, which makes the application flexible and extensible. Second, we give a simple implementation of each layer, in four parts: a Web front end, a Web Service control layer, data mining plug-ins, and HBase storage. The development of the data mining plug-ins is described in more detail.
In the implementation, pages submit parameters through Web Service and Ajax, which saves network bandwidth and eliminates heterogeneity. On the back end, services are started through Spring's IoC container, which reduces the intrusiveness of the code and manages the dependencies between services well. For the data mining plug-ins, we implement the Parallel FP-Growth algorithm and use Maven for plug-in development, which makes the application more manageable and reusable. For data storage we use HBase, a column-oriented distributed database, which has significant advantages for storing massive data.
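The abstract names Parallel FP-Growth but gives no detail of the plug-in. As a rough illustration only, the sketch below simulates the usual partitioning idea of that algorithm in a single Python process: count item supports, split the frequent items into groups, send each transaction prefix to the groups that need it, and let each group mine its own shard independently. It is not the thesis implementation. The function name `pfp_sketch`, its parameters, and the round-robin grouping are assumptions made for the example, and plain brute-force counting stands in for the local FP-tree that a real reducer would build.

```python
from collections import Counter, defaultdict
from itertools import combinations


def pfp_sketch(transactions, min_support, num_groups=2, max_len=3):
    """Single-process simulation of Parallel FP-Growth style partitioning.

    Hypothetical helper for illustration; returns {itemset tuple: support}.
    """
    # Step 1 (counting job): support of every item, keeping frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    # F-list: frequent items by descending support, ties broken by name.
    f_list = sorted((i for i, c in counts.items() if c >= min_support),
                    key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(f_list)}
    # Step 2: assign each frequent item to a group (round-robin, an assumption).
    group = {item: r % num_groups for item, r in rank.items()}

    # Step 3 (map): emit at most one prefix per (transaction, group).
    shards = defaultdict(list)
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        seen = set()
        # Scan from the least frequent item so each group gets its longest prefix.
        for j in range(len(ordered) - 1, -1, -1):
            g = group[ordered[j]]
            if g not in seen:
                seen.add(g)
                shards[g].append(ordered[: j + 1])

    # Step 4 (reduce): a group counts only itemsets whose last (least
    # frequent) item it owns, so no itemset is counted by two groups.
    # Brute-force enumeration replaces the FP-tree here.
    patterns = {}
    for g, shard in shards.items():
        local = Counter()
        for prefix in shard:
            for j, last in enumerate(prefix):
                if group[last] != g:
                    continue
                for k in range(max_len):
                    for head in combinations(prefix[:j], k):
                        local[head + (last,)] += 1
        patterns.update({p: c for p, c in local.items() if c >= min_support})
    return patterns
```

Calling `pfp_sketch(list_of_item_lists, min_support=3)` returns itemsets of up to `max_len` items, written in F-list order, with their global supports. Each shard can be processed without looking at any other shard, which is what lets the mining step run as independent reduce tasks on a cluster.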
[Degree-granting institution]: Tianjin University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP333