Research on Rainfall Prediction Using a Storm-Based Online Sequential Extreme Learning Machine
Published: 2018-11-22 14:15
[Abstract]: Rainfall is a key parameter in disaster prevention and mitigation and largely reflects the trend of disaster occurrence. It strongly affects agricultural production, soil erosion, and engineering applications. Accurately predicting rainfall in a region helps agricultural and water-conservancy departments improve their ability to prevent and control droughts and floods and keep losses to a minimum. With floods occurring frequently in China in recent years, using meteorological data to forecast rainfall accurately and promptly has become increasingly important.

The arrival of the big data era has also brought new challenges to the weather forecasting industry. Meteorological data come mainly from ground observation, meteorological satellite remote sensing, weather radar, and numerical forecast products. These four categories account for more than 90% of the total data volume and are used directly in meteorological operations, weather forecasting, climate prediction, and meteorological services. Streaming data is a sequence of digitally encoded, continuous signals. In general, a data stream can be viewed as a data set that grows without bound over time, and streaming data is widely used in online public opinion analysis, stock market trend analysis, satellite positioning, real-time financial monitoring, Internet of Things monitoring, and real-time weather monitoring. Rainfall prediction based on large-scale meteorological streaming data still has considerable room for development.

Traditional rainfall prediction usually applies machine learning to offline meteorological data in batch mode: once all training samples have been learned in a single pass, learning stops. In practice, however, the full set of training samples is rarely available at once; samples typically arrive in time order. Although large clusters can partly relieve the shortage of computing power caused by large data volumes, they cannot quickly process and learn from newly arrived data or promptly update the knowledge already learned. To address real-time computation and massive-scale processing of meteorological data, this thesis proposes a rainfall prediction model based on an online sequential extreme learning machine (OS-ELM) running on the Storm platform. The main contents and contributions are as follows:

(1) To address the problem that offline batch prediction on meteorological data cannot reflect rainfall changes in time, a rainfall prediction model based on the online sequential extreme learning machine is proposed. The extreme learning machine algorithm is adapted to online sequential learning to suit the large scale and real-time nature of meteorological data. The model first initializes several online extreme learning machine models. As new batches of data keep arriving, the model continues learning the new samples on top of its existing training results. Stochastic gradient descent and error-weight adjustment are introduced to feed back the error of each new prediction and update the error-weight parameters in real time, improving prediction accuracy.

(2) To address the massive, high-dimensional nature of meteorological data, the preprocessing stage analyzes the data using correlation coefficients between decision attributes and uses these coefficients to select the predictor attributes, which reduces the complexity of the data and improves training efficiency. In addition, the Storm stream-processing framework is combined with the Kafka distributed message queue to train on large-scale meteorological data in parallel.

Experimental results show that the algorithm, running on the Storm platform, achieves excellent parallel performance and prediction accuracy.
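The two building blocks named in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it shows the textbook OS-ELM recursive least-squares update for a single model, plus one plausible reading of correlation-based attribute selection. The class and function names, the sigmoid activation, the small `ridge` term, and the `threshold` value are all assumptions made for the example. The thesis's ensemble of several models, its stochastic-gradient error-weight adjustment, and the Storm/Kafka deployment are not reproduced here.

```python
import numpy as np


def select_by_correlation(X, y, threshold=0.3):
    """Return indices of columns whose |Pearson r| with y reaches the threshold.

    The threshold is a hypothetical value chosen for illustration.
    """
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.flatnonzero(np.abs(r) >= threshold)


class OSELM:
    """Single-hidden-layer online sequential extreme learning machine."""

    def __init__(self, n_inputs, n_hidden, ridge=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Hidden-layer weights and biases are drawn once and never trained.
        self.W = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=n_hidden)
        # Assumed small ridge term so the initial matrix inverse is well conditioned.
        self.ridge = ridge
        self.P = None      # inverse of (H^T H + ridge * I) over all data seen so far
        self.beta = None   # output weights

    def _hidden(self, X):
        # Sigmoid hidden-layer output matrix H.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit_initial(self, X0, T0):
        """Initialization phase: solve for beta on the first batch."""
        H = self._hidden(X0)
        self.P = np.linalg.inv(H.T @ H + self.ridge * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ T0

    def partial_fit(self, X, T):
        """Sequential phase: fold a new chunk into P and beta without retraining."""
        H = self._hidden(X)
        K = np.eye(H.shape[0]) + H @ self.P @ H.T
        self.P = self.P - self.P @ H.T @ np.linalg.solve(K, H @ self.P)
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

A typical use would call `select_by_correlation` once on historical data, call `fit_initial` on a first batch, and then call `partial_fit` for each newly arrived chunk. With the ridge term included, the sequential updates reproduce the ridge-regression solution over all data seen so far, so no earlier samples need to be stored.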
[Degree-granting institution]: Xiangtan University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: P457.6; TP181
Article ID: 2349608
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2349608.html