Research and Application of Data Mining Technology for Massive Oilfield Data
Published: 2018-05-12 11:19
Topics: data mining + attribute reduction; Source: Northeast Petroleum University, master's thesis, 2017
[Abstract]: In recent years, data mining technology has been widely applied in many fields, and it offers strong advantages in knowledge discovery and the analysis of massive data. Oilfields have accumulated massive amounts of production data. These data contain implicit regularities that are hard to find because the capacity for manual analysis is limited, and data mining can make up for this shortcoming. This thesis applies data mining technology to the analysis and prediction of oilfield production. It first establishes a technical route for applying data mining to oilfield production prediction, then studies the algorithms involved in data preprocessing, data classification and data prediction. The main contributions are:
1. The attribute reduction algorithm for production data in rough set theory is optimized. The dependency and significance of each attribute are used to describe its weight, and this weight serves as the criterion for selecting the initial population of a particle swarm optimization (PSO) algorithm, which narrows the search range in the solution space. The migration and chemotaxis operations of the bacterial foraging algorithm are then introduced to perform local search, improving the ability to find the optimal reduct and yielding an optimal reduction of the production attributes.
2. A database management system and C#-based embedded SQL are used to query production data directly in the production database, which compensates for the inability of the C4.5 algorithm to classify massive data. In addition, the Fayyad boundary point theorem is used to address the time-consuming selection of the optimal threshold in C4.5, improving its execution efficiency. As the number of samples in the production database grows, the execution efficiency and classification accuracy of the algorithm are not affected, so it adapts better to growing data.
3. A combined forecasting method is used to predict oilfield production, a complex variable influenced by many factors. Multiple linear regression is first used to test the significance of the variables, and the highly significant variables are retained. An ARMA time-series analysis method is then used to predict the retained variables. Finally, a comprehensive production prediction model is built with a neural network to improve prediction accuracy.
4. Building on the improved data mining algorithms above, a client/server (C/S) decision support system for oilfield production analysis is developed under Windows 7 using Microsoft Visual Studio 2010, the Oracle 10g database and its management system, and C#-based embedded SQL statements. The system is tested with actual production data to verify that it meets the needs of oilfield production decision-making.
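The boundary-point idea behind the C4.5 speed-up described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's C# and embedded-SQL implementation, which is not shown in the source. The function names `boundary_thresholds` and `best_threshold`, the toy data, and the use of midpoints as cut points are all assumptions made for illustration. The sketch relies on the Fayyad and Irani result that an entropy-minimizing cut on a continuous attribute lies between two adjacent sorted values whose examples do not all share one class, so the remaining midpoints need not be evaluated.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def boundary_thresholds(values, labels):
    """Candidate cuts: midpoints between adjacent distinct values,
    skipped when both values carry one and the same class."""
    classes_at = {}
    for v, y in zip(values, labels):
        classes_at.setdefault(v, set()).add(y)
    distinct = sorted(classes_at)
    cuts = []
    for lo, hi in zip(distinct, distinct[1:]):
        if len(classes_at[lo] | classes_at[hi]) > 1:
            cuts.append((lo + hi) / 2)
    return cuts


def best_threshold(values, labels, candidates):
    """Return (threshold, information gain) for the best candidate cut."""
    n = len(labels)
    base = entropy(labels)
    best = (None, -1.0)
    for t in candidates:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        gain = (base
                - len(left) / n * entropy(left)
                - len(right) / n * entropy(right))
        if gain > best[1]:
            best = (t, gain)
    return best


if __name__ == "__main__":
    # Hypothetical toy attribute and class labels, not data from the thesis.
    values = [1, 2, 3, 4, 5, 6, 7, 8]
    labels = ["a", "a", "a", "b", "b", "b", "a", "a"]
    cuts = boundary_thresholds(values, labels)
    print(cuts)  # [3.5, 6.5]: two candidates instead of seven midpoints
    print(best_threshold(values, labels, cuts))
```

On this toy input only two of the seven midpoints are evaluated, and the selected threshold is the same one an exhaustive scan over all midpoints would return. How the thesis combines this pruning with in-database queries is not specified in the abstract.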
[Degree-granting institution]: Northeast Petroleum University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TE319; TP311.13
Article ID: 1878417
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1878417.html