
Research on Virtual Sample Generation Technology and Its Modeling Applications

Published: 2018-04-23 21:46

  Topic: small samples + virtual sample generation; Source: Beijing University of Chemical Technology, doctoral dissertation, 2017


[Abstract]: In the era of "big data", many fields have massive amounts of data but little knowledge, so knowledge has to be discovered through data mining, and data-driven modeling has become a research hotspot. However, an insufficient number of samples, unrepresentative samples, or unevenly distributed samples seriously limit the quality of data-driven models. Against the big-data background, one important problem that cannot be ignored is the "big data, small sample" problem. It arises mainly because data are costly to acquire, data are duplicated, or the events of interest occur with low probability, so the useful data available are limited. How to model effectively from small samples is an important research direction in computational intelligence, with significant theoretical and practical value. Academia currently addresses the small-sample problem in two main ways: methods based on grey theory and machine learning, and methods that generate virtual samples. Generating new valid data from small-sample data is an effective way to supplement data, and virtual sample generation is an important research direction for the small-sample problem. Based on an extensive review of the literature, this thesis addresses the small-sample problem for the labeled and unlabeled data that correspond to supervised and unsupervised machine learning algorithms. It studies the generation, optimization, and application of virtual samples derived from small samples in order to produce sufficient valid data sets, then studies neural network structures and algorithms to propose new data-driven intelligent modeling methods, and finally carries out an application study on risk analysis of engineering construction costs. The main contents are as follows:

(1) A new virtual sample generation method based on overall diffusion. Overall trend diffusion is an effective distribution-based virtual sample generation technique, but existing versions use the same data distribution in both the original sample region and the diffusion region, and they add a virtual input attribute that doubles the input space. Building on this, the thesis combines a non-uniform distribution in the known small-sample region with a uniform distribution in the extended region, and uses multi-distribution overall diffusion to estimate the acceptable range of each small-sample attribute. To avoid adding input attributes, the membership function value, which represents the possibility that a sample point occurs, is no longer computed as a virtual input attribute of the model. The result is a more effective mechanism for producing virtual samples, the multi-distribution overall trend diffusion technique (MD-MTD). Its effectiveness is verified on benchmark functions and industrial data sets.

(2) A new virtual sample generation method based on optimization. To address the optimization of virtual samples, the thesis builds on MD-MTD and proposes an information diffusion method based on a triangular membership function (TMIE), which leads to a new way of determining the bounds of the lower and upper extended regions. Virtual samples are generated with the improved MD-MTD, and particle swarm optimization (PSO) is applied to the generated virtual samples of the input attributes to obtain more suitable ones. The resulting method is PSO-MD-MTD. Its effectiveness is verified on benchmark functions and industrial data sets.

(3) A new virtual sample generation method based on interpolation. Distribution-based virtual sample generation relies on a model built from the small sample, so the thesis studies how to build a reasonable and effective neural network model from small samples and then generates virtual samples according to the linear and nonlinear structure of that model. To this end, it proposes a virtual sample generation method based on interpolation in the hidden layer of an extreme learning machine (IVSG). Mid-value interpolation of the hidden-layer outputs produces virtual hidden-layer samples; from these, the virtual outputs of the output layer are computed forward and the virtual data in the input space are inferred backward. The effectiveness of the method is verified on benchmark functions and industrial data sets, and IVSG, PSO-MD-MTD, and MD-MTD are compared to analyze where each method is applicable.

(4) A new modeling method for functional link neural networks based on partial least squares. Once the validity of the data samples has been addressed, using data-driven modeling to mine the knowledge hidden behind the data becomes an important task. To handle collinear data in functional link neural networks and to mine the knowledge behind limited data effectively, the thesis draws on the extreme learning machine model and replaces the error back-propagation algorithm of the original functional link neural network with a partial least squares learning algorithm for obtaining the model parameters. The result is a functional link neural network model based on partial least squares learning (PLSR-FLNN). Its effectiveness is verified on two industrial data sets, and a comparison with four other modeling methods shows its advantages.

(5) Risk analysis and assessment of engineering construction costs using samples expanded by the Monte Carlo method. Having addressed the data and modeling problems in supervised learning, the thesis turns to the data problem in unsupervised learning. It focuses on the uncertain small-sample problem that arises when Monte Carlo methods are used for risk analysis of engineering construction costs, and proposes a sample supplementation method based on Monte Carlo simulation. On this basis, the probability distribution and probability density function of each cost item are estimated from the data samples. Monte Carlo simulation is combined with market-factor drivers and Likert-scale analysis to analyze and evaluate the influencing factors comprehensively. The result is a practical method for risk analysis of engineering construction costs, whose effectiveness is verified on a real engineering case.
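The two-distribution idea in item (1) can be illustrated with a minimal sketch for a single attribute. It is not the thesis's MD-MTD implementation: the bound formula is the one commonly given in the literature for trend diffusion (often called mega-trend diffusion), and the triangular distribution inside the observed range, the `extend_prob` parameter, and the function names `mtd_bounds` and `sample_attribute` are all assumptions made for illustration.

```python
import math
import random
from statistics import variance


def mtd_bounds(values, eps=1e-20):
    """Estimate a diffused (lower, upper) range for one attribute.

    Assumed formula: the standard trend-diffusion bounds, where each side
    is widened according to the sample variance and to how many points
    fall on that side of the range centre.
    """
    lo, hi = min(values), max(values)
    center = (lo + hi) / 2.0
    n_low = sum(1 for v in values if v < center)
    n_up = sum(1 for v in values if v > center)
    if n_low == 0 or n_up == 0:
        # Degenerate case (for example, all values equal): no diffusion.
        return lo, hi
    var = variance(values)  # sample variance, needs at least two points
    skew_low = n_low / (n_low + n_up)
    skew_up = n_up / (n_low + n_up)
    lower = center - skew_low * math.sqrt(-2.0 * var / n_low * math.log(eps))
    upper = center + skew_up * math.sqrt(-2.0 * var / n_up * math.log(eps))
    # The diffused range must never be narrower than the observed range.
    return min(lower, lo), max(upper, hi)


def sample_attribute(values, n, extend_prob=0.2, rng=None):
    """Draw n virtual values for one attribute using two distributions.

    Inside the observed range a triangular (non-uniform) distribution is
    used; in the extended regions a uniform distribution is used.
    extend_prob is a hypothetical knob for how often the extended
    regions are sampled.
    """
    rng = rng or random.Random()
    lo, hi = min(values), max(values)
    lower, upper = mtd_bounds(values)
    width_low, width_up = lo - lower, upper - hi
    out = []
    for _ in range(n):
        if (width_low + width_up) > 0 and rng.random() < extend_prob:
            # Pick the lower or upper extension in proportion to its width.
            if rng.random() * (width_low + width_up) < width_low:
                out.append(rng.uniform(lower, lo))
            else:
                out.append(rng.uniform(hi, upper))
        else:
            out.append(rng.triangular(lo, hi, (lo + hi) / 2.0))
    return out
```

Calling `sample_attribute([2.0, 3.0, 5.0, 8.0], 100)` returns 100 virtual values for that attribute. The sketch only generates input values; consistent with the abstract, no membership value is appended as an extra input, and producing the matching outputs would require a model trained on the small sample, which is not shown here.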
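The Monte Carlo step in item (5) can be sketched in a few lines. This is only an illustration under stated assumptions: each cost item is modeled with a triangular distribution given as a hypothetical `(low, mode, high)` triple, whereas the thesis estimates the distributions from data, and the market-factor and Likert-scale parts are not modeled. The function names are invented for this example.

```python
import random


def simulate_total_cost(items, n_trials=10000, seed=0):
    """Simulate the total project cost n_trials times.

    items is a list of (low, mode, high) triples, one per cost item.
    The triangular distribution is an assumption made for illustration.
    Returns the simulated totals sorted in ascending order.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_trials):
        total = sum(rng.triangular(low, high, mode) for low, mode, high in items)
        totals.append(total)
    totals.sort()
    return totals


def percentile(sorted_values, q):
    """Return the value at quantile q (0 <= q <= 1) of a sorted list."""
    idx = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[idx]


def overrun_probability(sorted_values, budget):
    """Fraction of simulated totals that exceed the budget."""
    return sum(1 for t in sorted_values if t > budget) / len(sorted_values)
```

With hypothetical items such as `[(100.0, 120.0, 160.0), (50.0, 60.0, 90.0)]`, `percentile(totals, 0.9)` gives a cost level that 90% of the simulated outcomes stay below, and `overrun_probability(totals, budget)` estimates the risk of exceeding a given budget.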

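[Degree-granting institution]: Beijing University of Chemical Technology
[Degree level]: Doctorate
[Year degree conferred]: 2017
[Classification number]: TP18; TP311.13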



Article ID: 1793759


Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1793759.html

