Analysis and Handling of Redundancy Problems in Intelligent Modeling and Their Application
Published: 2018-08-22 13:58
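[Abstract]: With the rapid development of modern science and technology, the petrochemical industry keeps advancing, and petrochemical products keep improving in quality and diversifying in type. For novel production processes and increasingly complex process systems, the process mechanism is hard to obtain and a mechanistic model is hard to build, so data-based modeling methods have attracted growing attention. Data-based modeling does not require knowledge of the chemical process mechanism, but it depends heavily on the quality of the sample data and on the structure of the model, and redundancy often arises in both. A) The selected input variables may be irrelevant to the dependent variable and may be redundant with one another. In industrial processes, a sufficiently large set of variables is chosen from prior knowledge or partially known mechanism knowledge, and these variables usually interact in complex ways. Using all of them as model inputs directly increases the complexity of the model input structure, and the redundancy may propagate indirectly to the model output and seriously degrade its performance. B) The model structure may contain redundancy. Model performance is closely tied to the quality of the model structure, and the complexity of the structure determines the computational efficiency of the model. To address these two kinds of redundancy in intelligent modeling, this thesis first uses principal component analysis, a multivariate statistical method, combined with mutual information analysis, to study how to eliminate redundancy among input variables and how to identify input variables that are irrelevant to the dependent variable. Second, it uses partial mutual information and a proposed clustering method based on partial mutual information to eliminate redundancy among the hidden-layer outputs of a neural network and so optimize the network structure. The main results are as follows:
(1) For the complex redundancy that may exist among input variables, and in combination with radial basis function neural network modeling, a radial basis function neural network model based on principal component and mutual information analysis is proposed (Principal Component Analysis-Mutual Information-Radial Basis Function Neural Network, PCA-MI-RBFNN). Principal component analysis first transforms the redundant original input variables into new variables, the principal components. The principal components are mutually uncorrelated and are constructed in descending order of sample variance. The model is meant to describe the relationship between the input and output variables, so selecting principal components as model inputs by maximum variance ignores the information that relates inputs to outputs. Mutual information analysis is therefore used to assess the relevance of each principal component to the output variable and to select the best principal components as model inputs. Tests on benchmark modeling data and on soft-sensor modeling of the 4-carboxybenzaldehyde (4-CBA) content of crude terephthalic acid from the oxidation unit of a purified terephthalic acid production process show that, once input redundancy is eliminated, the PCA-MI-RBFNN model has good robustness and predictive performance.
(2) For the case in which the selected input variables may be irrelevant to the dependent variable and may also be redundant with one another in complex ways, and in combination with relevance vector machine modeling, a relevance vector machine model based on mutual information, principal component, and mutual information analysis is proposed (Mutual Information-Principal Component Analysis-Mutual Information-Relevance Vector Machine, MI-PCA-MI-RVM). Among the high-dimensional original input variables of a chemical process, some are entirely irrelevant to the dependent variable, and using them directly makes the model inaccurate; others are relevant to the dependent variable but redundant with one another, and using them directly degrades model performance indirectly. A coarse screening of the original sample data is therefore proposed. MI-PCA-MI-RVM first computes the mutual information between every input variable and the output variable, determines from the probability density distribution of these mutual information values a threshold that separates irrelevant from relevant variables, and removes the irrelevant input variables. For the remaining input variables, principal component and mutual information analysis then selects the principal components most relevant to the model output as model inputs. Tests on a soft-sensor model of the 4-CBA content in the p-xylene oxidation reaction show that, once irrelevant input variables are removed and input redundancy is eliminated, the MI-PCA-MI-RVM model has good robustness and predictive performance.
(3) For the structure optimization of radial basis function neural networks, a method is proposed that selects hidden-layer units by partial mutual information and updates the network weights and thresholds by least squares (Partial Mutual Information-Least Square Regression-Radial Basis Function Neural Network, PMI-LSR-RBFNN). PMI-LSR-RBFNN first uses partial mutual information, an improved form of mutual information analysis, to select suitable hidden-layer units; the selected units are minimally redundant with one another and maximally relevant to the output variable. Least squares then performs a direct linear regression between the hidden-layer outputs and the output-layer outputs, which updates the weights and thresholds and establishes the RBFNN model. In modeling the combustion side reactions of the INVISTA oxidation process, the PMI-LSR-RBFNN network has a simpler structure and better model performance than improved radial basis function networks based on K-means, fuzzy C-means, K-medoids, and subtractive clustering. Sammon nonlinear mapping analysis shows that the hidden-layer units selected by partial mutual information are not uniformly distributed in space, yet they give better model performance. A sensitivity analysis of the main operating variables based on the established model gives results consistent with known prior knowledge of the combustion side reaction process.
(4) For the structure optimization of multilayer feedforward neural networks, a method is proposed that selects hidden-layer units by minimal-redundancy-maximal-relevance partial mutual information clustering and updates the network weights and thresholds by least squares (Minimal Redundancy Maximal Relevance-Partial Mutual Information Clustering-Least Square Regression-Multi Layer Feed Forward Network, mPMIc-LSR-MLFN). As the variable dimension grows, partial mutual information consumes a large amount of computation time and tends to lose estimation accuracy, so a novel minimal-redundancy-maximal-relevance partial mutual information clustering method is proposed. Minimal-redundancy-maximal-relevance analysis selects suitable hidden-layer units as the initial cluster centers; all hidden-layer units are then clustered by computing partial mutual information, and the center of each cluster is updated iteratively until no center changes, which yields the best hidden-layer units. Finally, least squares linear regression updates the weights and thresholds between the hidden-layer outputs and the output-layer inputs. In a soft-sensor model of naphtha dry point, compared with MLFNs based on K-means, subtractive, and other clustering methods and with three improved extreme learning machines (OP-, OS-, B-ELM), the mPMIc-LSR-MLFN model has the simplest structure and the best predictive performance.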
[Abstract]: With the rapid development of modern science and technology, the petrochemical industry continues to advance, and its products continue to improve in quality and diversify in type. For novel production processes and increasingly complex process systems, the process mechanism is difficult to obtain and a mechanistic model is difficult to build, so data-based modeling has attracted growing attention. Data-based modeling does not require an understanding of the chemical process mechanism, but it relies heavily on the quality of the sample data and on the structure of the model. Redundancy often arises in both. A) The selected input variables may be irrelevant to the dependent variable and may be redundant with one another. If they are all used as model inputs, the complexity of the model input structure increases directly, and the redundancy may be passed on indirectly to the model output, seriously affecting its performance. B) The model structure itself may contain redundancy. The performance of the model is closely related to its structure, and the complexity of the structure determines the computational efficiency of the model. To address these two kinds of redundancy in intelligent modeling, this thesis first studies how to eliminate redundancy among input variables, and how to identify input variables irrelevant to the dependent variable, by combining principal component analysis, a multivariate statistical method, with mutual information analysis. Second, through the partial mutual information method and a proposed clustering method based on partial mutual information, redundancy among the hidden-layer outputs of neural networks is eliminated and the network structure is optimized. The main results are as follows:
(1) To address the complex redundancy that may exist among input variables, a Principal Component Analysis-Mutual Information-Radial Basis Function Neural Network (PCA-MI-RBFNN) model is proposed. Principal component analysis first transforms the redundant original input variables into mutually uncorrelated principal components, ordered by decreasing sample variance. Because the model is meant to describe the relationship between input and output variables, selecting principal components as model inputs purely by maximum variance ignores the information that relates inputs to outputs. Mutual information analysis is therefore used to assess the relevance of each principal component to the output variable, and the most relevant principal components are selected as model inputs. Tests on benchmark modeling data and on soft-sensor modeling of 4-carboxybenzaldehyde (4-CBA) content show that the PCA-MI-RBFNN model has good robustness and predictive performance once input redundancy is eliminated.
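The selection step in (1) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the histogram-based mutual information estimator `hist_mi`, the bin count, and the synthetic data are all assumptions introduced here. The sketch computes principal-component scores with an SVD and then ranks the components by their estimated mutual information with the output rather than by variance.

```python
import numpy as np

def pca_scores(X):
    """Principal-component scores, columns ordered by descending variance."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T

def hist_mi(a, b, bins=8):
    """Plug-in histogram estimate of mutual information in nats (assumed estimator)."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    pa = p.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    pb = p.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (pa @ pb)[nz])).sum())

def rank_components_by_mi(X, y, bins=8):
    """Return component indices sorted by decreasing MI with y, plus the MI values."""
    scores = pca_scores(X)
    mi = np.array([hist_mi(scores[:, j], y, bins) for j in range(scores.shape[1])])
    return np.argsort(mi)[::-1], mi

# Synthetic data (hypothetical): two nearly identical high-variance inputs that
# are unrelated to y, and one low-variance input that drives y.
rng = np.random.default_rng(0)
n = 2000
big = rng.normal(scale=5.0, size=n)
small = rng.normal(scale=1.0, size=n)
X = np.column_stack([big, big + 0.01 * rng.normal(size=n), small])
y = small + 0.1 * rng.normal(size=n)

order, mi = rank_components_by_mi(X, y)
print("components ranked by MI with y:", order)
```

In this toy setting the highest-variance component carries the redundant pair and says little about y, so a variance-only rule would pick the wrong input; ranking by mutual information moves the second component to the front.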
(2) Considering that the selected input variables may be irrelevant to the dependent variable and that there may be complex redundancy among the input variables, a Mutual Information-Principal Component Analysis-Mutual Information-Relevance Vector Machine (MI-PCA-MI-RVM) model is proposed. Among the high-dimensional original input variables of a chemical process, some have nothing to do with the dependent variable, and using them directly makes the model inaccurate; others are related to the dependent variable but redundant with one another, and using them directly degrades model performance indirectly. A coarse screening of the original sample data is therefore proposed. MI-PCA-MI-RVM first obtains the mutual information between every input variable and the output variable, then determines a threshold that separates irrelevant from relevant variables according to the probability density distribution of the mutual information values, and removes the irrelevant input variables. For the remaining input variables, the principal components most relevant to the model output are selected as model inputs by principal component and mutual information analysis. Tests on a soft-sensor model of 4-CBA content in xylene oxidation show that the MI-PCA-MI-RVM model has good robustness and predictive performance after the irrelevant input variables are removed and input redundancy is eliminated.
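The coarse screening step in (2) can be illustrated with a short sketch. The abstract does not give the exact rule that turns the distribution of mutual information values into a threshold, so the code below substitutes a simple stand-in: it places the threshold in the widest gap between sorted mutual information values. That heuristic, the histogram estimator `hist_mi`, and the synthetic data are assumptions made here for illustration only.

```python
import numpy as np

def hist_mi(a, b, bins=8):
    """Plug-in histogram estimate of mutual information in nats (assumed estimator)."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (pa @ pb)[nz])).sum())

def screen_inputs(X, y, bins=8):
    """Keep inputs whose MI with y lies above the widest gap in the sorted MI values.

    The widest-gap rule is a stand-in for the density-based threshold described
    in the text; it needs at least two input variables.
    """
    mi = np.array([hist_mi(X[:, j], y, bins) for j in range(X.shape[1])])
    s = np.sort(mi)
    k = int(np.argmax(np.diff(s)))          # widest gap lies between s[k] and s[k + 1]
    threshold = 0.5 * (s[k] + s[k + 1])
    keep = np.flatnonzero(mi > threshold)
    return keep, mi, threshold

# Synthetic data (hypothetical): inputs 0 and 1 drive y, inputs 2 and 3 do not.
rng = np.random.default_rng(1)
n = 3000
X = rng.normal(size=(n, 4))
y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=n)

keep, mi, threshold = screen_inputs(X, y)
print("kept input indices:", keep)
```

The retained columns `X[:, keep]` would then go on to the principal component and mutual information selection described for the remaining inputs.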
(3) To optimize the structure of radial basis function neural networks (RBFNN), a method is proposed that selects hidden-layer units by partial mutual information and updates the network weights and thresholds by least squares (Partial Mutual Information-Least Square Regression-Radial Basis Function Neural Network, PMI-LSR-RBFNN). Partial mutual information, an improved form of mutual information analysis, first selects suitable hidden-layer units that are minimally redundant with one another and maximally relevant to the output variable. The RBFNN model is then established by a direct least squares linear regression between the hidden-layer outputs and the output-layer outputs, which updates the weights and thresholds. In modeling the combustion side reactions of the INVISTA oxidation process, the PMI-LSR-RBFNN network has a simpler structure and better performance than improved radial basis function networks based on K-means, fuzzy C-means, K-medoids, and subtractive clustering. Sammon nonlinear mapping analysis shows that the selected hidden-layer units are not uniformly distributed in space, yet they give better model performance. A sensitivity analysis of the main operating variables based on the model gives results that agree with known prior knowledge of the combustion side reaction process.
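The least squares step in (3) can be sketched on its own. The example assumes the hidden-layer units have already been chosen; the evenly spaced centers, the shared Gaussian width, and the sine target below are hypothetical placeholders, and the partial mutual information selection itself is not shown. With the hidden layer fixed, the output weights and the output threshold (bias) follow from one linear regression.

```python
import numpy as np

def rbf_hidden(X, centers, width):
    """Gaussian RBF hidden-layer outputs, shape (n_samples, n_centers)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_output_layer(H, y):
    """Least squares fit of output weights and threshold (bias) on hidden outputs H."""
    A = np.column_stack([H, np.ones(len(H))])     # last column carries the bias
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

# Hypothetical data and hidden units, for illustration only.
rng = np.random.default_rng(2)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0])
centers = np.linspace(-3.0, 3.0, 7).reshape(-1, 1)   # stands in for selected units

H = rbf_hidden(X, centers, width=1.0)
w, b = fit_output_layer(H, y)
rmse = float(np.sqrt(np.mean((H @ w + b - y) ** 2)))
print("training RMSE:", rmse)
```

Because the regression is linear in the weights, removing redundant hidden units shrinks the design matrix directly, which is one reason a smaller hidden layer can be refit cheaply.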
(4) To optimize the structure of multilayer feedforward neural networks, a method is proposed that selects hidden-layer units by minimal-redundancy-maximal-relevance partial mutual information clustering and updates the network weights and thresholds by least squares (Minimal Redundancy Maximal Relevance-Partial Mutual Information Clustering-Least Square Regression-Multi Layer Feed Forward Network, mPMIc-LSR-MLFN). Partial mutual information (PMI) consumes a great deal of computation time as the dimension of the variables increases, and it tends to lose estimation accuracy. A novel minimal-redundancy-maximal-relevance partial mutual information clustering method is therefore proposed. Minimal-redundancy-maximal-relevance analysis first selects suitable hidden-layer units as the initial cluster centers; all hidden-layer units are then clustered by computing partial mutual information, and the center of each cluster is updated iteratively until no center changes, which yields the best hidden-layer units. Finally, the weights and thresholds between the hidden-layer outputs and the output-layer inputs are updated by least squares linear regression. In a soft-sensor model of naphtha dry point, compared with MLFNs based on K-means, subtractive, and other clustering methods and with three improved extreme learning machines (OP-, OS-, B-ELM), the mPMIc-LSR-MLFN model has the most concise structure and the best prediction performance.
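The first stage of (4), choosing initial cluster centers by minimal redundancy and maximal relevance, can be sketched as a greedy selection. This is an illustration under stated assumptions: it uses plain histogram-estimated mutual information (`hist_mi`) for both relevance and redundancy, scores candidates by relevance minus mean redundancy, and runs on synthetic stand-ins for hidden-layer outputs. The partial mutual information clustering and center updates that follow in the thesis are not shown.

```python
import numpy as np

def hist_mi(a, b, bins=8):
    """Plug-in histogram estimate of mutual information in nats (assumed estimator)."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (pa @ pb)[nz])).sum())

def mrmr_select(H, y, k, bins=8):
    """Greedy mRMR: pick k columns of H maximizing relevance minus mean redundancy."""
    n_units = H.shape[1]
    relevance = np.array([hist_mi(H[:, j], y, bins) for j in range(n_units)])
    selected = [int(np.argmax(relevance))]        # start from the most relevant unit
    while len(selected) < k:
        best_j, best_score = -1, -np.inf
        for j in range(n_units):
            if j in selected:
                continue
            redundancy = np.mean([hist_mi(H[:, j], H[:, s], bins) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Hypothetical hidden-layer outputs: units 0 and 1 are near copies, unit 2 is distinct.
rng = np.random.default_rng(3)
n = 3000
a = rng.normal(size=n)
b = rng.normal(size=n)
H = np.column_stack([a, a + 0.01 * rng.normal(size=n), b])
y = a + b + 0.1 * rng.normal(size=n)

selected = mrmr_select(H, y, k=2)
print("initial centers (hidden-unit indices):", selected)
```

With two near-duplicate units and one distinct unit, the redundancy penalty keeps the selection from spending both slots on the duplicated pair.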
[Degree-granting institution]: East China University of Science and Technology
[Degree level]: Doctoral
[Year degree conferred]: 2015
[Classification number]: TE65; TP183
Article ID: 2197309
Article link: http://sikaile.net/kejilunwen/shiyounenyuanlunwen/2197309.html