Research on Software Fault Prediction Methods Based on Transfer Learning and PU Learning
[Abstract]: With the continued development of artificial intelligence, machine learning has been applied to software fault prediction. Traditional machine-learning-based fault prediction needs a large number of labeled samples to build a model, but in practice labeled fault data are usually obtained through manual testing, which is time-consuming and costly. To reduce the dependence of supervised fault prediction methods on labeled samples, this thesis studies positive and unlabeled learning (PU learning) together with transfer learning and proposes a new approach for the PU scenario: fault samples in the target domain are predicted by transferring knowledge from cross-company, cross-project unlabeled fault data. The main work is as follows.
(1) An instance transfer algorithm based on random forests for the PU scenario (the POSTRF algorithm). Following a Bayesian idea of cross-class transfer, the samples to be predicted are treated as the target domain data set, and cross-company, cross-project software fault samples are treated as the source domain data set. The source domain data are sampled with replacement to train multiple PU random decision trees, and source sample weights are derived from the AUC values the trees achieve when tested on target domain data. Source samples whose distribution is similar to the target domain data are transferred to construct a PU data set, and a model built with the POSC4.5 algorithm predicts the fault samples in the target domain. In detail, M sample sets are first drawn at the bagSize ratio and M PU random decision trees are trained; 75% of the target domain samples are then randomly drawn as a test set on which the M trees are evaluated. The AUC (Area Under the ROC Curve) of each tree is taken as that tree's weight, the sample weights within each sample set are weighted by the tree weight, and the final weight of each sample is obtained by combining its weights across the sample sets. Samples whose transfer weight exceeds a threshold r are transferred. Under the completely-at-random assumption, a PU data set is constructed from the transferred samples and the target domain data set. The uncertain information gain of each attribute is computed from the number of positive samples, the number of unlabeled samples, and the prior probability of the positive class. The attribute with the largest uncertain information gain is chosen as the branch node, a tree model is grown top-down recursively, and the target domain fault samples are predicted.
(2) Experimental evaluation. Eight software fault data sets from the NASA database are used, with kc3 and cm1 each serving in turn as the target domain data set and the remaining data sets as source domain data sets. Comparing POSTRF with POSC4.5 on the kc3 and cm1 target sets shows that transferring samples from the auxiliary data sets improves the classification performance of the model: the AUC value increases by about 3% to 12%, and the fault detection rate (PD) increases by about 5%. Through knowledge transfer from cross-project and cross-company software fault data, the proposed POSTRF algorithm therefore achieves comparable or better prediction performance on target domain fault samples than the traditional PU learning algorithm.
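The instance-transfer step described in (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the one-feature scorer `train_stump` is a hypothetical stand-in for a PU random decision tree, unlabeled target samples are treated as negatives when computing AUC, and combining weights by averaging the AUC of every tree whose bag contained the sample is an assumption, since the abstract does not give the exact formula. All function and parameter names are illustrative.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # AUC is undefined here; fall back to chance level
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def train_stump(rows, labels, rng):
    """Hypothetical stand-in for a PU random decision tree: scores one random feature."""
    f = rng.randrange(len(rows[0]))
    all_mean = sum(r[f] for r in rows) / len(rows)
    pos_vals = [r[f] for r, y in zip(rows, labels) if y == 1]
    pos_mean = sum(pos_vals) / len(pos_vals) if pos_vals else all_mean
    sign = 1.0 if pos_mean >= all_mean else -1.0
    return lambda row: sign * (row[f] - all_mean)


def transfer_weights(src_x, src_s, tgt_x, tgt_s, m=25, bag_size=None, seed=0):
    """Weight each source sample by the mean AUC of the trees trained on bags containing it."""
    rng = random.Random(seed)
    n = len(src_x)
    bag_size = bag_size or n
    # 75% of the target domain is drawn at random as the test set.
    test_idx = rng.sample(range(len(tgt_x)), max(1, int(0.75 * len(tgt_x))))
    test_x = [tgt_x[i] for i in test_idx]
    test_s = [tgt_s[i] for i in test_idx]
    total, count = [0.0] * n, [0] * n
    for _ in range(m):
        bag = [rng.randrange(n) for _ in range(bag_size)]  # sampling with replacement
        model = train_stump([src_x[i] for i in bag], [src_s[i] for i in bag], rng)
        tree_weight = auc([model(r) for r in test_x], test_s)  # tree weight = AUC on target
        for i in set(bag):
            total[i] += tree_weight
            count[i] += 1
    return [t / c if c else 0.0 for t, c in zip(total, count)]


def select_transferred(weights, r):
    """Indices of source samples whose transfer weight exceeds the threshold r."""
    return [i for i, w in enumerate(weights) if w > r]
```

Under these assumptions, the indices returned by `select_transferred(transfer_weights(...), r)` identify the source samples that would be merged with the target domain data before a PU tree learner such as POSC4.5 is trained.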
[Degree-granting institution]: Northwest A&F University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP311.53