
Research on Software Fault Prediction Methods Based on Transfer Learning and PU Learning

Published: 2019-01-03 14:10
[Abstract]: With the continued development of artificial intelligence, machine learning has been applied to software fault prediction. Traditional machine-learning-based fault prediction needs a large number of labeled samples to build a model, but in practice labeled fault data are usually obtained through manual testing, which is time-consuming, laborious, and costly. To reduce the demand for labeled samples that traditional methods have in the supervised setting, this thesis studies two directions, positive and unlabeled learning (PU learning) and transfer learning. It proposes to predict target fault samples in the PU setting by transferring knowledge from cross-company, cross-project positive and unlabeled fault data. The specific work is as follows.

(1) An instance-transfer algorithm based on random forests in the PU setting (the POSTRF algorithm). Working in the PU setting and following the idea of Bayesian cross-class transfer, the algorithm treats the samples to be predicted as the target-domain data set and the cross-company, cross-project software fault samples as the source-domain data set. It samples the source-domain data set with replacement and trains multiple PU random decision trees, then computes sample weights from the AUC values measured on the target-domain data and from the contents of each sampled set. Source samples whose distribution is similar to the target-domain data are transferred and combined with the target-domain data to build a PU data set, and a model built with the POSC4.5 algorithm predicts software fault samples in the target domain. In detail, the algorithm first samples the source-domain data set with replacement at ratio bagSize to obtain M sampled sets and trains M PU random decision trees. It randomly draws 75% of the target-domain samples as a test set for the M trees and takes each tree's AUC (Area Under the ROC Curve) as that tree's weight. The samples in each sampled set are weighted by their tree's weight, the per-set sample weights are merged into final sample weights, and the higher-weight samples are transferred at transfer ratio r to complete the instance transfer. A PU data set is then built from the transferred samples and the target-domain data set under the completely-random assumption. The uncertain information gain of each attribute is computed from the number of positive samples, the number of unlabeled samples, and the prior probability of the positive class. The attribute with the largest uncertain information gain is chosen as the branch node, and the tree model is generated recursively from the top down to predict fault samples in the target domain.

(2) Experiments on the POSTRF algorithm. Eight software fault data sets from the NASA database are used as experimental data. The 0kc3 and cm1 data sets each serve in turn as the target-domain data set, with the remaining data sets as the source domain, and the proposed algorithm is compared with the POSC4.5 algorithm. The results show that on the 0kc3 and cm1 target sets, POSTRF improves classification performance by transferring instances from the other auxiliary sets: AUC rises by about 3%-12% and the fault prediction rate PD rises by about 5%. By transferring knowledge from cross-project, cross-company software fault data, the proposed POSTRF algorithm therefore predicts target-domain fault samples as well as or better than traditional PU learning algorithms.
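The instance-transfer step in (1) can be illustrated with a small sketch. It is not the thesis's implementation: the function names, the rank-based AUC helper, and the choice to merge per-set weights by summation (counting each sample once per sampled set) are assumptions made for illustration. Training the PU random decision trees is left out, so the per-tree AUC values are passed in as plain numbers.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # undefined without both classes; fall back to chance level
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_indices(n_source, bag_size, rng):
    """Draw round(bag_size * n_source) source indices with replacement."""
    k = max(1, round(bag_size * n_source))
    return [rng.randrange(n_source) for _ in range(k)]


def transfer_instances(bags, tree_aucs, n_source, r):
    """Weight source samples by the AUC of the trees whose sampled set holds them.

    bags      -- one list of source indices per tree (the M sampled sets)
    tree_aucs -- AUC of each tree on the target-domain test split (tree weight)
    r         -- transfer ratio: fraction of source samples to transfer
    Assumption: per-set weights are merged by summation.
    """
    weights = [0.0] * n_source
    for bag, tree_weight in zip(bags, tree_aucs):
        for i in set(bag):  # count each sample once per sampled set
            weights[i] += tree_weight
    k = max(1, int(r * n_source))
    ranked = sorted(range(n_source), key=lambda i: weights[i], reverse=True)
    return ranked[:k], weights


if __name__ == "__main__":
    rng = random.Random(0)
    n_source, M = 10, 5
    bags = [bootstrap_indices(n_source, 0.6, rng) for _ in range(M)]
    # Placeholder tree weights; in the described method these come from
    # testing each PU tree on 75% of the target-domain samples.
    tree_aucs = [0.55, 0.70, 0.62, 0.48, 0.81]
    chosen, weights = transfer_instances(bags, tree_aucs, n_source, r=0.3)
    print("transferred source indices:", chosen)
```

The indices returned by `transfer_instances` identify the source samples that would be combined with the target-domain data before a PU tree learner such as POSC4.5 is trained on the result.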
[Degree-granting institution]: Northwest A&F University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.53

Article ID: 2399484


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2399484.html

