Summarization of Bug Reports and Source Code Based on Supervised Learning

Published: 2019-06-06 09:28

[Abstract]: When performing software tasks, developers interact with software artifacts such as bug reports and source code repositories, and may need to read an entire artifact to find the information they need. Extracting valuable information from bug reports and source code is tedious and time-consuming. To address this, researchers have proposed generating summaries of software artifacts automatically. In this thesis, we propose using supervised learning techniques to generate such summaries, so that developers can efficiently extract the information they need from bug reports and source code repositories. We use duplicate bug reports to summarize bug reports, as an instance of natural-language text summarization. In a separate study, we summarize source code fragments, as an instance of source-to-source summarization.

For bug reports, we develop a PageRank-based summarization algorithm (PageRank-based Summarization Technique, PRST). The algorithm uses three similarity measures, based on VSM, Jaccard, and WordNet respectively, to compute the similarity between a master bug report and its corresponding duplicate bug reports. Publicly available bug report corpora lack the mapping between master bug reports and their duplicates, so the information contained in duplicate reports cannot be used for summarization. We therefore extracted 59 bug reports from the Mozilla, KDE, Gnome, and Eclipse projects and built an independent bug report corpus called OSCAR. We also rebuilt the existing BRC corpus by adding duplicate bug reports and used it as a comparison corpus. We evaluate the proposed algorithm extrinsically with several statistical metrics: Precision, Recall, F-Score, and Pyramid Precision. The results show that the algorithm produces relatively accurate bug report summaries and improves on the accuracy of existing supervised bug report summarization.

For source code, we develop a code fragment summarization algorithm based on SVM and NB classifiers (CodeFragment Summarization, CFS) that automatically generates source-to-source summaries of code fragments. We are the first to introduce a data-driven, small-scale crowdsourcing approach into software artifact summarization, and we use it to extract syntactic features of source code. We retrieved 127 code fragments from the official Eclipse and NetBeans FAQs and built a code fragment corpus for testing. We evaluate the approach with the same metrics and compare it against existing methods. The results show that our code fragment summarizer is more precise than existing code fragment summarization methods, and that syntactic features have an important effect on the accuracy of the generated summaries. The generated summaries can help developers complete the software tasks at hand and improve the performance and quality of software.
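The abstract does not give the details of PRST, so the following is only a minimal Python sketch of the general idea it names: build a similarity graph over sentences, rank the nodes with PageRank, and score a selected summary with precision, recall, and F-score. Everything here is an assumption made for illustration: only the Jaccard measure is used (the VSM and WordNet measures are omitted), the graph is built between individual sentences rather than between a master report and its duplicates, and the function names `rank_sentences` and `precision_recall_f` are hypothetical.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration.

    weights[i][j] is the non-negative edge weight from node i to node j.
    Nodes with no outgoing weight simply pass on no score (a simplification).
    """
    n = len(weights)
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            incoming = sum(
                scores[i] * weights[i][j] / out_sum[i]
                for i in range(n)
                if out_sum[i] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def rank_sentences(sentences):
    """Return sentence indices ordered from most to least central."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Symmetric similarity graph with no self-loops.
    weights = [
        [jaccard(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def precision_recall_f(selected, gold):
    """Precision, recall, and F-score of selected sentence indices vs. a gold set."""
    selected, gold = set(selected), set(gold)
    hits = len(selected & gold)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    report = [
        "Firefox crashes when opening a PDF file",
        "The browser crashes on opening any PDF file",
        "Crash occurs when a PDF file is opened in Firefox",
        "I like the new icon theme",
    ]
    order = rank_sentences(report)
    summary = order[:2]  # keep the two highest-ranked sentences
    print(order, precision_recall_f(summary, gold=[0, 1]))
```

In this toy input the off-topic last sentence shares almost no vocabulary with the others, so it receives the lowest rank. A fuller version would combine several similarity measures and would draw its gold summaries from an annotated corpus.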
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Doctoral
[Year degree conferred]: 2016
[Classification number]: TP311.5;TP391.1
