Bug Report and Source Code Summarization Based on Supervised Learning
[Abstract]: When performing software tasks, developers need to interact with software artifacts such as bug reports and source code repositories. To obtain the information they need, they may have to read an entire artifact thoroughly, yet extracting valuable information from bug reports and source code is tedious and time-consuming. To address this efficiently, researchers have proposed automatically generating summaries of software artifacts. In this thesis, to help developers efficiently extract the information they need from bug reports and source code repositories, we propose using supervised learning to generate summaries. As an instance of a natural-language text summarization task, we use duplicate bug reports to summarize bug reports. In a second study, as an instance of a code-to-code summarization task, we summarize source code fragments. For bug reports, we develop a PageRank-based bug report summarization algorithm called PRST. It uses three different similarity measures, based on VSM, Jaccard, and WordNet respectively, to compute the similarity between a master bug report and its duplicate bug reports. Because publicly available bug report corpora do not record which duplicate reports belong to which master report, the information in duplicate bug reports cannot be used for summarization with those corpora. We therefore extracted 59 bug reports from the Mozilla, KDE, Gnome, and Eclipse projects and built a separate bug report corpus called OSCAR. We also reconstructed the existing BRC corpus by adding duplicate bug reports, and use it as a comparison corpus.
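The idea of ranking master-report sentences with PageRank, guided by duplicate reports, can be illustrated with a minimal sketch. This is not the thesis's PRST implementation: it assumes each sentence of the master report is a graph node, uses only Jaccard similarity (not VSM or WordNet) as edge weights, and biases the teleport distribution toward sentences that overlap with the duplicate reports. The function names, the damping factor of 0.85, and the fixed iteration count are hypothetical choices for illustration.

```python
import re


def tokenize(text):
    """Lower-case word tokens as a set (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(master_sentences, duplicate_texts, k=2, damping=0.85, iters=50):
    """Rank master-report sentences with a personalized, weighted PageRank.

    Hypothetical simplification of a PRST-style approach: edges are Jaccard
    similarities between sentences, and the teleport prior favours sentences
    that overlap with the duplicate bug reports.
    """
    n = len(master_sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in master_sentences]
    # Edge weight w[i][j]: similarity between sentences i and j (no self-loops).
    w = [[jaccard(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_sum = [sum(row) for row in w]

    # Teleport prior: total overlap of each sentence with the duplicate reports.
    dup_tokens = [tokenize(t) for t in duplicate_texts]
    bias = [sum(jaccard(t, d) for d in dup_tokens) for t in tokens]
    total = sum(bias)
    prior = [b / total for b in bias] if total > 0 else [1.0 / n] * n

    scores = [1.0 / n] * n
    for _ in range(iters):
        # Score held by sentences with no similar neighbours is redistributed via the prior.
        dangling = sum(scores[i] for i in range(n) if out_sum[i] == 0)
        scores = [
            (1 - damping) * prior[j]
            + damping * (sum(scores[i] * w[i][j] / out_sum[i]
                             for i in range(n) if out_sum[i] > 0)
                         + dangling * prior[j])
            for j in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Return the chosen sentences in their original order.
    return [master_sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, duplicates, k=2)` returns the two highest-ranked master-report sentences in report order. Sentences that share no vocabulary with any duplicate report get no teleport weight, so off-topic remarks tend to rank low.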
We evaluate the effectiveness of the proposed algorithm with several statistical evaluation metrics: precision, recall, F-score, and pyramid precision. The results show that the proposed algorithm produces relatively accurate bug report summaries and improves on the accuracy of existing supervised bug report summarization. Similarly, to generate source code summaries, we develop Code Fragment Summarization (CFS), an algorithm based on support vector machine (SVM) and naive Bayes (NB) classifiers that automatically generates code-to-code summaries of source code fragments. We first introduce a data-driven, small-scale crowdsourcing method into the software artifact summarization paradigm to help extract syntactic features of the source code. We collected 127 code fragments from the official Eclipse and NetBeans FAQs to build a code fragment corpus for testing. Using the same evaluation metrics, we compare our method with existing methods to verify its effectiveness. The results show that our code fragment extractor is more accurate than existing code fragment summarization methods, and that syntactic features have an important impact on the accuracy of the generated summaries. The generated summaries can effectively help developers carry out the software tasks at hand and improve the performance and quality of the software.
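For extractive summaries, the four evaluation metrics can be stated concretely by treating a summary as a set of selected sentence IDs. The sketch below assumes a gold-standard set of sentence IDs and, for pyramid precision, a hypothetical mapping from each sentence ID to the number of annotators who selected it. It follows a common formulation in the bug report summarization literature, where a summary of n sentences is scored against the best score any n-sentence summary could achieve; the thesis's exact definitions may differ.

```python
def precision_recall_f(selected, gold):
    """Precision, recall and F-score (F1) of selected sentence IDs against a gold set."""
    selected, gold = set(selected), set(gold)
    hits = len(selected & gold)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def pyramid_precision(selected, votes):
    """Annotator votes collected by the summary, divided by the maximum votes
    that any summary with the same number of sentences could collect.

    votes maps sentence ID -> number of annotators who chose that sentence.
    """
    selected = set(selected)
    if not selected:
        return 0.0
    achieved = sum(votes.get(s, 0) for s in selected)
    best = sum(sorted(votes.values(), reverse=True)[:len(selected)])
    return achieved / best if best else 0.0
```

For example, selecting sentences `[1, 2, 5]` against the gold set `[1, 2, 3, 4]` gives precision 2/3, recall 1/2, and F-score 4/7. With votes `{1: 3, 2: 2, 3: 2, 4: 1, 5: 0}`, the same selection collects 5 votes out of a best possible 7, so its pyramid precision is 5/7.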
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Doctoral (PhD)
[Year degree awarded]: 2016
[Classification number (CLC)]: TP311.5; TP391.1