Research on Software Development Behavior on Open Source Platforms Based on Supervised Learning
Published: 2019-04-10 06:35
[Abstract]: Since the late twentieth century, fast-growing open source software has increasingly challenged a software industry dominated by traditional proprietary software, and the growing number of open source projects has had a major impact on the industry's market structure. Distributed development models have evolved along with the changing needs of open source development, and the emergence of the pull-based distributed development model points to a new direction for distributed software development. Studying the characteristics of development behavior in open source projects is an active topic in software evolution research; it helps developers understand the regularities of software evolution and thereby improve existing development processes. As more developers take part in open source development, code hosting platforms such as GitHub and BitBucket have begun to support distributed software development. Analyzing development behavior on GitHub requires processing massive amounts of loosely related data, and extracting deeper value from that data often calls for intelligent, complex analysis such as machine learning. This thesis analyzes open source projects hosted on GitHub that use the pull-based development model and uncovers regularities in development workflow turnaround, acceptance of external contributions, and the time taken to process external contributions. It analyzes developers' actions and, based on how strongly different behavioral features influence whether a contribution is ultimately accepted, builds a prediction model that predicts whether an external contribution will be adopted. During feature extraction, features based on historical records are added, which effectively supplements the feature set needed to build the model. The prediction model addresses the problem of classifying the final state of a pull request, and it uses a supervised learning algorithm suited to large-scale data, the support vector machine (SVM), to perform the classification. The thesis compares the performance of the selected prediction models, studies how to choose a suitable one, and improves on existing SVM practice, in which kernel parameter optimization is computationally expensive and learning performance and recognition rate are not high enough. Finally, it discusses how well the prediction model fits the data. The main contributions are as follows:
1. The thesis studies the acceptance policy for pull requests in open source systems. Using classifiers from common machine learning algorithms, it selects and classifies features from large-scale GitHub data. Taking testing and history-based behavioral features into account, it adds test coverage, the submitter's historical rate of successful requests, and the project's historical rate of accepted requests to the feature set, effectively extending it.
2. To improve the efficiency of grid search, the thesis modifies the exhaustive mode of the grid search algorithm and applies the result to building the prediction model, proposing a grid detection parameter selection algorithm (GDPS) that combines pattern search with grid search. GDPS selects the optimal parameter pair for the SVM kernel function used in the prediction model, improving the SVM's learning performance and recognition rate and yielding a prediction model with higher accuracy.
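The abstract does not give the details of GDPS, so the following is only a generic sketch of the idea it names: a coarse grid search locates a promising region, then a pattern (compass) search refines the parameter pair instead of exhaustively evaluating a fine grid. The function names, the search ranges, the step sizes, and the toy scoring surface are all assumptions made for illustration; in practice `score` would be something like the cross-validated accuracy of an SVM trained with the given (log2 C, log2 gamma).

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]  # (log2 C, log2 gamma); hypothetical parameterization


def coarse_grid(score: Callable[[float, float], float],
                c_range=(-5, 15), g_range=(-15, 3), step=4) -> Point:
    """Evaluate a sparse grid and return its best (log2 C, log2 gamma)."""
    best, best_score = None, -math.inf
    for lc in range(c_range[0], c_range[1] + 1, step):
        for lg in range(g_range[0], g_range[1] + 1, step):
            s = score(lc, lg)
            if s > best_score:
                best, best_score = (float(lc), float(lg)), s
    return best


def pattern_search(score: Callable[[float, float], float], start: Point,
                   step=2.0, min_step=0.125) -> Point:
    """Compass search: probe +/- step on each axis; halve the step on failure."""
    x, fx = start, score(*start)
    while step >= min_step:
        improved = False
        for dc, dg in ((step, 0), (-step, 0), (0, step), (0, -step)):
            cand = (x[0] + dc, x[1] + dg)
            fc = score(*cand)
            if fc > fx:  # move to the first improving neighbor
                x, fx, improved = cand, fc, True
                break
        if not improved:
            step /= 2  # no neighbor is better: shrink the pattern
    return x


def toy_cv_accuracy(log2_c: float, log2_gamma: float) -> float:
    # Stand-in for cross-validated SVM accuracy (assumption): a smooth
    # surface that peaks at log2 C = 3, log2 gamma = -5.
    return 0.9 - 0.01 * ((log2_c - 3) ** 2 + (log2_gamma + 5) ** 2)


start = coarse_grid(toy_cv_accuracy)
best = pattern_search(toy_cv_accuracy, start)
print("C =", 2 ** best[0], "gamma =", 2 ** best[1])
```

The coarse grid keeps the number of evaluations small, and the pattern search only evaluates points near the current best, which is where the saving over an exhaustive fine grid would come from. On a real accuracy surface, which is noisy and may have several local optima, the result depends on the starting grid and step sizes.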
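[Degree-granting institution]: Harbin Engineering University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP311.52
Article ID: 2455558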
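The history-based features named in the abstract (the submitter's historical rate of successful requests and the project's historical rate of accepted requests) can be sketched as below. This is a minimal illustration, not the thesis's feature extractor: the record layout, the function name `add_history_features`, the feature names, and the choice of 0.0 for a submitter or project with no history are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (submitter, project, merged), ordered oldest first.
PullRequest = Tuple[str, str, bool]


def add_history_features(prs: List[PullRequest]) -> List[Dict[str, float]]:
    """For each pull request, compute acceptance rates from earlier ones only."""
    sub_stats = defaultdict(lambda: [0, 0])   # submitter -> [merged, total]
    proj_stats = defaultdict(lambda: [0, 0])  # project   -> [merged, total]
    rows = []
    for submitter, project, merged in prs:
        s_merged, s_total = sub_stats[submitter]
        p_merged, p_total = proj_stats[project]
        rows.append({
            # 0.0 when there is no history yet (an assumption of this sketch)
            "submitter_success_rate": s_merged / s_total if s_total else 0.0,
            "project_accept_rate": p_merged / p_total if p_total else 0.0,
        })
        # Update the counters after building the row, so that no pull
        # request sees its own outcome in its features.
        for stats in (sub_stats[submitter], proj_stats[project]):
            stats[0] += int(merged)
            stats[1] += 1
    return rows


history = [
    ("alice", "repo-a", True),
    ("alice", "repo-a", False),
    ("bob", "repo-a", True),
    ("alice", "repo-b", True),
]
features = add_history_features(history)
for pr, row in zip(history, features):
    print(pr, row)
```

Computing each rate from strictly earlier requests avoids leaking the label being predicted into the features, which matters when the rows are later fed to a classifier.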
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2455558.html