Research on Source-Code-Oriented Feature Location Techniques
Topics: software clustering + search; Source: Harbin Engineering University, master's thesis, 2016
[Abstract]: With the growth of open source projects in recent years, individuals can obtain more and more project code over the Internet. Without requirements and design documents, it is very difficult to understand, in a short time and by reading code alone, how a project's functional modules are composed or which code implements a given feature. Current source code search engines can only return code fragments that match the query terms; they cannot treat the query as a feature description and report the structures involved from a macro perspective. This thesis therefore studies how to locate the relevant class structures and related information in a source project from a feature description. To address this problem, it proposes a feature search technique that combines topic analysis with software clustering to obtain more accurate class structure information. The work has three parts. First, a software feature extraction method and a feature vector construction scheme for Java are proposed; the resulting feature matrix is the input to a hierarchical clustering algorithm, which outputs the corresponding cluster descriptions. Second, a topic analysis method targeting class structures is proposed. It does not take the file as its unit and analyzes only class structure information, so a search returns a set of feature-related class structures rather than code fragments. Finally, software clustering and topic analysis are combined to implement feature search: the clustering results guide the topic analysis, and the output of the topic analysis is filtered through the clustering, so that the user is shown more precise information. Following this approach, the thesis designs and implements a prototype system. Experiments show that the prototype system effectively improves software clustering results. Compared with GitHub string matching, its search results are more precise, irrelevant results are excluded automatically, and good functional module results are obtained. In terms of application, the method gives useful guidance for reading source code and maintaining systems in practice. Developers can use it to guide early exploration, understand a project faster, and improve development efficiency. With further development, it could also support the reuse of functional modules.
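The pipeline outlined in the abstract (per-class features, hierarchical clustering, then a search whose hits are filtered by cluster) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the toy class names and tokens, the Jaccard distance, the average-linkage rule, the 0.8 merge threshold, and the use of a simple TF-IDF score as a stand-in for the thesis's topic analysis.

```python
import math
from collections import Counter

# Hypothetical toy corpus: class name -> identifier tokens extracted from it.
CLASSES = {
    "UserLoginService": ["user", "login", "password", "session"],
    "SessionManager": ["session", "user", "token", "login"],
    "PasswordHasher": ["password", "hash", "salt", "user"],
    "ImageResizer": ["image", "resize", "width", "height"],
    "ImageLoader": ["image", "load", "file", "width"],
}

def jaccard_distance(a, b):
    """Distance between two token lists, treated as sets (an assumed metric)."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)

def cluster(classes, threshold=0.8):
    """Average-linkage agglomerative clustering. Merging stops once the
    closest pair of clusters is farther apart than `threshold`."""
    clusters = [[name] for name in classes]

    def dist(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        total = sum(jaccard_distance(classes[x], classes[y]) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        d, i, j = min(
            (dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def search(query, classes, clusters):
    """Rank classes by a TF-IDF score against the query tokens, then keep
    only the hits that share a cluster with the best-ranked class."""
    n = len(classes)
    df = Counter(t for tokens in classes.values() for t in set(tokens))

    def score(tokens):
        tf = Counter(tokens)
        return sum(tf[t] * math.log(n / df[t]) for t in query if t in df)

    scored = [(score(tokens), name) for name, tokens in classes.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    hits = [name for s, name in ranked if s > 0]
    if not hits:
        return []
    top_cluster = next(c for c in clusters if hits[0] in c)
    return [h for h in hits if h in top_cluster]

clusters = cluster(CLASSES)
result = search(["user", "login", "image"], CLASSES, clusters)
print(clusters)
print(result)
```

On this toy data the clustering yields one group of user-related classes and one group of image-related classes. The query matches classes in both groups, but the cluster filter keeps only the group containing the best-ranked hit, which is the kind of automatic exclusion of unrelated results the abstract describes.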
[Degree-granting institution]: Harbin Engineering University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification numbers]: TP311.5; TP391.3