Research on a Similarity Detection System for Science and Technology Projects
Topic: science and technology projects + similarity calculation; Source: Master's thesis, Hangzhou Dianzi University, 2011
[Abstract]: Government funding for science and technology projects grows every year, and the number and scope of project applications have expanded with it. This has sharply increased the workload of project review and evaluation and created a range of management problems, including many cases of duplicate project approval. Document copy detection techniques are widely used to protect intellectual property and to improve search engines, but they have rarely been applied to science and technology project management. This thesis studies a field-based similarity calculation method built on a knowledge representation model of science and technology projects, together with a system that implements it. The method is meant to find similar projects efficiently and accurately, give project reviewers early warning, and so prevent similar projects from being approved more than once. The main work is as follows:
1. Knowledge representation. The thesis proposes a knowledge representation model that combines the vector space model with the matter-element model. Keywords are obtained by segmenting the project text into words, and their weights are computed with the TF (term frequency) method. These weighted keywords form the project knowledge representation model.
2. Similarity calculation. Building on this model, the thesis proposes a similarity calculation method based on field structure. Each individual field of a project is the unit over which keyword frequencies are counted. Keywords are matched by string hashing, and the similarity between corresponding fields of two projects is computed with the vector cosine formula. The field similarities are then combined by a weighted average to give the similarity between the two projects.
3. System. A similarity detection system for science and technology projects was developed from these results. It consists of a project knowledge base, a project knowledge construction module, a similarity calculation module, a judgment and explanation module, and a parallel computing task management module. First, the knowledge construction module builds knowledge models for the project under review and for projects that have already been approved. Next, the similarity calculation module computes the similarity between the two models. Finally, the judgment and explanation module uses that similarity to decide how the projects are related. The parallel computing task management module runs the similarity calculation module and the judgment and explanation module in parallel.
The system has been deployed in the science and technology project management system of Zhejiang Province. According to the thesis, this deployment verified the feasibility and effectiveness of the research. The system also provides an effective way to check for, and give early warning of, duplicate project approval.
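The knowledge model in item 1 can be sketched in Python as follows. This sketch assumes the keywords have already been produced by a Chinese word segmenter; that step is not shown, and the abstract does not name one. The `ProjectModel` class and its layout are hypothetical. It imitates a matter-element triple (object, characteristic, value) by mapping each project field name to a TF-weighted keyword vector. It is an illustration, not the thesis's actual data structure.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ProjectModel:
    """Hypothetical matter-element-style record (N, c, v): the project is the object N,
    each field name is a characteristic c, and its TF keyword vector is the value v."""

    name: str
    field_vectors: dict[str, dict[str, float]] = field(default_factory=dict)


def tf_vector(tokens: list[str]) -> dict[str, float]:
    """Term frequency: occurrences of each keyword divided by the field's token count."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def build_model(name: str, segmented_fields: dict[str, list[str]]) -> ProjectModel:
    """Build one TF vector per field; each field's tokens are assumed to be pre-segmented."""
    return ProjectModel(
        name=name,
        field_vectors={f: tf_vector(toks) for f, toks in segmented_fields.items()},
    )
```

Cosine similarity does not change when a vector is scaled. Dividing by the token count therefore leaves per-field cosine scores unchanged, and raw counts would give the same similarities.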
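The field-based calculation in item 2 can be written as a formula. The abstract gives no notation, so the symbols here are our own. x^P_{k,t} is the TF weight of keyword t in field k of project P, m is the number of compared fields, and w_k >= 0 is a field weight; the abstract does not give its values. A field score is taken as 0 when either vector is empty.

```latex
\mathrm{sim}_k(P,Q) = \frac{\sum_{t} x^{P}_{k,t}\, x^{Q}_{k,t}}
  {\sqrt{\sum_{t} \bigl(x^{P}_{k,t}\bigr)^{2}}\;\sqrt{\sum_{t} \bigl(x^{Q}_{k,t}\bigr)^{2}}},
\qquad
\mathrm{sim}(P,Q) = \frac{\sum_{k=1}^{m} w_k\,\mathrm{sim}_k(P,Q)}{\sum_{k=1}^{m} w_k}
```

For example, take two fields with hypothetical weights 1 (title) and 3 (summary). If the titles match exactly (cosine 1) and the summaries share no keyword (cosine 0), then sim(P,Q) = (1·1 + 3·0)/4 = 0.25.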
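A minimal Python sketch of the per-field cosine and weighted-average steps in item 2 follows. Python's `dict` is a hash table, so looking up each keyword of one field in the other field's vector stands in here for the string-hash matching the abstract mentions; the thesis's actual hashing scheme is not described. The field names and the `weights` mapping are hypothetical inputs.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of two sparse keyword vectors; keywords are matched by dict (hash) lookup."""
    if not u or not v:
        return 0.0
    # Iterate over the smaller vector and probe the larger one by hash lookup.
    small, large = (u, v) if len(u) <= len(v) else (v, u)
    dot = sum(w * large[t] for t, w in small.items() if t in large)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def project_similarity(
    a: dict[str, dict[str, float]],
    b: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> float:
    """Weighted average of per-field cosine similarities over the weighted fields."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    # A field missing from either project is treated as an empty vector (score 0).
    score = sum(w * cosine(a.get(f, {}), b.get(f, {})) for f, w in weights.items())
    return score / total_weight
```

In this sketch, a field that is missing or has no keywords contributes 0 to the weighted sum but still counts in the denominator. That is one possible convention; the abstract does not say how the thesis handles empty fields.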
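The parallel task management and judgment modules from item 3 can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. The cut-offs and labels in `THRESHOLDS` are hypothetical, because the abstract does not say how the judgment and explanation module maps a score to a relationship. The `similarity` parameter accepts any function that scores two project models, such as a weighted per-field cosine.

```python
from collections.abc import Callable, Mapping
from concurrent.futures import ThreadPoolExecutor
from typing import TypeVar

M = TypeVar("M")  # type of a project knowledge model

# Hypothetical cut-offs, checked from highest to lowest; the thesis's values are not given.
THRESHOLDS = [(0.8, "highly similar"), (0.5, "similar")]


def judge(score: float) -> str:
    """Judgment step: map a similarity score to a relationship label."""
    for cutoff, label in THRESHOLDS:
        if score >= cutoff:
            return label
    return "not similar"


def screen(
    candidate: M,
    archive: Mapping[str, M],
    similarity: Callable[[M, M], float],
    max_workers: int = 4,
) -> list[tuple[str, float, str]]:
    """Score one candidate against every archived project in parallel, then judge each score."""

    def task(item: tuple[str, M]) -> tuple[str, float, str]:
        project_id, model = item
        score = similarity(candidate, model)
        return project_id, score, judge(score)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(task, archive.items()))
    # Most similar projects first, so reviewers see the strongest warnings at the top.
    return sorted(results, key=lambda r: r[1], reverse=True)
```

A thread pool keeps the sketch short and avoids pickling. However, CPython's global interpreter lock serializes pure-Python arithmetic. If the similarity computation itself is the bottleneck, a `ProcessPoolExecutor` with a module-level scoring function would be the usual choice.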
[Degree-granting institution]: Hangzhou Dianzi University
[Degree level]: Master's
[Year degree conferred]: 2011
[Classification number]: G311; TP315