Research on a Similarity Detection System for Science and Technology Projects
Topic: science and technology projects + similarity calculation; Source: Hangzhou Dianzi University, master's thesis, 2011
[Abstract]: As government funding for science and technology projects grows year by year and the number and scope of project applications expand, the workload of project examination and review has risen sharply. This has created a range of management difficulties, among them many cases of "duplicate project approval". Document copy detection is widely used in intellectual property protection and search engine optimization, but rarely in science and technology project management. This thesis studies a field-based similarity calculation method, built on a knowledge representation model for science and technology projects, together with a system that implements it. The aim is to find similar projects efficiently and accurately, give project reviewers an early warning, and so prevent similar projects from being approved more than once. The main work of the thesis is as follows:

1. For the knowledge representation of science and technology projects, a model combining the vector space model with the matter-element model is proposed. Keywords are obtained by word segmentation of the project text, keyword weights are computed with the TF (term frequency) method, and the project knowledge representation model is built from them.

2. Based on this knowledge representation model, a similarity calculation method based on field structure is proposed. Each individual field of a project serves as the unit for keyword frequency statistics. Keywords are matched by string hashing, the similarity between corresponding fields of two projects is computed with the vector cosine formula, and the field similarities are then combined by weighted average to give the similarity between the projects.

3. Based on these results, a similarity detection system for science and technology projects was developed. The system consists of a project knowledge base, a project knowledge construction module, a similarity calculation module, a judgment and explanation module, and a parallel computing task management module. First, the project knowledge construction module builds project knowledge models for the project under examination and for the already-approved projects. Then the similarity calculation module computes the similarity from the two project knowledge models. Finally, the judgment and explanation module determines the similarity relationship between the projects from that score. The parallel computing task management module is responsible for running the similarity calculation module and the judgment and explanation module in parallel.

The system developed in this thesis has been applied in the science and technology project management system of Zhejiang Province. This application verified the feasibility and effectiveness of the research results and provides a good means of checking for, and warning of, duplicate project approval.
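The field-based calculation described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes each field has already been segmented into keywords, the field names, field weights, and the 0.8 threshold are hypothetical values chosen for illustration, and Python's hash-based `dict` lookup stands in for the string hash matching step.

```python
import math
from collections import Counter

# Hypothetical field names and weights; the abstract does not list them.
FIELD_WEIGHTS = {"title": 0.5, "content": 0.3, "keywords": 0.2}


def tf_vector(tokens):
    """Map each keyword to its term frequency (count / total tokens)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    # Dict membership tests are hash lookups, standing in for hash matching.
    dot = sum(w * v[word] for word, w in u.items() if word in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def project_similarity(a, b, weights=FIELD_WEIGHTS):
    """Weighted average of the per-field cosine similarities."""
    total_weight = sum(weights.values())
    score = sum(
        w * cosine(tf_vector(a.get(field, [])), tf_vector(b.get(field, [])))
        for field, w in weights.items()
    )
    return score / total_weight


def find_similar(pending, approved, threshold=0.8):
    """Return (index, score) for approved projects at or above the threshold.

    The threshold value is an assumption, not taken from the thesis.
    """
    scores = ((i, project_similarity(pending, p)) for i, p in enumerate(approved))
    return [(i, s) for i, s in scores if s >= threshold]
```

Here a project is a plain `dict` mapping a field name to its list of keywords, so `find_similar(pending, approved_projects)` compares one project under examination against every already-approved project. Because each comparison is independent, the loop in `find_similar` is the natural place to distribute work across processes, which is the role the abstract assigns to the parallel computing task management module.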
[Degree-granting institution]: Hangzhou Dianzi University
[Degree level]: Master's
[Year conferred]: 2011
[Classification number]: G311; TP315