Research on Patent Analysis Based on Internet Data
Topics: technology life cycle + quantitative patent analysis; Source: Harbin Institute of Technology, master's thesis, 2017
[Abstract]: The internet has made data volumes grow rapidly, and massive amounts of patent data are now available. Companies need patent intelligence to formulate more precise development strategies, yet much of the information hidden in patent documents is not fully used. Traditional analysis based on manual statistics overlooks it, and patent analysis reports contain little beyond hand-compiled statistical results. This thesis therefore surveys the current state of patent information analysis in China and, on the basis of statistical analysis of the data, calculates how technology development parameters change over time. It also mines usable information hidden in patent documents, focusing on patent topic extraction and automatic classification of patent documents. To address the thin, monotonous content of traditional patent analysis reports and the lack of automated writing, the work further aims to enrich report content and to build a system that writes reports automatically.
To obtain more relevant patent data and improve patent retrieval, the effect of query term expansion on retrieval results was investigated. Expansion term sets built from a dictionary and from the Baidu platform give fairly comprehensive but imprecise results, while relevance feedback shows the opposite behavior. Weighing the strengths and weaknesses of each method, the thesis proposes expanding queries with a combination of dictionary and relevance feedback, which improves both recall and precision to some extent.
With patent data collected by a web crawler, a new measure, the degree of technological innovation, is added to improve on predicting technology maturity from technology development parameters alone. Its calculation incorporates text-similarity analysis, and the degree is computed over classifications of the data set from different perspectives. To explore the trend in yearly patent application counts, time-series forecasting algorithms were applied to the resulting data series. Exponential smoothing and ARMA performed well, and the experiments verified that the technology life-cycle factor does affect the prediction of the series.
The IPC number of a patent is not the only way to obtain its topic. Applying text topic extraction algorithms to a collection of patent documents yields more targeted and finer-grained technical topic keywords. Three algorithms, TextRank, LDA, and TF-IDF, were applied to the collected data set. Measured by how well the extracted keywords reflect the topic, TextRank scored 0.63, higher than LDA's 0.55, but it relies too heavily on single documents. Varying the initial number of topics for LDA showed that perplexity is lowest when the number is set to 4. For automatic classification of patent documents, results on coarse categories were all at or below 0.7. Results on fine-grained categories improved markedly: the lowest measure was still close to 0.7, and the R value of kNN reached 0.88.
Building on these results, and to bring the work closer to practical use, the thesis discusses the implementation of a patent analysis system that assists users in writing patent analysis reports.
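The abstract reports that exponential smoothing worked well on the yearly application counts but does not state which variant or parameters were used. As an illustration only, the following minimal sketch shows simple (single) exponential smoothing in Python. The function name, the smoothing factor `alpha=0.5`, and the yearly counts are all assumptions made for this example and are not taken from the thesis.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing.

    Returns the list of smoothed values; the last one serves as the
    one-step-ahead forecast. This is an illustrative sketch, not the
    thesis's implementation.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    # Initialise with the first observation (one common convention).
    smoothed = [float(series[0])]
    for value in series[1:]:
        # New level = weighted mix of the new observation and the old level.
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical yearly patent application counts (made-up numbers).
yearly_applications = [100, 120, 150, 170]
smoothed = exponential_smoothing(yearly_applications, alpha=0.5)
forecast_next_year = smoothed[-1]
print(smoothed, forecast_next_year)
```

A larger `alpha` makes the forecast follow recent years more closely, while a smaller one smooths out year-to-year fluctuation. Simple smoothing lags behind a steadily rising series, so a trend-aware variant or ARMA may suit growth phases of a technology life cycle better.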
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1
Article ID: 1877254
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1877254.html