Research on Patent Ontology Construction Methods for Complex Semantics
Published: 2018-04-06 07:39
Topic: patent structure. Entry point: entity relations. Source: Wuhan University, doctoral dissertation, 2014
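【Abstract】: Patent data cover 95% of the world's latest technologies and inventions, and the quantity and quality of patents have become a marker of the economic competitiveness of an enterprise, an industry, and even a nation. Effective use of patent data can support enterprise R&D decisions and save the cost of duplicated development. As awareness of intellectual property grows, patent data are increasing rapidly. Many patents often relate to the same technology and form a patent group, in which the patents resemble one another to varying degrees in implementation principle or technical detail. In a society where data grow explosively, people expect to obtain the information they need in the most convenient way.
Existing patent analysis methods do not consider the semantic relations between keywords. They rely mainly on statistical analysis of technical keywords: a patent document is modeled as a vector of keyword weights, and the similarity between patent documents is computed with the vector space model. This model cannot recognize different technical keywords that carry the same or similar meaning across the patents of a patent group. Using the semantic information contained in patents during analysis should therefore give better results.
This project studies the extraction of patent composition information that contains rich semantic relations, organizes and manages these structural data through an ontology, and attempts in-depth patent analysis based on semantic knowledge about patent structure.
Extracting information from patent documents depends on text understanding. Some earlier work, supported by information processing techniques, has extracted data in fields such as economics, biology, and chemistry and managed the data through ontologies, but these methods are not suited to extracting entity relations from Chinese patent documents or to organizing and managing those relations. The reason is that acquiring patent structure information and applying it to patent analysis raise several specific problems:
(1) Patent documents contain rich structure-related entity relation data. Modeling a patent structure ontology requires classifying the concepts and relations that appear in patents, so that the differences and characteristics of the semantic relations among entities in the patent structure are reflected as completely and effectively as possible.
(2) The text describes both physical position relations and dynamic relations among the components of a patent. Expression is flexible and sentence structure is complex. The text also contains many new technical terms, unique to a single patent, that describe entity names and relation features, and an entity relation may be expressed within a phrase, within a sentence, or across several sentences. Extraction of entity relations from Chinese patent documents has to take all of these factors into account.
(3) When the patent structure ontology is used for patent analysis, the effect of each patent's entity semantic relations on the analysis result has to be considered, which makes the process very complex.
On the other hand, every patent is repeatedly examined and revised between application and grant, so patent data follow writing conventions and are of high quality. Although the technologies described in different fields vary widely, the descriptions share several features: ① patent documents introduce many new compound technical terms whose head word is a basic term; ② the description of the composition of the patented technology follows a certain spatial and temporal order; ③ the description of how the technology is implemented states the entity relations involved in processing.
Exploiting these advantageous features to extract entity semantic relations from patent documents is worthwhile, because it provides high-quality data for in-depth semantic analysis and for mining patent knowledge in a technical domain. Following this idea, we studied effective methods for patent technology ontology modeling and data acquisition, and applied the ontology knowledge to patent analysis. Given the high writing quality of patent documents and the novelty of the technology they describe, this thesis studies methods for constructing a patent structure ontology and their applications. The main work is as follows:
(1) Modeling of technical-structure concepts and their semantic relations. Starting from the idea that relation instances are the most direct manifestation of ontology concepts and relations, a method for analyzing and mining relation instances is given. Hierarchical clustering yields a basic classification of semantic relations. The classification is used to label the semantic relations in patent structure graphs with relation types, and frequent patterns are mined from the labeled graphs. The frequent patterns are then used to analyze how different relation types associated with an entity co-occur in patents, which determines the schema of the patent ontology classes and their relations. Finally, inference rules based on the existing classes and relations are given, through which implicit entity semantic relations in a patent can be derived from existing relation instances. Experiments show that the proposed method reduces the time needed for patent ontology modeling and that the resulting model covers the entity relation types of the domain well, which supports effective organization and management of patent structure knowledge.
(2) A self-learning method for acquiring patent structure data. The research makes full use of the text pattern features, at every level, that reflect the writing conventions followed by patent documents, and proposes a method for extracting relation feature words and entity relations from patent documents. In the text preprocessing stage, statistical learning produces multi-level pattern rules from the features that relation instances exhibit in their text segments, such as word collocation, phrase composition, and inter-sentence relations. Next, given a small number of entity relation instances as seeds, initial relation extraction templates are constructed from the semantic features of the seed instances, and multi-entity relations are extracted by self-learning. Finally, implicit entity relations between sentences are obtained by parsing text segments.
(3) Typical applications of patent knowledge data. As a demonstration of patent analysis, a greedy-algorithm-based method for comparing patent technology structures is given, followed by a method that computes patent similarity bottom-up from similar substructures. Patents are then clustered by structural similarity, and the technical similarity of patentees is analyzed. Experimental results show that patent structure knowledge improves the accuracy of patent analysis.
(4) Construction and application of the patent ontology. The ontology construction process was implemented, including: building, with an ontology tool, the patent schema obtained by instance mining; extracting relation feature words, sentence composition patterns, and similar information from documents; and extracting relation instances from documents. Finally, a new patent analysis method is proposed that recommends cooperation partners to users based on patentee similarity derived from patent knowledge.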
[Abstract]: Patent data cover 95% of the world's latest technologies and inventions, and the quantity and quality of patents have become a marker of the economic competitiveness of an enterprise, an industry, and even a nation. Effective use of patent data can support enterprise R&D decisions and save the cost of duplicated development. As awareness of intellectual property grows, patent data are increasing rapidly. Many patents often relate to the same technology and form a patent group, in which the patents resemble one another to varying degrees in implementation principle or technical detail. In a society where data grow explosively, people expect to obtain the information they need in the most convenient way.
Existing patent analysis methods do not consider the semantic relations between keywords. They rely mainly on statistical analysis of technical keywords: a patent document is modeled as a vector of keyword weights, and the similarity between patent documents is computed with the vector space model. This model cannot recognize different technical keywords that carry the same or similar meaning across the patents of a patent group. Using the semantic information contained in patents during analysis should therefore give better results.
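The limitation of the keyword vector space model can be made concrete with a small sketch. The keyword lists and the `SYNONYMS` table below are invented for illustration and do not come from the thesis; the point is only that cosine similarity over raw keywords ignores terms that mean the same thing but are spelled differently.

```python
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two keyword-weight vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical keyword lists for two patents describing the same mechanism.
patent_a = ["fastener", "housing", "spring"]
patent_b = ["clamp", "casing", "spring"]

# Plain vector space model: only the literally shared keyword counts.
sim_raw = cosine_similarity(Counter(patent_a), Counter(patent_b))

# Assumed synonym table standing in for semantic knowledge about terms.
SYNONYMS = {"clamp": "fastener", "casing": "housing"}

def normalize(keywords):
    """Map each keyword to a canonical term when a synonym is known."""
    return [SYNONYMS.get(k, k) for k in keywords]

sim_semantic = cosine_similarity(Counter(normalize(patent_a)),
                                 Counter(normalize(patent_b)))

print(f"raw keywords: {sim_raw:.3f}, with synonym knowledge: {sim_semantic:.3f}")
```

With raw keywords only `spring` is shared, so the two patents look largely unrelated; once the synonyms are merged, the vectors coincide.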
This project studies the extraction of patent composition information that contains rich semantic relations, organizes and manages these structural data through an ontology, and attempts in-depth patent analysis based on semantic knowledge about patent structure.
Extracting information from patent documents depends on text understanding. Some earlier work, supported by information processing techniques, has extracted data in fields such as economics, biology, and chemistry and managed the data through ontologies, but these methods are not suited to extracting entity relations from Chinese patent documents or to organizing and managing those relations. The reason is that acquiring patent structure information and applying it to patent analysis raise several specific problems:
(1) Patent documents contain rich structure-related entity relation data. Modeling a patent structure ontology requires classifying the concepts and relations that appear in patents, so that the differences and characteristics of the semantic relations among entities in the patent structure are reflected as completely and effectively as possible.
(2) The text describes both physical position relations and dynamic relations among the components of a patent. Expression is flexible and sentence structure is complex. The text also contains many new technical terms, unique to a single patent, that describe entity names and relation features, and an entity relation may be expressed within a phrase, within a sentence, or across several sentences. Extraction of entity relations from Chinese patent documents has to take all of these factors into account.
(3) When the patent structure ontology is used for patent analysis, the effect of each patent's entity semantic relations on the analysis result has to be considered, which makes the process very complex.
On the other hand, every patent is repeatedly examined and revised between application and grant, so patent data follow writing conventions and are of high quality. Although the technologies described in different fields vary widely, the descriptions share several features: (a) patent documents introduce many new compound technical terms whose head word is a basic term; (b) the description of the composition of the patented technology follows a certain spatial and temporal order; (c) the description of how the technology is implemented states the entity relations involved in processing.
Exploiting these advantageous features to extract entity semantic relations from patent documents is worthwhile, because it provides high-quality data for in-depth semantic analysis and for mining patent knowledge in a technical domain. Following this idea, we studied effective methods for patent technology ontology modeling and data acquisition, and applied the ontology knowledge to patent analysis.
Given the high writing quality of patent documents and the novelty of the technology they describe, this thesis studies methods for constructing a patent structure ontology and their applications. The main work is as follows:
(1) Modeling of technical-structure concepts and their semantic relations
Starting from the idea that relation instances are the most direct manifestation of ontology concepts and relations, a method for analyzing and mining relation instances is given. Hierarchical clustering yields a basic classification of semantic relations. The classification is used to label the semantic relations in patent structure graphs with relation types, and frequent patterns are mined from the labeled graphs. The frequent patterns are then used to analyze how different relation types associated with an entity co-occur in patents, which determines the schema of the patent ontology classes and their relations. Finally, inference rules based on the existing classes and relations are given, through which implicit entity semantic relations in a patent can be derived from existing relation instances. Experiments show that the proposed method reduces the time needed for patent ontology modeling and that the resulting model covers the entity relation types of the domain well, which supports effective organization and management of patent structure knowledge.
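Two steps of this modeling pipeline, mining co-occurring relation types and deriving implicit relations by rule, can be sketched as follows. The abstract does not give the actual relation types, support threshold, or rule set, so the triples, the `part_of` and `connected_to` labels, and the transitivity rule below are assumptions chosen only to show the mechanics.

```python
from itertools import combinations

# Hypothetical typed relation instances: (patent_id, head, relation_type, tail).
TRIPLES = [
    ("P1", "housing", "part_of", "pump"),
    ("P1", "impeller", "part_of", "housing"),
    ("P1", "impeller", "connected_to", "shaft"),
    ("P2", "rotor", "part_of", "motor"),
    ("P2", "rotor", "connected_to", "shaft"),
    ("P3", "valve", "part_of", "pipe"),
]

def frequent_type_pairs(triples, min_support):
    """Find relation-type pairs that co-occur on the same head entity.

    Support is the number of distinct patents in which the pair appears.
    """
    types_by_entity = {}
    for patent_id, head, rel, _tail in triples:
        types_by_entity.setdefault((patent_id, head), set()).add(rel)
    patents_by_pair = {}
    for (patent_id, _head), rels in types_by_entity.items():
        for pair in combinations(sorted(rels), 2):
            patents_by_pair.setdefault(pair, set()).add(patent_id)
    return {pair: len(ids) for pair, ids in patents_by_pair.items()
            if len(ids) >= min_support}

def infer_transitive(triples, rel="part_of"):
    """Assumed rule: (a rel b) and (b rel c) in one patent imply (a rel c).

    Applies the rule until no new triple appears and returns only the
    derived (previously implicit) triples.
    """
    known = set(triples)
    changed = True
    while changed:
        changed = False
        for p1, a, r1, b in list(known):
            for p2, b2, r2, c in list(known):
                if p1 == p2 and r1 == rel and r2 == rel and b == b2 and a != c:
                    derived = (p1, a, rel, c)
                    if derived not in known:
                        known.add(derived)
                        changed = True
    return known - set(triples)

patterns = frequent_type_pairs(TRIPLES, min_support=2)
implicit = infer_transitive(TRIPLES)
print(patterns)
print(implicit)
```

In this toy data the pair (`connected_to`, `part_of`) is frequent because it appears on `impeller` in P1 and on `rotor` in P2, which is the kind of evidence that could suggest a schema class with both relations. The rule step recovers that `impeller` is part of `pump`, a relation never stated directly.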
(2) A self-learning method for acquiring patent structure data
The research makes full use of the text pattern features, at every level, that reflect the writing conventions followed by patent documents, and proposes a method for extracting relation feature words and entity relations from patent documents. In the text preprocessing stage, statistical learning produces multi-level pattern rules from the features that relation instances exhibit in their text segments, such as word collocation, phrase composition, and inter-sentence relations. Next, given a small number of entity relation instances as seeds, initial relation extraction templates are constructed from the semantic features of the seed instances, and multi-entity relations are extracted by self-learning. Finally, implicit entity relations between sentences are obtained by parsing text segments.
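The seed-driven self-learning loop can be illustrated with a generic bootstrapping sketch. This is not the thesis's extractor: the abstract does not specify its template features, so the example below uses English toy sentences, binary relations, and the literal text between two entities as the only "template", all of which are simplifying assumptions.

```python
import re

# Hypothetical, already-segmented sentences from patent descriptions.
SENTENCES = [
    "the impeller is mounted on the shaft",
    "the gear is mounted on the axle",
    "the cover is fixed to the housing",
    "the lid is fixed to the box",
    "the shaft rotates quickly",
]

# A small number of seed relation instances (head entity, tail entity).
SEEDS = {("impeller", "shaft"), ("cover", "housing")}

def learn_patterns(sentences, pairs):
    """Derive templates: the text found between a known head and tail."""
    patterns = set()
    for sentence in sentences:
        for head, tail in pairs:
            regex = rf"\b{re.escape(head)}\b(.+?)\b{re.escape(tail)}\b"
            match = re.search(regex, sentence)
            if match:
                patterns.add(match.group(1).strip())
    return patterns

def apply_patterns(sentences, patterns):
    """Use each template to pull new (head, tail) pairs out of the text."""
    found = set()
    for sentence in sentences:
        for pattern in patterns:
            match = re.search(rf"(\w+) {re.escape(pattern)} (\w+)", sentence)
            if match:
                found.add((match.group(1), match.group(2)))
    return found

def bootstrap(sentences, seeds, max_rounds=3):
    """Alternate template learning and extraction until nothing new appears."""
    pairs, patterns = set(seeds), set()
    for _ in range(max_rounds):
        patterns |= learn_patterns(sentences, pairs)
        new_pairs = apply_patterns(sentences, patterns) - pairs
        if not new_pairs:
            break
        pairs |= new_pairs
    return pairs, patterns

pairs, patterns = bootstrap(SENTENCES, SEEDS)
print(sorted(pairs))
print(sorted(patterns))
```

Starting from two seeds, the loop learns the templates "is mounted on the" and "is fixed to the" and uses them to find the unseen pairs (`gear`, `axle`) and (`lid`, `box`). A real system would also need to score templates to stop noisy ones from spreading, which this sketch omits.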
(3) Typical applications of patent knowledge data
As a demonstration of patent analysis, a greedy-algorithm-based method for comparing patent technology structures is given, followed by a method that computes patent similarity bottom-up from similar substructures. Patents are then clustered by structural similarity, and the technical similarity of patentees is analyzed. Experimental results show that patent structure knowledge improves the accuracy of patent analysis.
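One way to read "greedy comparison plus bottom-up similarity" is sketched below. The abstract gives no formula, so the tree representation (`Component`), exact-name matching, the equal 0.5 weights, and the normalization by the larger child count are all assumptions made for this illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """A node in a hypothetical patent structure tree."""
    name: str
    parts: list = field(default_factory=list)

def greedy_match(xs, ys, sim):
    """Greedily pair items of xs and ys, best-scoring pairs first.

    Returns the summed similarity of the chosen one-to-one pairs.
    """
    scored = sorted(
        ((sim(x, y), i, j) for i, x in enumerate(xs) for j, y in enumerate(ys)),
        key=lambda t: t[0],
        reverse=True,
    )
    used_x, used_y, total = set(), set(), 0.0
    for score, i, j in scored:
        if score > 0 and i not in used_x and j not in used_y:
            used_x.add(i)
            used_y.add(j)
            total += score
    return total

def structure_sim(a, b):
    """Bottom-up similarity: leaves compare names, inner nodes also
    fold in the greedily matched similarity of their substructures."""
    own = 1.0 if a.name == b.name else 0.0
    if not a.parts and not b.parts:
        return own
    matched = greedy_match(a.parts, b.parts, structure_sim)
    child_score = matched / max(len(a.parts), len(b.parts))
    return 0.5 * own + 0.5 * child_score  # assumed equal weighting

pump_a = Component("pump", [
    Component("housing", [Component("seal")]),
    Component("impeller"),
    Component("shaft"),
])
pump_b = Component("pump", [
    Component("housing", [Component("gasket")]),
    Component("impeller"),
])

print(structure_sim(pump_a, pump_b))
print(structure_sim(pump_a, pump_a))
```

The greedy step trades optimality for speed: it may miss the best overall pairing that an assignment algorithm would find, but it needs only a sort over candidate pairs. The resulting pairwise scores could then feed a clustering step over patents or patentees.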
(4) Construction and application of the patent ontology
The ontology construction process was implemented, including: building, with an ontology tool, the patent schema obtained by instance mining; extracting relation feature words, sentence composition patterns, and similar information from documents; and extracting relation instances from documents. Finally, a new patent analysis method is proposed that recommends cooperation partners to users based on patentee similarity derived from patent knowledge.
[Degree-granting institution]: Wuhan University
[Degree level]: Doctorate
[Year degree granted]: 2014
[Classification number]: TP391.1
Article No.: 1718599
Article link: http://sikaile.net/falvlunwen/zhishichanquanfa/1718599.html