A Social Approach to Constructing Scientific Terminology Ontologies
Published: 2017-12-27 21:32
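Article keywords: A Social Approach to Constructing Scientific Terminology Ontologies  Source: University of Science and Technology of China, doctoral dissertation, 2016  Document type: dissertation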
Related topics: scientific terminology ontology; social voting; LDA; topic hierarchy; domain keyword list
[Abstract]: Generally speaking, an ontology contains at least two elements: domain concepts and the relations among those concepts. A scientific terminology ontology is a simple form of ontology consisting of the domain concepts of a scientific field and the hierarchical relations among them. Scientific terminology ontologies play an extremely important role in activities such as research project management and research evaluation (Research Assessment Exercise), because they can accurately classify the resources of a scientific field in detail and thereby improve the efficiency of information retrieval. For example, in recent years the National Natural Science Foundation of China (NSFC) has received more than 170,000 grant proposals per year on average. On average, each NSFC Program Director must assign review experts to more than 1,500 proposals in less than three weeks. In practice, most Program Directors first group the proposals and then assign review experts. To help Program Directors quickly grasp, at a macro level, the content of the proposals they are responsible for, and so group them more efficiently, scientific terminology ontologies are urgently needed.
Current methods for constructing terminology ontologies fall into two main classes: manual construction and automatic construction. Manual construction is generally led by Domain Decision Makers, such as funding agency managers, journal editors, and ontology engineers. Automatic construction relies on computer algorithms that process natural language. Judged by quality and efficiency, manually constructed terminology ontologies are generally of high quality and free of noisy data, but they are time-consuming and laborious to build and demand considerable skill from the domain decision makers. By comparison, automatic construction can process large volumes of data in a short time and can be updated promptly, but the resulting ontologies are of lower quality and often contain noisy data.
To balance quality and efficiency, we propose a third method: constructing terminology ontologies in a social way. This is feasible thanks to the Web 2.0 era in which we live. Social media of all kinds make it easy for people to gather online and work collaboratively. In particular, the rise of research social networks (such as ResearchGate and ScholarMate (科研之友)) lets the scholars of a scientific field communicate across time and space. In essence, social construction of a terminology ontology uses research social networks to encourage the scholars of a scientific field to take an active part in building the ontology, thereby reducing the burden on domain decision makers.
In summary, the research question of this dissertation is: how can a scientific terminology ontology be constructed in a social way? Constructing the terminology ontology of a scientific field involves two core tasks: (1) building a domain keyword list; (2) generating the hierarchical relations among the keywords. The research objectives are threefold: (1) to propose a unified, extensible theoretical framework for the social construction of scientific terminology ontologies; (2) to design and implement a method that builds the domain keyword list through social voting; (3) to design a method that generates the keyword hierarchy from keyword similarity and specificity.
In information systems research, Behavioral Science and Design Science are the two main paradigms. Behavioral science builds and tests theories that describe, explain, or predict the behavior of people and organizations, whereas design science creates and evaluates artifacts that extend the capabilities of people and organizations. This study follows the design science research method. Overall, it comprises two stages: Build and Evaluate. In the build stage, we first proposed a method for building the domain keyword list through social voting, and then designed a method for generating the keyword hierarchy that integrates the LDA topic model and the Subsumption Hierarchy Model. In the evaluate stage, we first evaluated the social voting method for building the domain keyword list by means of a survey; second, we evaluated the LDA topic model component of the hierarchy generation method by experiment; third, we evaluated the subsumption hierarchy model component of the hierarchy generation method by experiment; finally, we evaluated the terminology ontology construction method as a whole by means of a user study.
In theoretical terms, this study (1) proposed a unified, extensible theoretical framework for the social construction of scientific terminology ontologies; (2) designed a method for building the domain keyword list through social voting; (3) designed a method for generating the keyword hierarchy from keyword similarity and specificity. In practical terms, the domain keyword list construction method proposed here has been applied to the project review work of the National Natural Science Foundation of China. To our knowledge, the China National Committee for Terms in Sciences and Technologies spends a great deal of manpower and material resources every year on standardizing technical terms, mostly by hand; this study offers organizations of this kind an alternative way to construct terminology ontologies for scientific fields.
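The two core tasks named in the abstract (building a keyword list by voting, then deriving parent/child relations among keywords) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the dissertation's method: the function names, the `min_votes` cutoff, the 0.8 threshold, and the sample data are all hypothetical, and a plain document co-occurrence subsumption rule stands in for the LDA-based similarity and specificity measures, which the abstract does not specify.

```python
from collections import Counter
from itertools import permutations


def build_keyword_list(ballots, min_votes=2):
    """Keep candidate keywords endorsed by at least `min_votes` voters.

    `ballots` maps a voter id to the keywords that voter endorsed.
    The cutoff is an illustrative assumption, not taken from the source.
    """
    counts = Counter()
    for keywords in ballots.values():
        counts.update(set(keywords))  # at most one vote per voter per keyword
    return sorted(k for k, n in counts.items() if n >= min_votes)


def subsumption_pairs(documents, keywords, threshold=0.8):
    """Return (parent, child) pairs under a co-occurrence subsumption rule.

    parent subsumes child when P(parent | child) >= threshold and
    P(child | parent) < 1, with probabilities estimated from how often
    the keywords appear together in the same document.
    """
    doc_sets = [set(d) for d in documents]
    doc_freq = {k: sum(k in d for d in doc_sets) for k in keywords}
    pairs = []
    for parent, child in permutations(keywords, 2):
        if doc_freq[parent] == 0 or doc_freq[child] == 0:
            continue  # a keyword that never occurs cannot be placed
        both = sum(parent in d and child in d for d in doc_sets)
        p_parent_given_child = both / doc_freq[child]
        p_child_given_parent = both / doc_freq[parent]
        if p_parent_given_child >= threshold and p_child_given_parent < 1:
            pairs.append((parent, child))
    return pairs


# Hypothetical votes from three scholars.
ballots = {
    "scholar_a": ["ontology", "topic model", "LDA"],
    "scholar_b": ["ontology", "LDA"],
    "scholar_c": ["topic model", "LDA", "blockchain"],
}
# Hypothetical keyword sets, one per proposal.
documents = [
    ["topic model", "LDA"],
    ["topic model", "LDA"],
    ["topic model"],
    ["ontology"],
    ["ontology", "topic model"],
]

keywords = build_keyword_list(ballots)
print(keywords)                                # ['LDA', 'ontology', 'topic model']
print(subsumption_pairs(documents, keywords))  # [('topic model', 'LDA')]
```

In this toy data "topic model" appears in every document that mentions "LDA" but also in others, so it is placed above "LDA"; "blockchain" drops out because only one voter endorsed it.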
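[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Doctoral
[Year degree conferred]: 2016
[Classification number]: TP391.1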
Article ID: 1343262
Article link: http://sikaile.net/shoufeilunwen/xxkjbs/1343262.html