Research on an Ontology-Based Method for Automatic Construction of a Health Knowledge Base

Published: 2019-05-20 00:13

[Abstract]: As online medical consultation platforms have become widespread, they have accumulated a large volume of consultation records. Accurately extracting useful medical and health information from these records and organizing it into a structured, reusable knowledge base remains an open problem. Information extraction, which turns unstructured text into structured data, is the core technology for this task. This thesis studies methods for automatically constructing a health knowledge base: consultation data are collected from the web automatically, and information such as disease symptoms, treatment plans, and required examinations is extracted from the unstructured consultation content to form a structured knowledge base. An ontology-based information extraction algorithm extracts information from consultation dialogues, and the results are stored in structured form. A focused crawler for the consultation domain was implemented to collect the experimental data, and the collected data were analyzed for features and annotated. A consultation-domain ontology was built on a three-layer ontology framework, defining the concepts and relations in consultation dialogues in detail and populating them with instances. The thesis also proposes a rule generation algorithm based on keywords and association rules, together with an ontology-based extraction algorithm. Keywords are first extracted from annotated samples, and the associations among them are mined to generate pattern-matching rules. The relations between concepts are then parsed to determine the order and scope in which they are extracted, and sentences are classified and extracted according to ontology instances.
Specifically, a feature-based log-likelihood ratio algorithm is used to extract concept keywords; compared with the original log-likelihood ratio algorithm, it further reduces the influence of high-frequency non-feature words. An FP-growth algorithm that searches for frequent itemsets using keyword position attributes is proposed; it filters out extraction rules formed from keywords with conflicting positions and thereby improves the reliability of the learned rules. The extraction order is determined by the logical relations among concepts in the ontology model, and sentences are classified using ontology instances, which improves extraction accuracy. Comparative experiments show that the proposed improved algorithms all achieve good extraction results and can extract health knowledge from consultation dialogues. Finally, based on this work, a system for automatically constructing a consultation health knowledge base was designed and implemented.
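To make the keyword-extraction step more concrete, the sketch below scores candidate keywords with the standard (Dunning-style) log-likelihood ratio, which is presumably the "original" baseline the abstract refers to. It is not the thesis's feature-based variant, whose details the abstract does not give. The function names, the whitespace-tokenized toy corpora, and the rule of keeping only words over-represented in the target corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def _term(observed, expected):
    # One cell's contribution to G^2; a zero count contributes nothing.
    return observed * math.log(observed / expected) if observed > 0 else 0.0


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table.

    k11: count of the word in the target (concept) corpus
    k12: count of the word in the reference corpus
    k21: count of all other tokens in the target corpus
    k22: count of all other tokens in the reference corpus
    """
    n = k11 + k12 + k21 + k22
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    cells = [
        (k11, row1 * col1 / n),
        (k12, row1 * col2 / n),
        (k21, row2 * col1 / n),
        (k22, row2 * col2 / n),
    ]
    return 2.0 * sum(_term(obs, exp) for obs, exp in cells)


def rank_keywords(target_tokens, reference_tokens, top_n=10):
    """Rank words that are over-represented in the target corpus (hypothetical helper)."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    n_target, n_reference = len(target_tokens), len(reference_tokens)
    scored = []
    for word, k11 in target.items():
        k12 = reference[word]
        # Assumption: skip words that are not relatively more frequent in the target.
        if k11 / n_target <= k12 / n_reference:
            continue
        score = log_likelihood_ratio(k11, k12, n_target - k11, n_reference - k12)
        scored.append((word, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy data: tokens from sentences annotated as "symptom" vs. all other sentences.
symptom_tokens = "cough fever cough the the headache".split()
other_tokens = "the a the weather the is nice the".split()
ranked = rank_keywords(symptom_tokens, other_tokens)
```

Because a plain log-likelihood ratio is sensitive to raw frequency, common function words can still score highly in larger corpora. This is the weakness the abstract says the feature-based variant is meant to reduce.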
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.1


Xian Ke (咸珂); Research on an Ontology-Based Method for Automatic Construction of a Health Knowledge Base [D]; Harbin Institute of Technology; 2016
