Research on Extracting Relations between Protein Phosphorylation and Disease

Posted: 2018-01-11 04:03

Source: University of Science and Technology of China (master's thesis, 2017). Thesis type: degree thesis.


Keywords: bioinformatics; disease named entity recognition; medical terminology; semantic dictionary; conditional random field; protein phosphorylation; relation extraction


[Abstract]: Protein phosphorylation is one of the most important post-translational protein modifications in living organisms. Many human diseases have been shown to be caused by abnormal phosphorylation, and some disease-related phosphorylation events can be developed into molecular markers or therapeutic targets. With the explosive growth of the biomedical literature, automatically extracting relations between protein phosphorylation and disease from that literature has become an active research topic. The task has two parts: disease named entity recognition (NER) and classification of the relation between a protein phosphorylation event and a disease. Machine learning is currently the mainstream approach to disease NER, but it has difficulty recognizing the medical terms inside disease named entities; moreover, no publicly available system yet exists for extracting relations between protein phosphorylation and disease. This thesis studies that extraction problem. Its work and contributions are as follows.

This thesis presents a disease NER method that combines a conditional random field (CRF) with a semantic dictionary. A medical terminology dictionary carrying semantic information is built from web resources, which addresses the difficulty of recognizing medical terms inside disease named entities. The dictionary is first used to obtain semantic information for medical terms; the CRF then combines this information with lexical and part-of-speech features and with spelling and domain features to recognize disease named entities; finally, abbreviation recognition is adjusted to further improve disease NER performance.
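
The dictionary-lookup step can be illustrated with a minimal sketch. It assumes a toy in-memory dictionary (the entries and the semantic type names `Neoplastic`, `Neurological` and `Metabolic` are hypothetical, not taken from the thesis) and uses greedy longest matching, so that a multi-word term such as "breast cancer" wins over its sub-term "cancer". The output is one BIO-style semantic label per token, a form a CRF can consume as a feature.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: lower-cased term (as a token tuple) -> semantic type.
# The thesis builds its dictionary from web resources; these entries and type
# names are illustrative assumptions only.
SEMANTIC_DICT: Dict[Tuple[str, ...], str] = {
    ("breast", "cancer"): "Neoplastic",
    ("cancer",): "Neoplastic",
    ("alzheimer's", "disease"): "Neurological",
    ("diabetes",): "Metabolic",
}
MAX_TERM_LEN = max(len(k) for k in SEMANTIC_DICT)


def dictionary_features(tokens: List[str]) -> List[str]:
    """Greedy longest-match lookup; returns one BIO-style semantic label per token."""
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        match_len, sem_type = 0, ""
        # Try the longest window first so multi-word terms win over sub-terms.
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in SEMANTIC_DICT:
                match_len, sem_type = n, SEMANTIC_DICT[key]
                break
        if match_len:
            labels[i] = "B-" + sem_type
            for j in range(i + 1, i + match_len):
                labels[j] = "I-" + sem_type
            i += match_len
        else:
            i += 1
    return labels


if __name__ == "__main__":
    toks = "Hyperphosphorylated tau is found in Alzheimer's disease and breast cancer".split()
    print(list(zip(toks, dictionary_features(toks))))
```

A real dictionary would be far larger and would likely need normalization (plurals, hyphenation variants) before lookup; the greedy longest match is a common, simple choice, not necessarily the one the thesis uses.
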
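
One way the CRF input could combine these feature groups is sketched below. The specific features, the suffix list in `DISEASE_SUFFIXES` and the one-token context window are illustrative assumptions rather than the thesis's actual feature set, and POS tags are assumed to come from an external tagger. Each token becomes a dictionary of named features, the per-token format that CRFsuite wrappers such as sklearn-crfsuite accept for training.

```python
from typing import Dict, List, Union

Feature = Union[str, bool]

# Illustrative disease-related suffixes used as "domain" features (assumption).
DISEASE_SUFFIXES = ("itis", "oma", "emia", "pathy", "osis", "syndrome")


def token_features(tokens: List[str], pos_tags: List[str],
                   dict_labels: List[str], i: int) -> Dict[str, Feature]:
    tok = tokens[i]
    feats: Dict[str, Feature] = {
        # Lexical and part-of-speech features
        "word.lower": tok.lower(),
        "pos": pos_tags[i],
        # Spelling (orthographic) features
        "is_upper": tok.isupper(),
        "is_title": tok.istitle(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "prefix3": tok[:3].lower(),
        "suffix3": tok[-3:].lower(),
        # Domain feature: token ends with a typical disease suffix
        "disease_suffix": tok.lower().endswith(DISEASE_SUFFIXES),
        # Semantic-dictionary feature (BIO label from a dictionary lookup)
        "dict": dict_labels[i],
    }
    # Context window of one token on each side
    if i > 0:
        feats["-1:word.lower"] = tokens[i - 1].lower()
        feats["-1:dict"] = dict_labels[i - 1]
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["+1:word.lower"] = tokens[i + 1].lower()
        feats["+1:dict"] = dict_labels[i + 1]
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str], pos_tags: List[str],
                      dict_labels: List[str]) -> List[Dict[str, Feature]]:
    return [token_features(tokens, pos_tags, dict_labels, i) for i in range(len(tokens))]
```

The lists returned by `sentence_features` could then be paired with gold BIO entity labels and passed to a CRF trainer; training itself is not shown here.
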
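
The abstract does not say how abbreviation recognition is adjusted. As one plausible rule, and purely as an assumption, the sketch below looks for a parenthesized short form right after a recognized disease mention, as in "Duchenne muscular dystrophy (DMD)", and then tags every other occurrence of that short form in the same text. The function name `propagate_abbreviations` and the first-letter check are hypothetical.

```python
import re
from typing import List, Set, Tuple

Span = Tuple[int, int]  # character offsets [start, end)


def propagate_abbreviations(text: str, mentions: List[Span]) -> List[Span]:
    """Hypothetical rule: if a recognized disease mention is immediately followed
    by '(ABBR)', add every whole-word occurrence of ABBR as a disease mention."""
    abbrevs: Set[str] = set()
    for start, end in mentions:
        m = re.match(r"\s*\(([A-Za-z0-9-]{2,10})\)", text[end:])
        if not m:
            continue
        abbr = m.group(1)
        long_form = text[start:end]
        # Cheap sanity check: the abbreviation starts with the long form's first letter.
        if long_form[:1].lower() == abbr[:1].lower():
            abbrevs.add(abbr)
    result = set(mentions)
    for abbr in abbrevs:
        for m in re.finditer(r"\b" + re.escape(abbr) + r"\b", text):
            result.add((m.start(), m.end()))
    return sorted(result)
```

Published abbreviation detectors use stricter alignment between short and long forms; the first-letter test here only filters out the most obvious mismatches.
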

On the NCBI Disease Corpus, the method improves the F-measure by about 2.5% over DNorm, and experiments on an open dataset show that it has a certain advantage in recognizing longer disease entities.

Relations between protein phosphorylation and disease are divided into four types: Absence, Presence, Down-regulation and Up-regulation. This thesis implements PDRMine, a system for extracting these relations, which works in three steps: first, the rule-based protein phosphorylation information extraction system RLIMS-P extracts phosphorylation information from the literature; next, the disease NER method designed in this thesis recognizes disease named entities in the sentences that contain phosphorylation information; finally, a rule-based method determines the type of relation between the phosphorylation event and the disease. Recognizing trigger words is the main difficulty in this last step; expanding the trigger-word set with synonyms improves relation-type classification. On an open dataset, PDRMine achieves 72.6% precision and 66.4% recall.
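
The three-step structure of PDRMine can be sketched as a small pipeline. RLIMS-P is a separate system whose interface is not described here, so step 1 is replaced by a naive stand-in that keeps sentences mentioning phosphorylation; steps 2 and 3 are passed in as functions. All names (`find_phospho_sentences`, `pdr_pipeline`, `toy_find_diseases`, `toy_classify`) are hypothetical, and the toy components exist only to make the sketch runnable.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Relation:
    sentence: str
    disease: str
    rel_type: str


def find_phospho_sentences(text: str) -> List[str]:
    """Step 1 stand-in for RLIMS-P (hypothetical): keep sentences mentioning phosphorylation."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if re.search(r"phosphorylat", s, re.IGNORECASE)]


def pdr_pipeline(text: str,
                 find_diseases: Callable[[str], Iterable[str]],
                 classify: Callable[[str], Optional[str]]) -> List[Relation]:
    relations: List[Relation] = []
    for sent in find_phospho_sentences(text):   # step 1: phosphorylation sentences
        for disease in find_diseases(sent):     # step 2: disease NER
            rel_type = classify(sent)           # step 3: rule-based relation type
            if rel_type is not None:
                relations.append(Relation(sent, disease, rel_type))
    return relations


# Toy stand-ins for steps 2 and 3 (hypothetical, for demonstration only).
TOY_DISEASES = ["Alzheimer's disease", "breast cancer"]


def toy_find_diseases(sentence: str) -> List[str]:
    return [d for d in TOY_DISEASES if d.lower() in sentence.lower()]


def toy_classify(sentence: str) -> Optional[str]:
    s = sentence.lower()
    if "increased" in s:
        return "Up-regulation"
    if "reduced" in s:
        return "Down-regulation"
    return None


if __name__ == "__main__":
    doc = ("Tau phosphorylation is increased in Alzheimer's disease. "
           "Breast cancer is common. "
           "HER2 phosphorylation was reduced in breast cancer cells.")
    for rel in pdr_pipeline(doc, toy_find_diseases, toy_classify):
        print(rel.disease, "->", rel.rel_type)
```

The regex sentence splitter is deliberately naive (it would split after "e.g."), and the relation type is decided per sentence rather than per phosphorylation/disease pair; a real system would need both refinements.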
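
A rule-based relation-type decision driven by trigger words might look like the following. The trigger lists and the priority order (Absence first, then the two directional types, then Presence) are assumptions made for illustration; the thesis's actual rules are not given in the abstract. A sentence with no trigger returns `None`, meaning co-occurrence alone does not assert a relation type.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical seed trigger words for each relation type (not the thesis's lists).
TRIGGERS: Dict[str, Set[str]] = {
    "Absence": {"absent", "absence", "abolished", "lacking"},
    "Down-regulation": {"decreased", "reduced", "diminished", "attenuated"},
    "Up-regulation": {"increased", "elevated", "enhanced", "upregulated"},
    "Presence": {"observed", "detected", "present", "found"},
}
# Assumed check order: negation-like triggers win over directional ones,
# and a bare presence trigger is the weakest evidence.
PRIORITY = ["Absence", "Down-regulation", "Up-regulation", "Presence"]


def classify_relation(sentence: str,
                      triggers: Dict[str, Set[str]] = TRIGGERS) -> Optional[str]:
    words = set(re.findall(r"[a-z][a-z-]*", sentence.lower()))
    for rel in PRIORITY:
        if words & triggers[rel]:
            return rel
    return None  # co-occurrence only: no relation type asserted
```

Keeping the trigger sets in a plain dictionary makes it easy to swap in an expanded vocabulary without touching the decision logic.
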
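
Synonym expansion of the trigger-word set can be sketched as a breadth-first walk over a synonym map. The map below is a hypothetical stand-in for whatever thesaurus the thesis used, and the `max_depth` limit is an assumed safeguard against semantic drift, since each extra hop adds words further from the seed meaning.

```python
from collections import deque
from typing import Dict, Set

# Hypothetical synonym map standing in for a thesaurus resource.
SYNONYMS: Dict[str, Set[str]] = {
    "increased": {"elevated", "raised", "augmented"},
    "elevated": {"heightened"},
    "reduced": {"decreased", "lowered", "diminished"},
    "decreased": {"declined"},
}


def expand_triggers(seeds: Set[str], synonyms: Dict[str, Set[str]],
                    max_depth: int = 1) -> Set[str]:
    """Breadth-first synonym expansion up to max_depth hops from the seed words."""
    expanded = set(seeds)
    queue = deque((w, 0) for w in seeds)
    while queue:
        word, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        for syn in synonyms.get(word, ()):
            if syn not in expanded:
                expanded.add(syn)
                queue.append((syn, depth + 1))
    return expanded
```

In practice, automatically added words would usually be reviewed before use, because a synonym that is fine in general English may be misleading in a biomedical sentence.
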
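
The abstract reports precision and recall but not an F-measure for PDRMine. Assuming the usual balanced F-measure (the harmonic mean of precision $P$ and recall $R$), the two reported figures imply roughly the following; this is a derived number, not one stated in the thesis.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.726 \times 0.664}{0.726 + 0.664}
    = \frac{0.964128}{1.390}
    \approx 0.694
```

That is, an implied $F_1$ of about 69.4%.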

[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP391.1
