天堂国产午夜亚洲专区-少妇人妻综合久久蜜臀-国产成人户外露出视频在线-国产91传媒一区二区三区

帶有通配符和長度約束的模式匹配問題求解及其應(yīng)用研究

發(fā)布時間:2018-08-26 07:25
【摘要】:帶有通配符模式匹配是模式識別領(lǐng)域中重要的研究方向之一,在計算生物學(xué)、信息檢索、網(wǎng)絡(luò)安全等研究領(lǐng)域中都得到了廣泛關(guān)注。它是通過在模式識別問題中引入通配符這種特殊字符,來匹配字母表中的任意字符,這樣帶來更多靈活性的匹配結(jié)果,但也使得問題求解變得更為復(fù)雜。當頻繁共現(xiàn)在一段文本區(qū)域內(nèi)的多個模式之間表現(xiàn)為某種模式形式。例如在DNA序列中,啟動子TATTA序列常常出現(xiàn)在CAATCT序列下游中間間隔30-50個通配符,其中也不是簡單的重復(fù)。由這兩個子序列共同組成的模式可提高序列特異性,可以標記“CAATCT[30-50]TATA...”。將其特點推廣為子模式序列集間隔,其中兩個相鄰子模式的間隔在一定長度范圍內(nèi),為表示這種靈活的位置間隔,將通配符從指代單個字符擴展為指代一定長度的子串,此通配符稱作限長空位(Bounded length gaps)。同時通過引入one-off約束限制條件來保證匹配子串模式序列集的穩(wěn)定性。研究帶有通配符可變長度約束的模式匹配問題。本文圍繞帶通配符和長度約束的模式匹配問題中解結(jié)構(gòu)的復(fù)雜性特征,從設(shè)計求解算法以及相似性度量模型應(yīng)用等問題,展開一些研究工作,主要內(nèi)容概括為以下三個方面:(1)結(jié)合帶通配符和長度約束的精確模式匹配問題中求解的復(fù)雜性、精確性與完備性等特征,針對目前已有研究成果中還缺少針對性的模型建立求解策略。為此,借鑒約束可滿足問題框架(CSPs),首先對帶通配符和長度約束的精確模式匹配問題構(gòu)建三元組求解模型。模型對該問題的約束條件和解空間等基本概念給出形式化的描述,并將該問題已知的8條特殊情況統(tǒng)一表述為問題的基本性質(zhì),其中包括在特殊條件下的完備性和相鄰匹配解在文本中的位置關(guān)系。同時提出一種求解帶通配符的模式串精確匹配問題的FIN算法。首先在定義了解空間的劃分邊界中,提出了采用分治策略的解空間劃分算法,將帶通配符的模式串精確匹配問題等價劃分為若干獨立的子問題,并從理論上說明了劃分前后解結(jié)構(gòu)等價性。實驗結(jié)果表明FIN算法與解決相同問題的算法(PAIG)對比,該算法不僅可以得到匹配數(shù)目,而且可以得到完備的匹配解位置。(2)針對處理帶通配符的近似模式匹配問題中,使用傳統(tǒng)算法求解存在匹配子串結(jié)果質(zhì)量不高、易丟解等問題,提出一種啟發(fā)式的算法W-DPBI。該算法采取文本倒置搜索策略與流程的優(yōu)化。與同類DP和SAIL-APPROX算法進行實驗對比,結(jié)果表明該算法獲取解的平均增長率可達21.9%,最高可達57%,匹配結(jié)果具有良好的優(yōu)勢,在特定情況下可以明顯提高求解近似匹配結(jié)果的質(zhì)量和能力,具有較好的運用靈活性和啟發(fā)性。(3)結(jié)合模式匹配及其相關(guān)求解算法在計算生物學(xué)領(lǐng)域中應(yīng)用以及相關(guān)實驗方法研究的成果,針對藥物基因和疾病基因序列中的相似度結(jié)構(gòu)等特征,對所收集的數(shù)據(jù)信息源,采用近似匹配協(xié)同過濾算法并與相關(guān)算法組合求解搜索的策略,著重從已知疾病信息與基因信息角度來計算藥物與疾病之間的相似度,應(yīng)用于藥物重定位與構(gòu)建模型研究。實驗結(jié)果表明,該方法能夠明顯提高潛在治療關(guān)系的藥物-疾病的富集程度。相比于已有分類模型和隨機抽樣結(jié)果,可以有效降低了預(yù)測的假陽性率,其模型參數(shù)可作為藥物研發(fā)試驗的參照。
[Abstract]:Pattern matching with wildcards is one of the most important research directions in pattern recognition. It has attracted wide attention in computational biology, information retrieval, network security and other fields. It matches any character in the alphabet by introducing a special character, wildcards, into pattern recognition, which brings more flexibility. For example, in DNA sequences, the promoter TATTA sequence often appears in the downstream middle of the CAATCT sequence with 30-50 wildcards, which are not simply duplicated. A pattern composed of subsequences can improve sequence specificity by marking "CAATCT [30-50] TATA...". A substring is called a bounded length gaps. At the same time, the stability of the sequence of matching substring patterns is guaranteed by introducing one-off constraints. The problem of pattern matching with variable length constraints of wildcards is studied. The main contents are summarized as follows: (1) Considering the complexity, accuracy and completeness of solving exact pattern matching problems with wildcards and length constraints, there is still a lack of needles in the existing research results. In this paper, a three-tuple solution model for exact pattern matching problem with wildcard and length constraints is constructed by using the Constraint Satisfaction Problem Framework (CSPs). The model formally describes the basic concepts of constraint conditions and solution space of the problem, and eight special cases of the problem are given. The basic properties of the problem are formulated in a unified way, including the completeness under special conditions and the location relationship between adjacent matching solutions in the text. At the same time, a FIN algorithm for exact matching of pattern strings with wildcards is proposed. The algorithm divides the exact matching problem of pattern strings with wildcards into several independent sub-problems and theoretically illustrates the structural equivalence of the solution before and after partitioning.The experimental results show that the FIN algorithm can not only obtain the number of matches, but also obtain the complete matching solution position. To solve the problem of approximate pattern matching with wildcards, a heuristic algorithm W-DPBI is proposed to solve the problem of low quality matching substrings and easy to be lost. The algorithm adopts the strategy of text inversion search and the optimization of process. Compared with similar DP and SAIL-APPROX algorithms, the results show that the algorithm is effective. The average growth rate of the solution obtained by the method is 21.9% and the maximum is 57%. The matching results have good advantages, which can obviously improve the quality and ability of solving approximate matching results under certain conditions, and have good flexibility and inspiration in application. (3) Combining pattern matching and related algorithms in computational biology applications and According to the similarity structure of drug gene and disease gene sequence, the strategy of approximate matching collaborative filtering algorithm combined with related algorithm is adopted to search the collected data information source. The emphasis is on calculating the relationship between drug and disease from the perspective of known disease information and gene information. Similarity is applied to drug relocation and modeling. Experimental results show that this method can significantly improve the drug-disease enrichment of potential therapeutic relationships. Compared with existing classification models and random sampling results, it can effectively reduce the predicted false positive rate, and its model parameters can be used as a reference for drug development trials.
【學(xué)位授予單位】:合肥工業(yè)大學(xué)
【學(xué)位級別】:博士
【學(xué)位授予年份】:2016
【分類號】:TP391.4

【參考文獻】

相關(guān)期刊論文 前10條

1 黃海寧;張浩;汪海;;沙利度胺抗腫瘤機制及其作用靶點CRBN的研究進展[J];中國藥理學(xué)通報;2015年06期

2 張浩;葉明全;;求解PMWOC問題的位并行算法[J];計算機應(yīng)用研究;2015年10期

3 強繼朋;謝飛;高雋;胡學(xué)鋼;吳信東;;帶任意長度通配符的模式匹配[J];自動化學(xué)報;2014年11期

4 項泰寧;郭丹;王海平;胡學(xué)鋼;;帶通配符的模式匹配問題及其解空間特征分析[J];計算機科學(xué);2014年09期

5 吳信東;強繼朋;謝飛;;Pattern Matching with Flexible Wildcards[J];Journal of Computer Science & Technology;2014年05期

6 王可鑒;石樂明;賀林;張永祥;楊侖;;中國藥物研發(fā)的新機遇:基于醫(yī)藥大數(shù)據(jù)的系統(tǒng)性藥物重定位[J];科學(xué)通報;2014年18期

7 沈璐;紀允;紀冬寶;李萍;;帶可變長度通配符的模式匹配算法[J];計算機工程與應(yīng)用;2015年15期

8 吳信東;謝飛;黃詠明;胡學(xué)鋼;高雋;;帶通配符和One-Off條件的序列模式挖掘[J];軟件學(xué)報;2013年08期

9 王寶勛;劉秉權(quán);孫承杰;王曉龍;孫林;;基于論壇話題段落劃分的答案識別[J];自動化學(xué)報;2013年01期

10 張永祥;程肖蕊;周文霞;;藥物重定位——網(wǎng)絡(luò)藥理學(xué)的重要應(yīng)用領(lǐng)域[J];中國藥理學(xué)與毒理學(xué)雜志;2012年06期

相關(guān)博士學(xué)位論文 前3條

1 劉應(yīng)玲;帶可變長度通配符的模式匹配算法研究[D];合肥工業(yè)大學(xué);2014年

2 趙華;多模型下的近似字符串匹配算法研究[D];華中科技大學(xué);2013年

3 孫德才;基于q-gram過濾的近似串匹配技術(shù)研究[D];湖南大學(xué);2012年

,

本文編號:2204145

資料下載
論文發(fā)表

本文鏈接:http://sikaile.net/shoufeilunwen/xxkjbs/2204145.html


Copyright(c)文論論文網(wǎng)All Rights Reserved | 網(wǎng)站地圖 |

版權(quán)申明:資料由用戶db628***提供,本站僅收錄摘要或目錄,作者需要刪除請E-mail郵箱bigeng88@qq.com