Research on Solving Pattern Matching Problems with Wildcards and Length Constraints and Their Applications
[Abstract]: Pattern matching with wildcards is an important research direction in pattern recognition and has attracted wide attention in computational biology, information retrieval, network security and other fields. It introduces a special character, the wildcard, which matches any character in the alphabet and so makes patterns more flexible. For example, in DNA sequences the promoter sequence TATA often appears downstream of the sequence CAATCT, separated from it by 30-50 wildcards, rather than as a simple contiguous repeat. A pattern composed of such subsequences, written as "CAATCT[30-50]TATA...", improves sequence specificity. The bounded run of wildcards between two substrings is called a bounded-length gap. In addition, a one-off constraint is introduced to guarantee the stability of the matched substring sequences. This thesis studies pattern matching with wildcards under variable length constraints. The main contents are as follows:

(1) Existing research on exact pattern matching with wildcards and length constraints still has shortcomings in complexity, accuracy and completeness. This thesis uses the constraint satisfaction problem (CSP) framework to construct a three-tuple solution model for the exact pattern matching problem with wildcards and length constraints. The model formally describes the problem's constraints and solution space, and eight special cases of the problem are given. The basic properties of the problem are formulated in a unified way, including completeness under special conditions and the positional relationship between adjacent matching solutions in the text. A FIN algorithm for exact matching of pattern strings with wildcards is also proposed.
The algorithm divides the exact matching problem for pattern strings with wildcards into several independent sub-problems, and the structural equivalence of the solutions before and after partitioning is shown theoretically. Experimental results show that the FIN algorithm obtains not only the number of matches but also the complete positions of the matching solutions.

(2) For approximate pattern matching with wildcards, a heuristic algorithm, W-DPBI, is proposed to address the low quality of matched substrings and the tendency for matches to be lost. The algorithm searches the reversed text and optimizes the search process. In comparison with the similar DP and SAIL-APPROX algorithms, the results show that the algorithm is effective: the average growth rate of the solutions it obtains is 21.9%, with a maximum of 57%. Under certain conditions it clearly improves both the quality of approximate matching results and the ability to find them, and it is flexible and instructive in applications.

(3) Pattern matching and related algorithms are applied to computational biology. Based on the structural similarity of drug-gene and disease-gene sequences, an approximate-matching collaborative filtering strategy, combined with related algorithms, is used to search the collected data sources. The emphasis is on computing the relationship between drugs and diseases from known disease and gene information, and the resulting similarity is applied to drug repositioning and modeling. Experimental results show that this method significantly improves the enrichment of potential drug-disease therapeutic relationships. Compared with existing classification models and random sampling, it effectively reduces the predicted false positive rate, and its model parameters can serve as a reference for drug development trials.
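To make the exact-matching problem in item (1) concrete, the following is a minimal sketch of matching a pattern whose characters are separated by bounded-length gaps. It is not the FIN algorithm, whose details the abstract does not give. The function names (`find_gap_matches`, `count_gap_matches`, `one_off_greedy`) and the input encoding (a list of pattern characters plus one `(min, max)` gap per adjacent pair) are assumptions made for illustration. The one-off filter assumes the common reading of the constraint, that each text position may be used by at most one occurrence, and uses a plain greedy pass that is not guaranteed to keep the largest possible set of occurrences.

```python
from typing import List, Sequence, Tuple

Gap = Tuple[int, int]  # (min_wildcards, max_wildcards) between two pattern characters


def find_gap_matches(text: str, chars: Sequence[str], gaps: Sequence[Gap]) -> List[Tuple[int, ...]]:
    """Enumerate every occurrence as a tuple of text positions, one per pattern character."""
    if len(gaps) != len(chars) - 1:
        raise ValueError("need exactly one gap per adjacent pair of pattern characters")
    matches: List[Tuple[int, ...]] = []

    def extend(positions: List[int]) -> None:
        k = len(positions)
        if k == len(chars):
            matches.append(tuple(positions))
            return
        lo, hi = gaps[k - 1]
        first = positions[-1] + 1 + lo                      # skip at least `lo` wildcards
        last = min(positions[-1] + 1 + hi, len(text) - 1)   # skip at most `hi` wildcards
        for nxt in range(first, last + 1):
            if text[nxt] == chars[k]:
                extend(positions + [nxt])

    for start, ch in enumerate(text):
        if ch == chars[0]:
            extend([start])
    return matches


def count_gap_matches(text: str, chars: Sequence[str], gaps: Sequence[Gap]) -> int:
    """Count occurrences by dynamic programming, without listing their positions."""
    n = len(text)
    # ways[i] = number of partial matches whose latest character sits at text position i
    ways = [1 if text[i] == chars[0] else 0 for i in range(n)]
    for k in range(1, len(chars)):
        lo, hi = gaps[k - 1]
        nxt = [0] * n
        for i in range(n):
            if text[i] != chars[k]:
                continue
            # previous position j must satisfy lo <= i - j - 1 <= hi
            for j in range(max(0, i - 1 - hi), i - lo):
                nxt[i] += ways[j]
        ways = nxt
    return sum(ways)


def one_off_greedy(matches: Sequence[Tuple[int, ...]]) -> List[Tuple[int, ...]]:
    """Keep occurrences in order, dropping any that reuse an already used text position."""
    used = set()
    kept: List[Tuple[int, ...]] = []
    for m in matches:
        if used.isdisjoint(m):
            kept.append(m)
            used.update(m)
    return kept


if __name__ == "__main__":
    text = "aabab"
    all_matches = find_gap_matches(text, ["a", "b"], [(0, 2)])
    print(all_matches)                                     # [(0, 2), (1, 2), (1, 4), (3, 4)]
    print(count_gap_matches(text, ["a", "b"], [(0, 2)]))   # 4
    print(one_off_greedy(all_matches))                     # [(0, 2), (1, 4)]
```

In this encoding a contiguous segment such as CAATCT is written as six characters joined by `(0, 0)` gaps, and the 30-50 wildcard stretch before TATA becomes a single `(30, 50)` gap. The enumeration can grow exponentially with pattern length, which is one reason counting and position recovery are treated as separate concerns here.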
[Degree-granting institution]: Hefei University of Technology
[Degree level]: Doctoral
[Year degree granted]: 2016
[Classification number]: TP391.4