無重疊約束近似模式匹配
發(fā)布時間:2018-07-31 07:15
【摘要】:模式匹配(也稱為串匹配)是計算機科學中基本問題之一,已經(jīng)廣泛地應用到了生物研究、音樂信息檢索和序列模式挖掘等各個領域。在模式匹配中,有的僅僅考慮最后一個模式子串在序列中匹配的位置,這種模式匹配稱為寬松模式匹配;更為挑戰(zhàn)性的問題是對每個模式串均考慮其在序列串中匹配的位置,而這種模式匹配稱為嚴格模式匹配。嚴格模式匹配中近似模式匹配允許模式串與所匹配的序列串之間存在一定差異,增加了匹配的靈活性。本文研究的是具有Hamming距離的近似模式匹配。無重疊約束是要求任何兩個出現(xiàn)中不能在相同位置處使用序列中相同字符。具有無重疊約束的模式匹配問題是指給定模式P在序列S中的所有出現(xiàn)集合中找到最大子集C,使得集合C中任意兩個出現(xiàn)均是無重疊的出現(xiàn)。目前研究的模式匹配多為精確匹配,精確匹配要求每個字符必須匹配,一定程度上限定了匹配的靈活性。為此本文提出了無重疊約束的近似模式匹配問題(Approximate Pattern matching with Non-Overlapping constraints,APNO)。我們運用網(wǎng)樹結構解決該問題,提出了NETLOR和NETROL算法,并對其改進提出了NETLORE和NETROLE算法。該算法在求解時,先將模式匹配問題轉換為一棵網(wǎng)樹,然后在網(wǎng)樹中每次查找樹根葉子路徑,并將該路徑所對應的解作為一個無重疊近似出現(xiàn),之后依據(jù)樹根葉子路徑在網(wǎng)樹的局部范圍內進行快速剪枝,為下一次查找樹根葉子路徑做準備。最后大量實驗驗證了本文提出的四個算法的正確性,并且驗證了NETLORE是更為高效的算法。
[Abstract]:Pattern matching (also called string matching) is one of the basic problems in computer science, which has been widely used in biological research, music information retrieval and sequential pattern mining and other fields. In pattern matching, some only consider the position of the last pattern string matching in the sequence, this pattern matching is called loose pattern matching; the more challenging problem is that each pattern string considers its matching position in the sequence string. This pattern matching is called strict pattern matching. In strict pattern matching, approximate pattern matching allows some difference between pattern string and matching sequence string, which increases the flexibility of matching. In this paper, approximate pattern matching with Hamming distance is studied. No overlap constraint requires any two occurrences not to use the same characters in the sequence at the same location. The problem of pattern matching with no overlap constraint is that given pattern P finds the largest subset C in all appearance sets of sequence S so that any two occurrences in set C are non-overlapping. At present, the pattern matching is mostly precise matching, which requires each character to match, which limits the flexibility of matching to a certain extent. In this paper, an approximate pattern matching problem without overlapping constraints, (Approximate Pattern matching with Non-Overlapping constraints, is proposed. We use the net tree structure to solve this problem, propose NETLOR and NETROL algorithms, and propose NETLORE and NETROLE algorithms to improve them. In the algorithm, the pattern matching problem is first transformed into a net tree, and then the root leaf path of the tree is searched in the network tree each time, and the corresponding solution of this path appears as a non-overlapping approximation. Then pruning is carried out in the local area of the web tree according to the tree root leaf path to prepare for the next search of the tree root leaf path. Finally, a large number of experiments verify the correctness of the proposed four algorithms, and verify that NETLORE is a more efficient algorithm.
【學位授予單位】:河北工業(yè)大學
【學位級別】:碩士
【學位授予年份】:2015
【分類號】:TP391.1
本文編號:2154835
[Abstract]:Pattern matching (also called string matching) is one of the basic problems in computer science, which has been widely used in biological research, music information retrieval and sequential pattern mining and other fields. In pattern matching, some only consider the position of the last pattern string matching in the sequence, this pattern matching is called loose pattern matching; the more challenging problem is that each pattern string considers its matching position in the sequence string. This pattern matching is called strict pattern matching. In strict pattern matching, approximate pattern matching allows some difference between pattern string and matching sequence string, which increases the flexibility of matching. In this paper, approximate pattern matching with Hamming distance is studied. No overlap constraint requires any two occurrences not to use the same characters in the sequence at the same location. The problem of pattern matching with no overlap constraint is that given pattern P finds the largest subset C in all appearance sets of sequence S so that any two occurrences in set C are non-overlapping. At present, the pattern matching is mostly precise matching, which requires each character to match, which limits the flexibility of matching to a certain extent. In this paper, an approximate pattern matching problem without overlapping constraints, (Approximate Pattern matching with Non-Overlapping constraints, is proposed. We use the net tree structure to solve this problem, propose NETLOR and NETROL algorithms, and propose NETLORE and NETROLE algorithms to improve them. In the algorithm, the pattern matching problem is first transformed into a net tree, and then the root leaf path of the tree is searched in the network tree each time, and the corresponding solution of this path appears as a non-overlapping approximation. Then pruning is carried out in the local area of the web tree according to the tree root leaf path to prepare for the next search of the tree root leaf path. Finally, a large number of experiments verify the correctness of the proposed four algorithms, and verify that NETLORE is a more efficient algorithm.
【學位授予單位】:河北工業(yè)大學
【學位級別】:碩士
【學位授予年份】:2015
【分類號】:TP391.1
【參考文獻】
相關期刊論文 前7條
1 柴欣;賈曉菲;武優(yōu)西;江賀;吳信東;;一般間隙及一次性條件的嚴格模式匹配[J];軟件學報;2015年05期
2 武優(yōu)西;劉亞偉;郭磊;吳信東;;子網(wǎng)樹求解一般間隙和長度約束嚴格模式匹配[J];軟件學報;2013年05期
3 黃國林;郭丹;胡學鋼;;基于通配符和長度約束的近似模式匹配算法[J];計算機應用;2013年03期
4 李艷;孫樂;朱懷忠;武優(yōu)西;;網(wǎng)樹求解有向無環(huán)圖中具有長度約束的簡單路徑和最長路徑問題[J];計算機學報;2012年10期
5 黃國林;郭丹;胡學鋼;;求解近似模式匹配的啟發(fā)式算法[J];計算機科學與探索;2013年01期
6 武優(yōu)西;吳信東;江賀;閔帆;;一種求解MPMGOOC問題的啟發(fā)式算法[J];計算機學報;2011年08期
7 陸麗娜,魏恒義,楊怡玲,管旭東;Web日志挖掘中的序列模式識別[J];小型微型計算機系統(tǒng);2000年05期
,本文編號:2154835
本文鏈接:http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2154835.html
最近更新
教材專著