

CPACA: A Probabilistic-Model Chinese Word Segmentation Algorithm Based on the Aho-Corasick Automaton

Published: 2018-08-28 15:49
[Abstract]: The Aho-Corasick automaton is a well-known multi-pattern string matching algorithm. When a pattern mismatches, it follows a fail pointer to a valid successor state, and there may be one or more such valid successor states. Exploiting this property, this paper proposes an automaton algorithm adapted to Chinese word segmentation. The algorithm uses dynamic programming to compute context matching probabilities and transitions to the best valid successor state, thereby combining mechanical segmentation based on string matching with methods based on statistical probability models. Experimental results show that the algorithm achieves high segmentation accuracy.
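The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's CPACA algorithm, whose details are not given on this page. It assumes a plain unigram word-probability model in place of the paper's context matching probability, and the names `build_automaton`, `segment`, `toy_probs`, and `unknown_logp` are hypothetical. An Aho-Corasick automaton reports every dictionary word ending at each position, and a dynamic program keeps the highest-probability split.

```python
import math
from collections import deque


def build_automaton(words):
    """Build an Aho-Corasick automaton over the given dictionary words."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> lengths of all dictionary words ending here
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(len(word))
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches reachable through the fail pointer.
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def segment(text, word_prob, unknown_logp=-20.0):
    """Return the split of `text` with the highest unigram log-probability.

    Assumption: characters not covered by any dictionary word are emitted
    as single-character tokens with the penalty `unknown_logp`.
    """
    goto, fail, out = build_automaton(word_prob)
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best score for text[:i]
    back = [0] * (n + 1)            # back[i]: start of the last word in that split
    state = 0
    for i, ch in enumerate(text, start=1):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        best[i], back[i] = best[i - 1] + unknown_logp, i - 1
        for length in out[state]:
            word = text[i - length:i]
            score = best[i - length] + math.log(word_prob[word])
            if score > best[i]:
                best[i], back[i] = score, i - length
    tokens = []
    i = n
    while i > 0:
        tokens.append(text[back[i]:i])
        i = back[i]
    return tokens[::-1]


# Toy dictionary with made-up probabilities, for illustration only.
toy_probs = {"研究": 0.02, "研究生": 0.005, "生命": 0.01, "命": 0.001, "起源": 0.005}
print(segment("研究生命起源", toy_probs))  # ['研究', '生命', '起源']
```

In this toy run, forward maximum matching would take 研究生 first and be left with 命; the dynamic program instead prefers 研究 / 生命 / 起源 because that split has the higher total probability under the assumed unigram model.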
[Author affiliation]: Faculty of Engineering and Applied Science, Queen's University;
[Classification number]: TP391.1

Article ID: 2209871


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2209871.html

