
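Research on a Zhuang Word Segmentation Algorithm Based on an Improved Mutual Information Algorithm and the t-Test Difference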

Posted: 2018-06-22 05:28

Topic: Zhuang word segmentation + improved MI algorithm; Source: Journal of South-Central University for Nationalities (Natural Science Edition), 2017, No. 04


[Abstract]: Traditional Zhuang word segmentation treats the spaces between words as delimiters. In most cases this destroys the complete and independent meaning carried by semantic words, which are formed from several associated words. Building on earlier work that used mutual information (MI) to measure the degree of association between adjacent words, this paper is the first to apply the improved mutual information algorithm MI^k and the t-test difference to Zhuang text segmentation. MI^k and the t-test difference have complementary strengths in evaluating the static and the dynamic binding ability of adjacent words, so the paper proposes TD-MI^k, a hybrid algorithm that combines the two, and compares the segmentation performance of three methods: MI^k, the t-test difference, and TD-MI^k. Experiments used a text collection from the Zhuang-language edition of People's Daily Online as the training and test corpus. The results show that all three methods extract semantic words from text fairly accurately and effectively, and that the TD-MI^k hybrid algorithm achieves the highest segmentation accuracy.
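[Affiliations]: School of Computer Science, South-Central University for Nationalities; School of Computer and Information Engineering, Hechi University
[Funding]: Sub-project of the National Key Technology R&D Program (2015BAD29B01); Graduate Academic Innovation Fund of South-Central University for Nationalities (2017sycxjj051)
[Classification number]: TP391.1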
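
As an illustration of the MI^k measure named in the abstract, here is a minimal Python sketch. It assumes the commonly used definition MI^k(x, y) = log2(p(x, y)^k / (p(x) p(y))), which may differ from the paper's exact formula. The choice k = 2, the threshold of 0, the placeholder tokens, and the function names `mik_scores` and `segment` are all assumptions made for illustration and are not taken from the source. The t-test difference and the TD-MI^k combination are not shown.

```python
import math
from collections import Counter


def mik_scores(tokens, k=2):
    """Score each adjacent pair (x, y) as log2(p(x, y)**k / (p(x) * p(y))).

    Assumed definition of MI^k; with k=1 this is ordinary pointwise MI.
    Probabilities are plain relative frequencies from the token list.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy ** k / (p_x * p_y))
    return scores


def segment(tokens, scores, threshold=0.0):
    """Join adjacent space-delimited words whose score reaches the threshold.

    Each returned string is one candidate semantic word (hypothetical rule:
    merge when score >= threshold, otherwise start a new unit).
    """
    if not tokens:
        return []
    units = [[tokens[0]]]
    for prev, cur in zip(tokens, tokens[1:]):
        if scores.get((prev, cur), float("-inf")) >= threshold:
            units[-1].append(cur)
        else:
            units.append([cur])
    return [" ".join(unit) for unit in units]


# Placeholder tokens standing in for space-delimited Zhuang words.
tokens = "a b c a b d a b".split()
scores = mik_scores(tokens, k=2)
print(segment(tokens, scores, threshold=0.0))
```

On this toy input the frequent pair ("a", "b") and the one-off pair ("b", "c") receive the same score when k = 1, while k = 2 separates them. This reflects the usual motivation for raising the joint probability to a power: plain MI tends to overrate rare pairs.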


Article ID: 2051772


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2051772.html

