Research on the Improvement and Optimization of Chinese Automatic Word Segmentation Technology
[Abstract]: Chinese automatic word segmentation is a fundamental topic in Chinese information processing, and progress in it strongly promotes research in related fields such as information extraction, full-text retrieval, data mining, machine translation, and question answering systems. This thesis studies the main technologies of Chinese automatic word segmentation comprehensively, including the dictionary structures and the segmentation algorithms, and examines in depth the difficult problems in Chinese word segmentation. Finally, it describes the application of Chinese automatic word segmentation in combination with popular search engine technology. The main contributions are as follows. First, the dictionary structures used for segmentation are studied in depth; building on three classical dictionary mechanisms (whole-word binary search, character-by-character binary search, and the Trie index tree), a segmentation dictionary mechanism based on a multi-hash balanced binary search tree is proposed. Second, the thesis concentrates on named entity recognition. For Chinese personal names, a new recognition method is designed on the basis of existing research results, and its implementation process is given. For Chinese institution names, the thesis designs and implements a recognition system for Chinese medical institution names that combines CRF with rules: it is built on the CRF statistical model and integrates linguistic rules and knowledge.
Experimental results show that precision and recall in the closed test are 91.68% and 95.2121%, respectively. Finally, in response to the urgent need for large-scale information retrieval, the application of Chinese automatic word segmentation in search engines is introduced in detail. On the one hand, this promotes wider use of Chinese automatic word segmentation technology; on the other hand, it lays groundwork for the future optimization and development of search engines.
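To make the dictionary discussion concrete, the following is a minimal sketch of one of the classical mechanisms named in the abstract, a Trie index tree, paired with simple forward maximum matching. It is not the multi-hash balanced binary search tree mechanism the thesis proposes, whose details the abstract does not give. The class and function names (`TrieDictionary`, `forward_max_match`) and the sample word list are hypothetical and chosen only for illustration.

```python
class TrieNode:
    """One node of the Trie: child links keyed by character, plus a word flag."""
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}
        self.is_word = False


class TrieDictionary:
    """Hypothetical Trie-based segmentation dictionary (illustrative only)."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def longest_match(self, text, start):
        """Length of the longest dictionary word starting at text[start], or 0."""
        node = self.root
        best = 0
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_word:
                best = i - start + 1
        return best


def forward_max_match(text, dictionary):
    """Greedy left-to-right segmentation; unknown characters become single tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        length = dictionary.longest_match(text, pos) or 1
        tokens.append(text[pos:pos + length])
        pos += length
    return tokens


# Sample vocabulary is an assumption, not taken from the thesis.
sample_dict = TrieDictionary(["中文", "自动", "分词", "自动分词", "技术"])
segmented = forward_max_match("中文自动分词技术", sample_dict)
```

With this sample vocabulary, the greedy matcher prefers the longer entry "自动分词" over "自动" followed by "分词". Because a single Trie walk finds the longest match at each position, lookup cost depends on word length rather than dictionary size. This is the usual motivation for Trie-style dictionaries over binary search across whole words.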
[Degree-granting institution]: Jiangsu University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.1