Design and Implementation of a Paper Feature Extraction System Based on Maximum Matching
[Abstract]: In a Chinese search engine, word segmentation plays a central role, and its results directly affect search performance. There are currently three families of Chinese word segmentation techniques: segmentation by string matching, segmentation by artificial intelligence methods that rely on understanding the semantics of the text, and segmentation by statistical calculation. A Chinese word segmentation system divides modern Chinese sentences into words. English words are separated by spaces, so English has no word segmentation problem. Written Chinese, by convention, places no delimiters between the words of a sentence, so some intelligent technique is needed to separate them. Automatic Chinese word segmentation has been an active topic in computer science for decades, but because of the complexity of the language and the limits of computing technology, it is still under development. This thesis analyzes and summarizes the existing word segmentation algorithms and discusses the two problems that are hardest to solve in Chinese segmentation: ambiguity resolution and the recognition of unregistered (out-of-vocabulary) words. These are the largest obstacles encountered in the development of Chinese word segmentation. Future work should not only address them, so as to raise segmentation accuracy, but also continue to widen the range of applications of Chinese word segmentation.
The thesis obtains a feature set of terms and designs a feature extraction method based on the word frequency space. First, the maximum matching algorithm segments the document; next, the resulting terms are entered into a word frequency matrix and the frequency of each term is counted; finally, the text features are extracted. The main subject is the design and development of a paper feature extraction system for libraries, built by combining Chinese word segmentation with feature extraction. The design process and the experimental results of the system are described in detail. According to the thesis, applying the system makes paper management in the school library more efficient and speeds up search.
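The pipeline outlined in the abstract (maximum matching segmentation, then word frequency counting, then feature selection) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the forward direction of matching, the toy dictionary, the `max_len` window, the `min_len` filter, and the "top-k most frequent terms" selection rule are all assumptions made here for the example.

```python
from collections import Counter


def forward_max_match(text, dictionary, max_len=4):
    """Segment text by greedily taking the longest dictionary word at each position.

    Assumption: forward (left-to-right) matching with a fixed maximum word length.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary word is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def term_frequencies(tokens, min_len=2):
    """Count how often each term occurs, ignoring single-character fallbacks."""
    return Counter(t for t in tokens if len(t) >= min_len)


def top_features(freqs, k=3):
    """Hypothetical selection rule: keep the k most frequent terms as features."""
    return [word for word, _ in freqs.most_common(k)]


# Toy dictionary and input, invented for this example.
dictionary = {"中文", "分词", "中文分词", "搜索", "引擎", "搜索引擎", "技术"}
text = "中文分词技术中文分词搜索引擎"

tokens = forward_max_match(text, dictionary)
freqs = term_frequencies(tokens)
features = top_features(freqs, k=1)
```

Because the matcher always prefers the longest dictionary entry, "中文分词" is emitted as one term rather than as "中文" plus "分词". That greedy choice is also the source of the segmentation ambiguity the abstract mentions, and any word missing from the dictionary falls back to single characters, which is the unregistered-word problem.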
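[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.1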
Article ID: 2377849
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2377849.html