Research on a Lucene Search Engine Based on a PSO-BP Neural Network
[Abstract]: Lucene is a high-performance, scalable full-text search library with an excellent index structure and a sound system architecture. However, its support for Chinese word segmentation and for various text formats is inadequate. Many Chinese word segmentation analyzers are currently used with Lucene, including StandardAnalyzer and CJKAnalyzer, which ship with Lucene, and ChineseAnalyzer and IK_CAnalyzer, which are provided by third parties. StandardAnalyzer segments text word by word; its drawback is that it requires a complex word-matching algorithm and a large amount of CPU computation. CJKAnalyzer and ChineseAnalyzer use bigram segmentation (the "dichotomy" method), in which every two adjacent characters are treated as one word. IK_CAnalyzer is dictionary-based and adopts a forward-iterative, finest-granularity segmentation algorithm together with an analysis mode built on multiple sub-processors. Lucene does not yet offer an understanding-based Chinese word segmentation method: because a computer cannot recognize the meaning of a word in different contexts, understanding-based segmentation has so far had no practical application. To address this deficiency, this thesis studies the application of the BP (Back Propagation) neural network algorithm to Chinese word segmentation. Because a BP neural network applied to Chinese word segmentation converges slowly, easily falls into local minima, and is slow and inefficient, an improved particle swarm optimization (PSO) algorithm is proposed to optimize it, and the resulting PSO-BP neural network is applied to Chinese word segmentation.
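The two-character ("dichotomy", or bigram) segmentation attributed above to CJKAnalyzer can be illustrated with a minimal sketch. This is not Lucene code: the function name `bigram_tokens` is hypothetical, and the sketch assumes the input is a single unbroken run of Chinese characters, ignoring the mixed-script handling and filtering that a real analyzer performs.

```python
def bigram_tokens(text):
    """Split a run of CJK characters into overlapping two-character tokens.

    Hypothetical helper for illustration only; it assumes `text` contains
    nothing but CJK characters (no Latin words, digits, or punctuation).
    """
    if len(text) < 2:
        # A single character (or an empty string) cannot form a pair.
        return [text] if text else []
    # Slide a window of width 2 across the string, one character at a time.
    return [text[i:i + 2] for i in range(len(text) - 1)]


print(bigram_tokens("中文分词技术"))
# ['中文', '文分', '分词', '词技', '技术']
```

The output shows the trade-off: no dictionary is needed, but tokens such as "文分" and "词技" are not real words, which is the kind of noise that dictionary-based or understanding-based segmentation aims to avoid.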
Compared with a traditional BP neural network, the PSO-BP neural network not only overcomes the slow convergence of the traditional network but also improves segmentation accuracy. The thesis then systematically studies and analyzes the API that Lucene provides for third-party Chinese word segmentation components. The segmentation technique optimized by the PSO-BP neural network is integrated into Lucene and compared with the Chinese word segmentation that Lucene itself provides; the results show that the PSO-BP technique is superior. Finally, a search engine is designed and implemented on top of Lucene with the PSO-BP Chinese word segmentation component, an exploratory step toward intelligent Chinese word segmentation in search engines that provides a platform for follow-up work and research.
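The general PSO-BP idea summarized in the abstract can be sketched as follows: a particle swarm searches the network's weight space for a good starting point, and ordinary backpropagation then refines it. This is an illustrative sketch under stated assumptions, not the thesis's method. It uses a standard global-best PSO rather than the improved variant the thesis proposes, a toy XOR data set in place of real segmentation features, and hypothetical names and hyperparameters throughout (`pso_search`, `bp_refine`, `HIDDEN`, and so on).

```python
import math
import random

# Toy XOR data, an assumed stand-in for real word-segmentation features.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
HIDDEN = 3                      # hidden units (assumed)
OUT_BASE = 3 * HIDDEN           # index where the output-layer weights start
DIM = OUT_BASE + HIDDEN + 1     # total number of weights and biases


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, x):
    """Run the 2-HIDDEN-1 network; return (hidden activations, output)."""
    hidden = [sigmoid(w[3 * j] * x[0] + w[3 * j + 1] * x[1] + w[3 * j + 2])
              for j in range(HIDDEN)]
    z = sum(w[OUT_BASE + j] * hidden[j] for j in range(HIDDEN)) + w[OUT_BASE + HIDDEN]
    return hidden, sigmoid(z)


def loss(w):
    """Mean squared error over the toy data set."""
    return sum((forward(w, x)[1] - y) ** 2 for x, y in DATA) / len(DATA)


def pso_search(rng, particles=20, iters=100, inertia=0.7, c1=1.5, c2=1.5):
    """Standard global-best PSO over the flattened weight vector."""
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(particles)]
    vel = [[0.0] * DIM for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_loss = [loss(p) for p in pos]
    g = min(range(particles), key=lambda i: pbest_loss[i])
    gbest, gbest_loss = pbest[g][:], pbest_loss[g]
    initial_best = gbest_loss
    for _ in range(iters):
        for i in range(particles):
            for d in range(DIM):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            current = loss(pos[i])
            if current < pbest_loss[i]:
                pbest[i], pbest_loss[i] = pos[i][:], current
                if current < gbest_loss:
                    gbest, gbest_loss = pos[i][:], current
    return gbest, gbest_loss, initial_best


def bp_refine(w, epochs=2000, lr=0.5):
    """Batch gradient-descent backpropagation; returns the best weights seen."""
    w = w[:]
    best, best_loss = w[:], loss(w)
    for _ in range(epochs):
        grad = [0.0] * DIM
        for x, y in DATA:
            hidden, out = forward(w, x)
            delta_out = 2.0 * (out - y) * out * (1.0 - out) / len(DATA)
            for j in range(HIDDEN):
                grad[OUT_BASE + j] += delta_out * hidden[j]
                delta_h = delta_out * w[OUT_BASE + j] * hidden[j] * (1.0 - hidden[j])
                grad[3 * j] += delta_h * x[0]
                grad[3 * j + 1] += delta_h * x[1]
                grad[3 * j + 2] += delta_h
            grad[OUT_BASE + HIDDEN] += delta_out
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
        current = loss(w)
        if current < best_loss:
            best, best_loss = w[:], current
    return best, best_loss


rng = random.Random(0)
pso_weights, pso_loss, initial_loss = pso_search(rng)
bp_weights, bp_loss = bp_refine(pso_weights)
print(f"best random start: {initial_loss:.4f}")
print(f"after PSO search:  {pso_loss:.4f}")
print(f"after BP refining: {bp_loss:.4f}")
```

Because the swarm only evaluates the loss and never follows its gradient, it is less likely than plain backpropagation to be trapped by the first local minimum it meets; handing its best position to backpropagation then gives the gradient step a better starting point, which is the motivation the abstract gives for combining the two.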
[Degree-granting institution]: China University of Petroleum (East China)
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP391.3; TP183