

Research on a Lucene Search Engine Based on a PSO-BP Neural Network

Published: 2019-02-23 09:37
[Abstract]: Lucene is a full-text search framework. Its strengths include an efficient index structure, a sound system architecture, and a high-performance, scalable information retrieval library, but its support for Chinese word segmentation and for multiple text formats is weak. Many Chinese word segmentation algorithms are currently used with Lucene, including StandardAnalyzer and CJKAnalyzer, which ship with Lucene, and third-party analyzers such as ChineseAnalyzer and IK_CAnalyzer. StandardAnalyzer segments by single character: when it processes Chinese text, it splits the text character by character. Its drawbacks are that it requires a complex single-character matching algorithm and a large amount of CPU computation. CJKAnalyzer and ChineseAnalyzer both use bigram segmentation, which treats every two adjacent characters as one word. IK_CAnalyzer is dictionary-based and uses its own forward-iteration, finest-granularity segmentation algorithm together with a multi-sub-processor analysis mode. Lucene does not yet implement understanding-based Chinese word segmentation: because a computer cannot recognize what each word means in different contexts, understanding-based methods have not yet shown practical results. To address these shortcomings, and in particular the lack of understanding-based segmentation, this thesis studies the application of the BP (Back Propagation) neural network algorithm to Chinese word segmentation. Because a BP network applied to segmentation converges slowly, is easily trapped in local minima, and is slow and inefficient, the thesis proposes optimizing the BP network with an improved particle swarm optimization (PSO, Particle Swarm Optimization) algorithm, yielding the PSO-BP neural network, and applies it to Chinese word segmentation.
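The three non-neural segmentation styles contrasted in the abstract can be illustrated with a small sketch. This is not Lucene code and not the analyzers' actual implementations: the function names, the toy dictionary, and `max_len` are hypothetical, the bigram version assumes overlapping character pairs, and the dictionary version uses plain forward maximum matching as a simpler stand-in for IK_CAnalyzer's finest-granularity algorithm.

```python
def unigram_tokens(text):
    """Single-character segmentation: every character is its own token."""
    return [ch for ch in text if not ch.isspace()]


def bigram_tokens(text):
    """Bigram segmentation: each pair of adjacent characters is a token.

    Assumption: pairs overlap, so "ABCD" yields "AB", "BC", "CD".
    """
    chars = [ch for ch in text if not ch.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def forward_max_match(text, dictionary, max_len=4):
    """Dictionary-based forward maximum matching (a stand-in, not IK's algorithm)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    sample = "中文分词技术"
    toy_dictionary = {"中文", "分词", "技术"}  # hypothetical word list
    print(unigram_tokens(sample))
    print(bigram_tokens(sample))
    print(forward_max_match(sample, toy_dictionary))
```

The bigram output contains pairs such as "文分" that are not real words, which is the kind of noise that dictionary-based and understanding-based methods try to avoid.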
Compared with the traditional BP neural network, the PSO-BP neural network not only overcomes the slow convergence of traditional BP but also improves segmentation accuracy. The thesis then systematically studies and analyzes the API of the third-party Chinese word segmentation components for Lucene, integrates the PSO-BP-optimized segmentation technique into Lucene, and compares it with Lucene's built-in Chinese word segmentation. The comparison shows that the new technique is clearly better than the built-in one. Finally, the thesis designs and implements a search engine using a Lucene build that includes the PSO-BP neural network segmentation component, as an exploration of intelligent Chinese word segmentation in search engines and a platform for follow-up work and research.
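As a rough illustration of the PSO-BP idea, the sketch below uses standard global-best PSO to search the weights of a tiny feedforward network. Everything in it is an assumption made for illustration: the XOR data stands in for real segmentation features, the 2-2-1 network shape, the swarm parameters (`inertia`, `c1`, `c2`), and all function names are hypothetical, and the abstract does not describe the thesis's improved PSO variant or how it is combined with backpropagation.

```python
import math
import random

# Toy training set (XOR), standing in for real segmentation features.
SAMPLES = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
           ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_HIDDEN = 2
# Flat layout: input weights, hidden biases, output weights, output bias.
DIM = 2 * N_HIDDEN + N_HIDDEN + N_HIDDEN + 1


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp in range
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, x):
    """Evaluate the 2-N_HIDDEN-1 network on one input pair."""
    hidden = []
    for h in range(N_HIDDEN):
        w1, w2 = weights[2 * h], weights[2 * h + 1]
        bias = weights[2 * N_HIDDEN + h]
        hidden.append(sigmoid(w1 * x[0] + w2 * x[1] + bias))
    out_start = 3 * N_HIDDEN
    z = sum(weights[out_start + h] * hidden[h] for h in range(N_HIDDEN))
    return sigmoid(z + weights[out_start + N_HIDDEN])


def loss(weights):
    """Mean squared error over the toy training set."""
    return sum((forward(weights, x) - y) ** 2 for x, y in SAMPLES) / len(SAMPLES)


def pso_train(n_particles=20, iterations=200, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO over the weight vector (not the thesis's variant)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(n_particles)]
    vel = [[0.0] * DIM for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]          # each particle's best position
    best_loss = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_loss[i])
    g_pos, g_loss = best_pos[g][:], best_loss[g]  # swarm-wide best
    history = [g_loss]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(DIM):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            current = loss(pos[i])
            if current < best_loss[i]:
                best_loss[i], best_pos[i] = current, pos[i][:]
                if current < g_loss:
                    g_loss, g_pos = current, pos[i][:]
        history.append(g_loss)
    return g_pos, history


if __name__ == "__main__":
    weights, history = pso_train()
    print("initial best loss:", history[0])
    print("final best loss:", history[-1])
```

Because the swarm needs only loss values and no gradients, it can move between basins where gradient descent alone might stall. One common arrangement, assumed here rather than taken from the thesis, is to use the weights returned by `pso_train()` as the starting point for ordinary backpropagation.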
[Degree-granting institution]: China University of Petroleum (East China)
[Degree level]: Master's
[Year conferred]: 2013
[Classification number]: TP391.3; TP183




Article ID: 2428689


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2428689.html

