A Method for Improving Web Page Ranking in Nutch
Published: 2018-12-17 12:08
[Abstract]: Nutch is an open-source search engine implemented in Java. The current Nutch has two shortcomings: it segments Chinese text one character at a time, and it does not implement PageRank computation. To address them, this work improves the PageRank algorithm, designs and implements a MapReduce-based PageRank computation method, and improves Nutch's Chinese word segmentation by integrating the JE Chinese word segmenter. Experimental results show that the improved Nutch achieves higher query-result accuracy and better ranking of Chinese web pages.
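[Author affiliation]: School of Computer, Electronics and Information, Guangxi University
[Funding]: Supported by the Guangxi Natural Science Foundation (Grant No. 桂科自0832059)
[Classification number]: TP391.3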
[Citation]: Pan Tao; Liang Zhengyou. A Method for Improving Web Page Ranking in Nutch [J]. Computer Engineering, 2010, Issue 13.
Article ID: 2384178
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2384178.html