Research and Implementation of a Clustering Search Engine Based on Nutch

Published: 2018-11-11 14:14
[Abstract]: With the rapid growth of the Internet, the volume of online information is increasing exponentially. Faced with this mass of information, obtaining what one needs quickly and accurately is perhaps every Internet user's greatest need. Search engines such as Google, Baidu, and Yahoo emerged in response and opened a path to that information. Traditional search engines are far from perfect, however: they present results as a linear list, which makes it hard for users to find information quickly and accurately. Researchers have therefore applied text clustering to the results returned by search engines, to help users quickly find what they are looking for.

This thesis focuses on improving clustering quality and the computational efficiency of the clustering algorithm. It studies the key techniques of Chinese clustering algorithms from four angles: non-negative matrix factorization, the vector space model, suffix array sorting, and the Chinese word segmentation module. Taking the Lingo clustering algorithm as its prototype, it proposes Rlingo, a Chinese clustering algorithm for cluster analysis of small and medium-sized document sets.

The main contributions are as follows. First, non-negative matrix factorization based on the Itakura-Saito divergence is introduced into cluster analysis for the first time, improving the readability of cluster labels and the overall quality of the clustering results. Second, position and part-of-speech factors are introduced into the traditional vector space model, further improving the quality of the clustering results. Third, building on the linear-time skew suffix array sorting algorithm, an improved skew algorithm is proposed that removes the interference of semantically meaningless feature words with feature extraction quality, reducing the time the clustering algorithm needs to process small and medium-sized document sets. Fourth, a tourism-oriented clustering system based on Nutch is implemented with Rlingo, and its performance largely meets expectations.

Finally, a controlled experiment compares the overall performance of Rlingo, Lingo, K-means, and STC. The results show that Rlingo's clustering results on small and medium-sized document sets are clearly better than those of the other three algorithms, and that the improved algorithm largely achieves the expected results.
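The abstract names Itakura-Saito non-negative matrix factorization but does not give its update rules. As a rough illustration only, the sketch below factorizes a small term-document matrix V into W and H using the standard majorization-minimization multiplicative updates for the Itakura-Saito divergence from the NMF literature. It is not Rlingo's implementation: the function names (`is_nmf`, `update_h`), the toy counts, and the 0.5 smoothing that keeps every entry strictly positive are all assumptions made for this example.

```python
import math
import random


def matmul(a, b):
    """Plain-Python product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def is_divergence(v, v_hat):
    """Itakura-Saito divergence: sum over entries of x/y - log(x/y) - 1."""
    total = 0.0
    for row_v, row_hat in zip(v, v_hat):
        for x, y in zip(row_v, row_hat):
            r = x / y
            total += r - math.log(r) - 1.0
    return total


def update_h(v, w, h):
    """One multiplicative update of H with W fixed (exponent 1/2 variant)."""
    wh = matmul(w, h)
    ratio = [[x / (y * y) for x, y in zip(rv, ry)] for rv, ry in zip(v, wh)]
    inv = [[1.0 / y for y in ry] for ry in wh]
    wt = transpose(w)
    num = matmul(wt, ratio)  # W^T (V / (WH)^2)
    den = matmul(wt, inv)    # W^T (1 / WH)
    return [
        [hv * math.sqrt(a / b) for hv, a, b in zip(rh, ra, rb)]
        for rh, ra, rb in zip(h, num, den)
    ]


def is_nmf(v, rank, iters=200, seed=0):
    """Factorize V (terms x docs) as W (terms x rank) times H (rank x docs)."""
    if any(x <= 0 for row in v for x in row):
        raise ValueError("Itakura-Saito divergence needs strictly positive entries")
    rng = random.Random(seed)
    n_terms, n_docs = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_terms)]
    h = [[rng.random() + 0.1 for _ in range(n_docs)] for _ in range(rank)]
    history = [is_divergence(v, matmul(w, h))]
    for _ in range(iters):
        h = update_h(v, w, h)
        # Update W by symmetry: V^T is approximated by H^T W^T.
        w = transpose(update_h(transpose(v), transpose(h), transpose(w)))
        history.append(is_divergence(v, matmul(w, h)))
    return w, h, history


# Hypothetical toy term-document counts (4 terms, 3 documents).
counts = [[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]]
v = [[c + 0.5 for c in row] for row in counts]  # assumed smoothing, see lead-in
w, h, history = is_nmf(v, rank=2)
print("divergence before and after:", history[0], history[-1])
```

In a Lingo-style pipeline, each column of W could be read as a candidate topic over terms and each column of H as a document's loading on those topics. How the thesis actually maps the factors to cluster labels is not stated in the abstract.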
[Degree-granting institution]: South China University of Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP391.3;TP311.13


Article ID: 2325074



Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2325074.html

