Research on Automatic Summarization and Search Ranking Algorithms for Blogs
Topic: Blog search. Focus: summary extraction. Source: Soochow University, master's thesis, 2012
[Abstract]: As information technology develops, blogs are becoming ever more widely and deeply used. The massive volume of content produced by the huge blog user base makes blogs an extremely rich and valuable information repository. Faced with information resources on this scale, a good blog search engine is essential, and this need has drawn growing attention from researchers. Within this line of research, a well-formed automatic summary lets users quickly judge whether a result is useful, and a good search ranking algorithm returns higher-quality results first; both are decisive for the quality of a blog search engine.
This thesis studies automatic summarization and search ranking algorithms for blogs in some depth. The main work is summarized as follows:
1) Blog-related concepts are described, the state of research in China and abroad is reviewed, and existing methods for blog-oriented automatic summarization and search ranking are analyzed in detail.
2) To meet the application's requirements, blog information is preprocessed in two ways. Comments are identified as one of three types (discussion comments, attention comments, and spam comments), and value is mined from each according to its type. Posts are classified with a Bayesian text classification method that fuses three kinds of features: post body, tags, and comments.
3) A feature-based automatic summarization method for blogs is proposed. Building on full use of blog feature information, the method uses latent semantic relatedness to incorporate the points of interest expressed in comments, producing summaries that are more reader-friendly, and it balances topic coverage against information redundancy through a summary re-selection step.
4) The various follow relationships among bloggers are used to evaluate blogger influence, from which the content value of each post is computed; comment factors are then taken into account to give each post a static score. Search results are ranked by combining this score with further factors such as post freshness and query similarity.
5) Based on the above results, a prototype blog search engine that adapts to users' preferences regarding comments is designed and implemented. The prototype also provides browsing by category.
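Item 4 of the abstract combines a precomputed static score with post freshness and query similarity, but the abstract gives no formulas. The following is only a minimal sketch of how such a combination could look, not the thesis's method. The linear weighting, the weights themselves, the exponential half-life decay, the bag-of-words cosine similarity, and all names (`Post`, `freshness`, `rank`) are assumptions made for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    text: str
    static_score: float  # assumed precomputed offline and scaled to [0, 1]
    age_days: float      # time since publication


def cosine_similarity(query: str, text: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for any query-similarity measure)."""
    q = Counter(query.lower().split())
    d = Counter(text.lower().split())
    dot = sum(q[term] * d[term] for term in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def freshness(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: the value halves every `half_life_days` (assumed form)."""
    return 0.5 ** (age_days / half_life_days)


def rank(query, posts, w_sim=0.5, w_static=0.3, w_fresh=0.2):
    """Sort posts by a weighted sum of the three factors (weights are illustrative)."""
    def score(post: Post) -> float:
        return (
            w_sim * cosine_similarity(query, post.text)
            + w_static * post.static_score
            + w_fresh * freshness(post.age_days)
        )
    return sorted(posts, key=score, reverse=True)
```

In this sketch the static score is computed offline, so only the similarity and freshness terms depend on the query and the time of the search.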
[Degree-granting institution]: Soochow University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP391.3; TP393.09