
Research on Web Information Retrieval Technology Based on Mobile Terminals

Published: 2018-04-04 01:26

  Topic: mobile Internet. Focus: information extraction. Source: Zhejiang Sci-Tech University, master's thesis, 2012


[Abstract]: With the rapid development of the mobile Internet, people are increasingly accustomed to going online anytime and anywhere through mobile terminals such as mobile phones. Web pages often contain a large number of navigation bars, advertisements, copyright notices, and other information unrelated to the content the reader cares about. For mobile users, this information not only forces them to browse passively and wastes valuable time, but also consumes data traffic unnecessarily. It is therefore necessary to remove redundant information from web pages so that the content shown in response to a user's request is only what the user wants to see. For example, if a user only wants the definition of a term, the search engine should return just that definition. With this in mind, and building on a study of web page purification techniques and the Lucene search engine, this thesis designs and develops a search system suited to retrieving topic text on mobile terminals such as mobile phones.

First, the thesis gives a general introduction to the technologies the system uses. It mainly studies techniques in the field of web page purification, including web page adaptation, web page segmentation, and extraction of a page's topic information. It also gives particular attention to the technical and application characteristics of the Lucene development toolkit, mainly its indexing and querying, and analyzes automatic summarization and regular expressions.

Next, the thesis describes the system's two key modules. The first is the web page preprocessing module, which, based on the study of web page purification, uses information extraction to obtain topic information. The second is the information retrieval module, which searches the topic information produced by the preprocessing module. Building on an improved Chinese word segmentation method, it uses the Lucene search engine package to index and query the information.

Finally, the thesis presents the design of the whole system. The system implements three modules: web page collection, web page preprocessing, and content service. Together they provide users with text information based on the keywords they enter. Experiments show that this approach both improves query accuracy and greatly reduces network traffic.
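The two-module flow described in the abstract (strip page noise, then index and query the remaining topic text) can be illustrated with a small sketch. This is not the thesis's implementation and does not use Lucene: it relies only on the Python standard library, the set of "noise" tags is an assumption chosen for illustration, the names `MainTextExtractor`, `extract_main_text`, `build_index`, and `search` are hypothetical, and a simple ASCII tokenizer stands in for the improved Chinese word segmentation the thesis mentions.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Tags whose content is treated as noise. This list is an assumption made
# for illustration; the thesis does not specify its extraction rules here.
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside any noise tag."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0  # how many noise tags are currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1

    def handle_data(self, data):
        if self.noise_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_main_text(html):
    """Preprocessing step: return the page text with noise blocks removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text):
    # Simplification: lowercase ASCII word tokens only. Chinese text would
    # need a real word segmenter instead.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Retrieval step: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


pages = {
    "p1": "<html><body><nav>Home | News</nav>"
          "<p>Lucene is a search library.</p>"
          "<footer>Copyright notice</footer></body></html>",
    "p2": "<html><body><script>var ad = 1;</script>"
          "<p>Mobile users want short text.</p></body></html>",
}
docs = {page_id: extract_main_text(html) for page_id, html in pages.items()}
index = build_index(docs)
print(search(index, "search library"))  # {'p1'}
```

Because navigation, footer, and script content is dropped before indexing, a query such as "copyright" matches nothing here, and only the cleaned text would need to be sent to the mobile client.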
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.3



