Research and Implementation of Web Information Extraction in a Bookmark System
[Abstract]: Social bookmarking systems are effective tools for collecting, managing, and sharing Web information resources, but their social functions depend on the number of users and resources. This thesis studies how to apply natural language processing research, such as Web information extraction, to a bookmark system in order to solve the system's cold-start problem and improve the user experience. The thesis first studies and implements a Web information extraction algorithm. Building on the Goose project, the algorithm improves web page fetching, adds automatic detection of page encoding, and optimizes page preprocessing based on HTML structural features observed and summarized across a large number of websites. The extracted text is then formatted to improve the reading experience. The result is an ElementTree-based Web information extraction module that is practical enough for use in a production system. In addition, based on the extraction results and the metadata of web pages, a resource-based tag recommendation algorithm and a simple page summarization function are implemented. The thesis also designs and implements a bookmark system. Its infrastructure uses Tornado as the Web server and Web development framework and MongoDB as the database server, while the client uses AngularJS, jQuery, and Bootstrap 3 styles. A client application with a responsive layout and a flat grid design is implemented, along with a Chrome browser extension. The system integrates the Web information extraction module and provides users with bookmark content reading and editing functions, which effectively improves the user experience.
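The encoding detection and main-text extraction steps described above can be illustrated with a minimal sketch. It is not the thesis's implementation or Goose's algorithm: the names `detect_encoding`, `extract_main_text`, and `NOISE_TAGS`, the fallback order of encodings, and the paragraph-length score with a link-text penalty are all assumptions made for illustration. The sketch also assumes well-formed (XHTML-like) markup, because `xml.etree.ElementTree` rejects the malformed HTML that a production module would have to repair first.

```python
import re
import xml.etree.ElementTree as ET

# Assumed list of tags whose content is treated as page noise.
NOISE_TAGS = {"script", "style", "nav", "footer", "aside"}
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def detect_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the first candidate encoding that decodes the page cleanly."""
    candidates = []
    match = META_CHARSET.search(raw[:2048])  # declarations sit near the top
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates += [default, "gb18030"]  # assumed fallback order
    for name in candidates:
        try:
            raw.decode(name)
            return name
        except (LookupError, UnicodeDecodeError):
            continue
    return "latin-1"  # maps every byte value, so decoding never fails


def visible_text(elem: ET.Element) -> str:
    """Concatenate the text under elem, skipping noise subtrees."""
    if elem.tag in NOISE_TAGS:
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(visible_text(child))
        parts.append(child.tail or "")  # tail text belongs to the parent
    return "".join(parts)


def candidate_blocks(elem: ET.Element):
    """Yield every element that is not inside a noise subtree."""
    if elem.tag in NOISE_TAGS:
        return
    yield elem
    for child in elem:
        yield from candidate_blocks(child)


def extract_main_text(markup: str) -> str:
    """Pick the block whose direct <p> children hold the most non-link text."""
    root = ET.fromstring(markup)
    best, best_score = [], 0
    for block in candidate_blocks(root):
        paragraphs = block.findall("p")  # direct children only
        texts = [" ".join(visible_text(p).split()) for p in paragraphs]
        link_chars = sum(
            len("".join(a.itertext())) for p in paragraphs for a in p.iter("a")
        )
        score = sum(len(t) for t in texts) - link_chars
        if score > best_score:
            best, best_score = texts, score
    return "\n\n".join(t for t in best if t)
```

A caller would decode the fetched bytes with `raw.decode(detect_encoding(raw))` and pass the resulting string to `extract_main_text`. Scoring only direct `<p>` children keeps the sketch short; a real extractor would also weigh nested containers and site-specific structure.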
Because search operates on the information extraction results, the bookmark system supports full-text search, which avoids the limitation of traditional bookmark systems that search only tags or titles. It also avoids the noise that full-text search over entire web pages would introduce. Unlike popular recommended-reading systems, the system implemented in this thesis pays more attention to bookmark management than to reading. If the bookmark system can be combined with a note-taking system, secondary filtering of information can be realized effectively.
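As a rough illustration of searching extracted body text rather than only titles and tags, the following sketch builds a small in-memory inverted index. The `BookmarkIndex` class and its methods are hypothetical and are not taken from the thesis, which stores data in MongoDB. The `\w+` tokenizer is also an assumption that works for space-delimited languages only; Chinese text would need a word segmenter.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> list:
    """Lowercase word tokens; no segmentation for Chinese (assumption)."""
    return [t.lower() for t in TOKEN.findall(text)]


class BookmarkIndex:
    """Hypothetical inverted index over title, tags, and extracted body."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> set of bookmark ids

    def add(self, bookmark_id, title, tags, body):
        # Index the extracted main text together with the title and tags,
        # so navigation bars and other page noise never enter the index.
        for token in tokenize(" ".join([title, *tags, body])):
            self._postings[token].add(bookmark_id)

    def search(self, query):
        """Return ids of bookmarks containing every query token."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result
```

With this index, a query such as `index.search("coroutines")` can match a bookmark whose title and tags never mention the word, as long as it appears in the extracted body.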
[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.092; TP391.3
Link to this article: http://sikaile.net/guanlilunwen/ydhl/2381182.html