XML-Based Web Information Extraction with Automatic Learning

Published: 2018-04-16 02:31

Topics: information extraction + semi-structured. Source: Computer Science (计算机科学), 2008, issue 03

[Abstract]: The Internet offers an enormous volume of information, and the Web's abundant resources contain a great deal of useful knowledge. Yet information overload combined with a shortage of usable knowledge is a major problem people face today. Search engines alone make it hard to pinpoint the data a user is most interested in, whereas automating Web information extraction can make information acquisition more efficient. Information extraction analyzes and discovers useful information on the network, discards redundant data, and extracts knowledge in the user's domain. This paper analyzes XML-based Web information extraction, discusses how related techniques apply to it, and builds a corresponding extraction model in which extraction rules are acquired through automatic learning, so that Web information is extracted automatically.
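The abstract does not spell out the paper's extraction model, so the following is only a minimal sketch of the general idea of learning an extraction rule from a labeled example and applying it to new pages. It assumes the page has already been converted to well-formed XML, and it represents a rule as a plain tag path from the root. The names `learn_rule`, `apply_rule`, and the sample pages are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET


def learn_rule(xml_text, example_value):
    """Return the tag path (list of tags below the root) of the first
    element whose text equals the labeled example value, or None."""
    root = ET.fromstring(xml_text)

    def search(node, path):
        if (node.text or "").strip() == example_value:
            return path
        for child in node:
            found = search(child, path + [child.tag])
            if found is not None:
                return found
        return None

    return search(root, [])


def apply_rule(xml_text, rule):
    """Extract the text of every element reached by the learned tag path."""
    if rule is None:
        raise ValueError("no rule was learned from the training page")
    root = ET.fromstring(xml_text)
    nodes = root.findall("/".join(rule)) if rule else [root]
    return [(n.text or "").strip() for n in nodes]


# Hypothetical training page: the user labels "XML Basics" as a title.
TRAINING_PAGE = """<html><body>
  <div><h1>Catalog</h1></div>
  <ul><li><b>XML Basics</b><i>30</i></li></ul>
</body></html>"""

# Hypothetical unseen page with the same layout.
NEW_PAGE = """<html><body>
  <div><h1>Catalog</h1></div>
  <ul>
    <li><b>Web Mining</b><i>45</i></li>
    <li><b>Data Models</b><i>25</i></li>
  </ul>
</body></html>"""

title_rule = learn_rule(TRAINING_PAGE, "XML Basics")
titles = apply_rule(NEW_PAGE, title_rule)
print(title_rule, titles)
```

A tag path is about the simplest rule form that generalizes across pages sharing one template; a realistic system would likely need richer rules (attributes, positions, several training examples) to cope with layout variation.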
[Affiliation]: Department of Computer Science, Sun Yat-sen University
[Funding]: National Natural Science Foundation of China (60373081, 60673135); Natural Science Foundation of Guangdong Province (04105503, 5003348); Ministry of Education "New Century Excellent Talents Support Program"
[Classification number]: TP312.2

Article ID: 1756933

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1756933.html
