Research on Web Information Source Discovery Technology for Forest Products Commerce
Published: 2018-11-28 08:57
[Abstract]: To address "information overload" and meet the information needs of specialized domains, topic-oriented Web information integration has become an active research area. Web information integration brings together topic-related information scattered across different information sources and offers it as a vertical information service. The supply and demand information published on forest products commerce websites is an important forestry information resource, and integrating this scattered information from across the Internet is the basis for efficient forest products commerce information services. Existing studies have integrated information from different forest products commerce websites, but the information sources themselves were collected manually; manual searching is labor-intensive and yields only a limited number of sources. Because forest products commerce information sources are numerous and widely distributed, a method for discovering them automatically is needed. This thesis first reviews existing website discovery methods and related techniques. Then, drawing on the characteristics of forest products commerce websites, it proposes an automatic discovery method for forest products commerce information sources that recasts source discovery as a web search process followed by a website classification process. The web search process starts from a set of "seed websites" and aims to discover off-site links while crawling as few pages as possible, thereby expanding the seed set. Website classification aims to separate qualifying forest products commerce websites from all other websites.
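The seed-expansion step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the class and function names (`LinkCollector`, `external_sites`, `expand_seeds`), the sample pages, and the `.example` host names are all hypothetical, and the sketch works on HTML that has already been fetched rather than choosing which pages to crawl, so it does not model the "crawl as few pages as possible" strategy.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_sites(page_url, html):
    """Return host names linked from the page that differ from the page's own host."""
    own_host = urlparse(page_url).netloc.lower()
    collector = LinkCollector()
    collector.feed(html)
    hosts = set()
    for href in collector.hrefs:
        # Resolve relative links against the page URL before comparing hosts.
        parsed = urlparse(urljoin(page_url, href))
        if parsed.scheme in ("http", "https") and parsed.netloc:
            host = parsed.netloc.lower()
            if host != own_host:
                hosts.add(host)
    return hosts


def expand_seeds(seed_pages):
    """seed_pages maps page URL -> HTML text; returns candidate hosts not among the seeds."""
    seed_hosts = {urlparse(url).netloc.lower() for url in seed_pages}
    candidates = set()
    for url, html in seed_pages.items():
        candidates |= external_sites(url, html)
    return candidates - seed_hosts


# Hypothetical, already-downloaded seed pages (assumption for illustration only).
SAMPLE_PAGES = {
    "http://seed-a.example/index.html":
        '<a href="/about.html">About</a> '
        '<a href="http://timber-b.example/supply">Partner</a>',
    "http://seed-b.example/links.html":
        '<a href="http://seed-a.example/">A</a> '
        '<a href="https://wood-c.example/">C</a> '
        '<a href="mailto:x@y.example">mail</a>',
}

if __name__ == "__main__":
    print(sorted(expand_seeds(SAMPLE_PAGES)))
```

In this sketch, links back to other seeds and non-HTTP links such as `mailto:` are discarded, so only previously unseen hosts are passed on as candidates for classification.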
For website classification, the study builds a feature lexicon for forest products commerce websites, proposes an improved vector space model based on key resources to describe a website's topic, and uses an SVM-based classifier to decide each website's class. Finally, a forest products commerce information source discovery module is designed and implemented. Experiments with the module discovered 110 forest products commerce websites, which verifies the effectiveness of the proposed method and shows that it can address the information source discovery problem in Web information integration for forest products commerce.
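The abstract does not spell out how the improved vector space model weights "key resources", so the following is only a sketch under stated assumptions: it assumes that key resources are page regions such as the title and meta keywords, and that terms found there count more than body terms. The lexicon entries, the weights, and the name `site_vector` are hypothetical, and whitespace tokenization stands in for the Chinese word segmentation a real system would need. The resulting vectors are the kind of input an SVM classifier would be trained on; the SVM itself is not shown.

```python
import math
from collections import Counter

# Hypothetical feature lexicon; the thesis builds its own from real websites.
LEXICON = ["timber", "plywood", "supply", "purchase", "price"]

# Hypothetical weights for page regions treated as "key resources" (assumption).
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "body": 1.0}


def site_vector(fields, lexicon=LEXICON, field_weights=FIELD_WEIGHTS):
    """Build a unit-length, region-weighted term-frequency vector over the lexicon.

    fields maps a region name (for example "title") to the text found there.
    """
    weighted = Counter()
    for name, text in fields.items():
        weight = field_weights.get(name, 1.0)
        # Whitespace tokenization is a simplification of real word segmentation.
        for token in text.lower().split():
            weighted[token] += weight
    raw = [float(weighted[term]) for term in lexicon]
    norm = math.sqrt(sum(x * x for x in raw))
    # Leave an all-zero vector unchanged to avoid dividing by zero.
    return [x / norm for x in raw] if norm else raw


if __name__ == "__main__":
    example = {"title": "Timber supply", "body": "plywood price price"}
    print(site_vector(example))
```

Normalizing to unit length keeps large and small websites comparable, which matters when the vectors are later compared or fed to a margin-based classifier.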
[Degree-granting institution]: Beijing Forestry University
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.092;F326.2;F724.6
Article ID: 2362405
Article link: http://sikaile.net/guanlilunwen/ydhl/2362405.html