Design and Implementation of a Web Crawler for a Price Comparison Shopping Platform

Published: 2018-12-31 08:10

[Abstract]: As information technology has spread and matured, the Internet has reached into every corner of people's lives and work. Search engines have become the fastest way to find information, and online shopping has become a way of life accepted by more and more people. However, online goods come in an enormous variety, prices differ from site to site, and sellers vary widely in quality, so consumers must spend a great deal of time browsing the major shopping websites, comparing prices, and weighing value for money. Users therefore want a system that helps them make purchasing decisions: one that holds information about the best-selling products on the major mainstream shopping websites, so that a simple search shows which site sells an item at the lowest price and offers the best value. A price comparison shopping platform is a good solution, and for such a platform, obtaining this huge volume of product data and price information is a crucial problem. Against this background, this thesis proposes a solution for the platform's data source: the design and implementation of a web crawler. The work focuses on how to design and implement the crawler's functions, extending and customizing certain features on top of the Heritrix web crawler. The thesis discusses the following problems in depth:
(1) Determining seed links: providing the web crawler with an entry point for crawling;
(2) Fetching web pages: saving pages that meet the requirements to a local folder;
(3) Analyzing and extracting web page content: extracting the information related to product attributes from each page;
(4) Structuring and storing data: extracting product attributes record by record and storing them in a database;
(5) Displaying the product data for price comparison.
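Steps (1) and (2) amount to a crawl loop: start from seed links, fetch each page, save it locally, and queue newly discovered links that are in scope. The thesis does this by extending Heritrix (a Java crawler with its own frontier and scope configuration); the following is only a minimal standalone Python sketch of the same idea, not Heritrix's API. The names `crawl`, `LinkExtractor`, `http_fetch`, and the `allowed_hosts` whitelist are hypothetical, and "meets the requirements" is modeled here simply as "the link's host is on the whitelist"; seed URLs are assumed to be in scope.

```python
import hashlib
import os
import urllib.request
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Iterable
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_fetch(url: str) -> str:
    """Download one page (no retries, robots.txt handling, or politeness delay)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], str],
    allowed_hosts: set[str],
    out_dir: str,
    max_pages: int = 100,
) -> list[str]:
    """Breadth-first crawl from the seed URLs, saving each fetched page to out_dir.

    Only links whose host is in allowed_hosts are queued.
    Returns the saved URLs in crawl order.
    """
    os.makedirs(out_dir, exist_ok=True)
    frontier = deque(seeds)  # URLs waiting to be fetched (FIFO = breadth-first)
    seen = set(frontier)     # every URL ever queued, so nothing is fetched twice
    saved: list[str] = []
    while frontier and len(saved) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        # Hash the URL to get a filesystem-safe file name.
        name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
            f.write(html)
        saved.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href).split("#", 1)[0]  # absolutize, drop fragment
            if urlparse(link).hostname in allowed_hosts and link not in seen:
                seen.add(link)
                frontier.append(link)
    return saved
```

Passing `fetch` as a parameter keeps the loop testable offline; a real run might look like `crawl(["https://www.example.com/"], http_fetch, {"www.example.com"}, "pages")`.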
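Step (3), extracting product attributes from a saved page, can be illustrated with Python's standard `html.parser`. The abstract does not say how each site's markup is located, so this sketch assumes, hypothetically, that the product name and price sit in elements with the CSS classes `title` and `price`; real shopping sites differ and need per-site rules. `FieldExtractor`, `Product`, `parse_price`, and `extract_product` are illustrative names, not part of Heritrix.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from html.parser import HTMLParser
from typing import Optional


@dataclass
class Product:
    site: str
    url: str
    name: str
    price: Decimal


class FieldExtractor(HTMLParser):
    """Captures the text of the first element carrying each wanted CSS class."""

    def __init__(self, wanted: set[str]) -> None:
        super().__init__()
        self.wanted = wanted
        self.fields: dict[str, str] = {}
        self._field: Optional[str] = None  # class currently being captured
        self._tag = ""                      # tag name of the element being captured
        self._depth = 0                     # nesting depth of that tag name
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            if tag == self._tag:  # nested element with the same tag name
                self._depth += 1
            return
        for cls in (dict(attrs).get("class") or "").split():
            if cls in self.wanted and cls not in self.fields:
                self._field, self._tag, self._depth, self._buf = cls, tag, 1, []
                break

    def handle_endtag(self, tag):
        if self._field is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse whitespace so "Phone\n  X" becomes "Phone X".
                self.fields[self._field] = " ".join("".join(self._buf).split())
                self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._buf.append(data)


def parse_price(text: str) -> Decimal:
    """Pull the first number such as '2,999.00' out of text like '¥ 2,999.00'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return Decimal(match.group().replace(",", ""))


def extract_product(site: str, url: str, html: str,
                    name_class: str = "title", price_class: str = "price") -> Product:
    """Build a Product from one saved page; raises KeyError if a field is missing."""
    parser = FieldExtractor({name_class, price_class})
    parser.feed(html)
    parser.close()
    return Product(site, url, parser.fields[name_class],
                   parse_price(parser.fields[price_class]))
```

Prices are kept as `Decimal` rather than `float` so that values like 2899.50 compare exactly when sites are ranked later.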
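Steps (4) and (5), storing one record per product and querying it for price comparison, can be sketched with Python's built-in `sqlite3`. The thesis does not specify its database or schema here, so the `products` table, its columns, and the helpers `save_products` and `compare` are hypothetical. Matching products across sites by a `LIKE` substring on the name is a deliberate simplification; a real platform needs proper product matching.

```python
import sqlite3
from decimal import Decimal
from typing import Iterable, Union

SCHEMA = """
CREATE TABLE IF NOT EXISTS products (
    url         TEXT PRIMARY KEY,  -- product page URL, one row per page
    site        TEXT NOT NULL,     -- shopping site the page came from
    name        TEXT NOT NULL,
    price_cents INTEGER NOT NULL   -- price in cents, avoiding float rounding
)
"""


def save_products(conn: sqlite3.Connection,
                  rows: Iterable[tuple[str, str, str, Union[Decimal, str]]]) -> None:
    """Insert (site, url, name, price) records; a re-crawled URL replaces its old row."""
    conn.executemany(
        "INSERT OR REPLACE INTO products (url, site, name, price_cents) VALUES (?, ?, ?, ?)",
        [(url, site, name, int(Decimal(price) * 100)) for site, url, name, price in rows],
    )
    conn.commit()


def compare(conn: sqlite3.Connection, keyword: str) -> list[tuple[str, str, Decimal]]:
    """Return (site, name, price) for names containing keyword, cheapest first."""
    cur = conn.execute(
        "SELECT site, name, price_cents FROM products "
        "WHERE name LIKE ? ORDER BY price_cents, site",
        (f"%{keyword}%",),  # note: '%' or '_' inside keyword are not escaped
    )
    return [(site, name, Decimal(cents) / 100) for site, name, cents in cur]
```

For example, after `conn = sqlite3.connect(":memory:")` and `conn.execute(SCHEMA)`, saving the same phone from two sites and calling `compare(conn, "Phone X")` lists the cheaper site first. Keying rows on the page URL means each re-crawl refreshes prices instead of piling up duplicates.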
[Degree-granting institution]: East China University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2013
[Classification number]: TP391.3

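[Citing literature]

Related journal articles (first 1 of 1)

1 Dong Haoran; Xie Huan; Chen Peng; Hong Zhonghua; Tong Xiaohua. Online Real Estate Appraisal System Based on a GIS Topic-Focused Crawler and Its Optimization [J]. Geomatics World (地理信息世界), 2016, No. 02.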




Link to this article: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2396304.html
