

Design and Implementation of a Web Crawler for a Price-Comparison Shopping Platform

Published: 2018-12-31 08:10
[Abstract]: With the spread of information technology, the Internet has reached every corner of daily life and work. Search engines have become the fastest way to find information, and online shopping has become a widely accepted way of life. However, online goods vary enormously in type and price, and sellers vary in quality, so consumers spend a great deal of time browsing the major shopping sites, comparing prices, and weighing value for money. Users therefore want a system that helps them choose: one that holds information on best-selling products from the major shopping sites, so that a simple search shows which site sells an item at the lowest price and with the best value. A price-comparison shopping platform is a good solution. For such a platform, obtaining so large a volume of product and price data is a critical problem. Against this background, this thesis proposes a solution for the platform's data source: the design and implementation of a web crawler.

The thesis studies how to design and implement the crawler. Building on the Heritrix web crawler, it extends and customizes selected functions, and it discusses the following problems in depth: (1) determining seed links, which give the crawler its entry point; (2) fetching web pages, saving pages that meet the requirements to a local folder; (3) analyzing and extracting page content, pulling out the information related to product attributes; (4) structuring and storing data, extracting product attributes record by record and storing them in a database; (5) presenting the product data for price comparison.
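The five steps listed in the abstract can be illustrated with a small, self-contained sketch. This is not the thesis's Heritrix-based implementation (Heritrix is a Java crawler); it is a minimal Python approximation using only the standard library. The in-memory `PAGES` dictionary, the `shop-a.example` and `shop-b.example` URLs, the `name` and `price` CSS classes, and the `product` table are all hypothetical stand-ins chosen so the example runs without network access.

```python
import re
import sqlite3
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web" (assumption): stands in for real shopping sites.
PAGES = {
    "http://shop-a.example/": '<a href="http://shop-a.example/item/1">Phone</a>',
    "http://shop-a.example/item/1":
        '<h1 class="name">Phone X</h1><span class="price">1999.00</span>',
    "http://shop-b.example/": '<a href="http://shop-b.example/item/7">Phone</a>',
    "http://shop-b.example/item/7":
        '<h1 class="name">Phone X</h1><span class="price">1899.50</span>',
}


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(seeds):
    """Steps 1-2: start from the seed links and fetch pages breadth-first."""
    queue, seen, saved = deque(seeds), set(seeds), {}
    while queue:
        url = queue.popleft()
        html = PAGES.get(url)  # a real crawler would issue an HTTP request here
        if html is None:
            continue
        saved[url] = html  # the thesis saves pages to a local folder instead
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return saved


# Assumed page layout: product name and price carry these class attributes.
NAME_RE = re.compile(r'class="name">([^<]+)<')
PRICE_RE = re.compile(r'class="price">([0-9.]+)<')


def extract(url, html):
    """Step 3: pull product attributes out of a page, or None if absent."""
    name, price = NAME_RE.search(html), PRICE_RE.search(html)
    if name and price:
        return (name.group(1), float(price.group(1)), url)
    return None


def store(records):
    """Step 4: write the structured records to a database (SQLite here)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE product (name TEXT, price REAL, url TEXT)")
    db.executemany("INSERT INTO product VALUES (?, ?, ?)", records)
    return db


def cheapest(db, name):
    """Step 5: look up the lowest-priced offer for a product name."""
    return db.execute(
        "SELECT url, price FROM product WHERE name = ? ORDER BY price LIMIT 1",
        (name,),
    ).fetchone()


pages = crawl(["http://shop-a.example/", "http://shop-b.example/"])
records = [r for r in (extract(u, h) for u, h in pages.items()) if r]
db = store(records)
print(cheapest(db, "Phone X"))
```

The regular expressions keep the sketch short; extraction against real shopping pages would likely need a proper HTML parser and per-site rules, which is presumably where the thesis's customization of Heritrix comes in.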
[Degree-granting institution]: East China University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP391.3

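[Citing literature]

Related journal articles (1 entry)

1 Dong Haoran; Xie Huan; Chen Peng; Hong Zhonghua; Tong Xiaohua; Online real-estate valuation system based on a GIS topical crawler and its optimization [J]; Geomatics World; 2016, No. 02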


Article ID: 2396304


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2396304.html

