
Research and Application of Internet Geographic Information Crawler Technology

Published: 2018-03-21 20:35

Topic: geographic information   Focus: crawler technology   Source: Shandong Agricultural University, 2017 master's thesis   Type: degree thesis

[Abstract]: Traditional geographic information is usually collected through national geographic surveys and field surveys. However, as society develops and residential areas, roads, and other features keep changing, the problems of this approach have become increasingly prominent: high data cost, heavy workload, and low efficiency and timeliness. As the Internet continues to grow, the volume of geographic data interwoven across the web increases daily, and this data holds a wealth of knowledge. Crawling relevant geographic data from the Internet has become a new channel for sourcing geographic information. Crawler technology has, to some extent, solved the problem of acquiring Web data, but a general-purpose crawler has difficulty crawling the geographic information on the Internet effectively. Building on a summary of general crawler technology, Internet geographic information crawling does not pursue broad coverage. Instead, it targets web data related to geographic information content, which makes crawling more focused, and it is used here to address the high data cost, heavy workload, and low efficiency and timeliness of geographic information collection. The main work of this thesis is as follows:

(1) Analysis and summary of the characteristics of websites that host Internet geographic information. Drawing on how browsers work, and by analyzing how these websites exchange and display information, the thesis classifies shallow-web sites hosting geographic information into three types from the perspective of crawler data collection: the M-Dom type, the M-Render type, and the M-Trigger type. Through concrete experiments, it also analyzes deep-web sites hosting geographic information, focusing on the characteristics of sites that host deep-web POI geographic information.

(2) Research on techniques for acquiring Internet geographic information. For shallow-web collection scenarios, the thesis focuses on methods for crawling single pages and list pages. For deep-web POI collection scenarios, it summarizes the collection difficulties and techniques, designs two sets of content search terms, and studies the corresponding crawling strategies.

(3) Technical validation and prototype system development. Building on the research into methods, techniques, and strategies, the thesis designs a prototype system for Internet geographic information collection, describes its design in terms of architecture, functions, modules, and core logic, and implements the prototype and validates it in application.
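The idea of giving up broad coverage in favor of topic-related pages can be illustrated with a minimal focused-crawl sketch. This is not the thesis's implementation: the keyword list `GEO_TERMS`, the scoring function `relevance`, the `threshold` value, and the `fetch` callable are all hypothetical names introduced here for illustration, and the relevance measure is a deliberately simple keyword overlap.

```python
from collections import deque

# Hypothetical vocabulary; the abstract does not publish the terms it uses.
GEO_TERMS = {"address", "latitude", "longitude", "road", "district", "poi"}


def relevance(text: str) -> float:
    """Return the fraction of GEO_TERMS that occur in the page text."""
    words = set(text.lower().split())
    return len(GEO_TERMS & words) / len(GEO_TERMS)


def focused_crawl(seed, fetch, threshold=0.3, max_pages=100):
    """Breadth-first crawl that expands links only from relevant pages.

    `fetch(url)` is an assumed callable returning (text, links), or None
    when the page cannot be retrieved.
    """
    seen = {seed}
    queue = deque([seed])
    kept = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        page = fetch(url)
        visited += 1
        if page is None:
            continue
        text, links = page
        if relevance(text) < threshold:
            continue  # off-topic page: neither stored nor expanded
        kept.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return kept
```

Because `fetch` is injected, a dictionary's `get` method can stand in for network access in offline tests. A real crawler would also need politeness delays, `robots.txt` handling, and, for pages whose content only appears after scripts run or user events fire, a rendering step that this sketch omits.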

[Degree-granting institution]: Shandong Agricultural University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3;P208



Article link: http://sikaile.net/shoufeilunwen/benkebiyelunwen/1645440.html
