Research on Web-Oriented Text Geographic Information Mining Technology

Published: 2018-08-16 18:28

[Abstract]: Geographic information has important civil, commercial, and national defense applications, yet its acquisition is restricted in many ways. The Internet now holds a large amount of geographic information, and obtaining it from the web, which bypasses the limits of traditional acquisition methods, has become an important means of acquiring geographic information. However, web data is massive and its types are heterogeneous, which makes extracting geographic information from the web very difficult. To address this problem, this thesis studies the acquisition and the classification of geographic information. It proposes a focused (topic-specific) web crawler algorithm that incorporates a geographic information ontology library: the ontology library is constructed and used to evaluate the relevance of web page content, and this evaluation is combined with link filtering and link authority evaluation to screen web pages for geographic information. Experimental results show that the proposed algorithm effectively filters out pages unrelated to geographic information and improves the accuracy of geographic web page acquisition. For classification, the thesis proposes a nearest neighbor classification algorithm that incorporates a distance threshold: based on the spatial distance between each class centroid and the sample to be classified, it assigns the sample to a class by comparing that distance with a preset threshold. Experimental results show that the algorithm classifies geographic information effectively and with high accuracy. The Apriori algorithm is also used to mine association rules in geographic information. Finally, using the proposed focused crawler and nearest neighbor classification algorithms, a web-oriented text geographic information mining system is implemented. The system compares web page text with the ontologies in the geographic information ontology library to evaluate page relevance. It selects and retrieves page text that is highly relevant to geographic information, preprocesses it, extracts text features, and uses the feature set to convert the page text into space vectors for classification.
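The abstract describes the classifier only at a high level, so the following is a minimal sketch of a centroid-based nearest neighbor classifier with a distance threshold, not the thesis's actual implementation. It assumes Euclidean distance, a single global threshold, and that a sample farther than the threshold from every class centroid is left unclassified (`None`). The function names `class_centroids` and `classify`, the labels, and the toy feature vectors are all hypothetical.

```python
import math
from collections import defaultdict


def class_centroids(samples, labels):
    """Return {label: centroid}, where a centroid is the mean feature vector of a class."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(vec, centroids, threshold):
    """Return the label of the nearest centroid, or None if it lies beyond the threshold."""
    best_label, best_dist = None, math.inf
    for label, centroid in centroids.items():
        dist = math.dist(vec, centroid)  # Euclidean distance (assumption)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else None


# Toy two-dimensional feature vectors standing in for vectorized page text.
train = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
train_labels = ["river", "river", "city", "city"]
centroids = class_centroids(train, train_labels)

near = classify([0.85, 0.1], centroids, threshold=0.5)  # close to the "river" centroid
far = classify([5.0, 5.0], centroids, threshold=0.5)    # far from every centroid
```

In this sketch the threshold acts as a rejection rule: a page whose vector is not close to any class centroid is not forced into the nearest class. Whether the thesis uses the threshold this way, or per class rather than globally, is not stated in the abstract.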
By matching against basic geographic information keywords and extracting text summaries, the system extracts information about the required place names and locations. The Apriori algorithm is used to extract association rules from the geographic information. System test results show that the web geographic information mining system designed in this thesis implements web text acquisition, web text classification, text information extraction, and geographic information association rule mining.
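To illustrate the association rule step, here is a small, self-contained sketch of the standard Apriori procedure (level-wise candidate generation with subset pruning, followed by rule generation from the frequent itemsets). It is a generic textbook version under stated assumptions, not the thesis's code: each web page is treated as a "transaction" of extracted place names, and the function names, thresholds, and sample data are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    current = {}
    for item in {i for t in transactions for i in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent


def association_rules(frequent, min_confidence):
    """Return a list of (antecedent, consequent, confidence) meeting min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical place names extracted from four pages.
pages = [
    {"Harbin", "Songhua River", "Heilongjiang"},
    {"Harbin", "Heilongjiang"},
    {"Harbin", "Songhua River"},
    {"Beijing", "Great Wall"},
]
frequent = frequent_itemsets(pages, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.9)
```

With this toy data, the rules that pass the confidence threshold say that pages mentioning "Songhua River" or "Heilongjiang" also mention "Harbin"; the reverse rules fall below the threshold because "Harbin" appears on more pages than either of the other two names.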
[Degree-granting institution]: Harbin Engineering University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.1



Article ID: 2186815


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2186815.html
