Research on Geographic Element Parsing Methods for Chinese Addresses from the Internet

Published: 2018-03-27 07:04

Topic: geographic element parsing　Approach: conditional random field　Source: Wuhan Institute of Technology, master's thesis, 2016

[Abstract]: As location-based information services have spread, more and more companies combine their own address data with software functionality to build convenient location-service applications, such as mobile map apps. This requires mapping large numbers of Chinese addresses written in natural language to geographic coordinates so that they can be located precisely on an electronic map, which in turn supports information retrieval, query, and positioning services. However, Chinese address information obtained from the Internet is often incomplete and non-standardized; that is, the address data are not organized by geographic element level. To establish an accurate mapping between spatial and non-spatial information, studying geographic element parsing and standardization for Chinese addresses obtained from the Internet has significant practical value. This thesis takes Chinese addresses collected by web crawlers as its research object. First, it applies a conditional random field (CRF) algorithm, which mainly uses four-tag word-position labeling and builds a CRF probability model to parse the geographic elements in an address. Second, it applies a multi-factor algorithm for computing the credibility of administrative divisions, whose main purpose is to identify the administrative-division part of an address's geographic elements: an administrative-division dictionary is first used to match several candidate sets of administrative divisions, position-matching factors are assigned to the different divisions, and the credibility of each is then computed from the relationships among the factors so that the best administrative-division result can be selected. Finally, it applies a rule-based improvement on the conditional random field, namely an address parsing algorithm based on empirical transition rules, which can effectively identify both the administrative divisions and the other geographic elements in a Chinese address. This algorithm first builds a feature-character library and derives a single-character empirical transition matrix from a standard address corpus; it then extracts the feature characters in the address string to form a random field and uses the empirical transition probability matrix to find rule-based expressions suited to address element parsing, which are applied to parse the geographic elements of the address string being processed. Because the feature-character library contains a limited number of feature characters, the method does not judge low-frequency feature characters well. For Chinese addresses that contain feature characters from the library, however, it identifies the geographic elements efficiently. The three algorithms are each tested on different address databases, and the final results are compared horizontally and vertically. The experimental results show that the multi-factor algorithm and the other algorithms perform well and can effectively separate the different geographic elements, laying a foundation for location-based application development.
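The four-tag word-position labeling mentioned in the abstract can be illustrated with a short sketch. It assumes the common B/M/E/S tag set (begin, middle, end, single), which the abstract does not spell out, and the sample address, element type names, and function names are hypothetical. It shows only how a segmented address becomes per-character labels that a CRF could be trained on, and how labels are turned back into elements; it is not the thesis's implementation.

```python
from typing import List, Tuple

Segment = Tuple[str, str]   # (element text, element type)
Labeled = Tuple[str, str]   # (single character, position-type tag)


def bmes_tags(segments: List[Segment]) -> List[Labeled]:
    """Convert segmented address elements into per-character B/M/E/S tags."""
    labeled: List[Labeled] = []
    for text, kind in segments:
        if len(text) == 1:
            # A one-character element gets the "single" tag.
            labeled.append((text, f"S-{kind}"))
            continue
        last = len(text) - 1
        for i, ch in enumerate(text):
            pos = "B" if i == 0 else "E" if i == last else "M"
            labeled.append((ch, f"{pos}-{kind}"))
    return labeled


def decode(labeled: List[Labeled]) -> List[Segment]:
    """Rebuild address elements from a well-formed tag sequence."""
    segments: List[Segment] = []
    buffer = ""
    for ch, tag in labeled:
        pos, kind = tag.split("-", 1)
        buffer += ch
        if pos in ("E", "S"):
            # An element closes on an "end" or "single" tag.
            segments.append((buffer, kind))
            buffer = ""
    return segments


# Hypothetical, hand-segmented example address (an assumption for illustration).
example = [
    ("湖北省", "province"),
    ("武汉市", "city"),
    ("洪山区", "district"),
    ("雄楚大街", "road"),
]

if __name__ == "__main__":
    for ch, tag in bmes_tags(example):
        print(ch, tag)
```

In a full pipeline the tag sequence would be predicted by the trained CRF rather than derived from a known segmentation; `decode` would then recover the geographic elements from the predicted tags.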
[Degree-granting institution]: Wuhan Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.1

Article ID: 1670498

Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1670498.html
