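Detection and Spatio-temporal Distribution Analysis of Chinese Characters in Street View Images of Several Southeast Asian Capital Cities
Published: 2017-12-27 03:17
Keywords for this article: Detection and Spatio-temporal Distribution Analysis of Chinese Characters in Street View Images of Several Southeast Asian Capital Cities  Source: Nanjing University, 2017 master's thesis  Document type: degree thesis
Related topics: Southeast Asia; spatial distribution of Chinese characters; street view images; text detection; Belt and Road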
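[Abstract, translated from the Chinese original]: The core of the "Belt and Road" initiative is to achieve the "five connectivities" among the countries along its routes: policy coordination, facilities connectivity, unimpeded trade, financial integration, and people-to-people bonds. A systematic, effective, and quantitative evaluation of the current state of the "five connectivities" provides important information and data support for scientific decision-making and regional cooperation. The foundation of the "five connectivities" is "language connectivity", and writing is an important component of language. The use of Chinese characters in countries along the Belt and Road effectively reflects the real state of exchange between those countries and China, and in particular gives a direct view of the current state of people-to-people and cultural connectivity. Southeast Asia is a key region of the "Road", so a quantitative, spatial study of the distribution of Chinese characters there can serve as an application demonstration for similar research across the Belt and Road countries.

Traditional data acquisition methods struggle to provide large-area, spatially referenced information on the distribution of Chinese characters. Street view maps show the details of street facades, including the text used within a city, and they are georeferenced, cover wide areas, and are free for users to access, so they offer data support for acquiring spatial information on Chinese characters. Algorithms for text detection in natural images are fairly mature, but research on detecting Chinese characters in multilingual natural images is lacking. Constrained by data acquisition, research on the spatial distribution of Chinese characters is still a blank, and how to analyze and evaluate that distribution scientifically and systematically deserves study. To address the difficulty of acquiring this information and the lack of related research, the thesis establishes a technical workflow for acquiring Chinese character spatial distribution information from street view images and builds a system for analyzing and evaluating the spatio-temporal distribution of Chinese characters.

The main research contents are:
(1) Chinese character detection in street view images. Based on street view maps, the thesis proposes a three-stage detection workflow of "data acquisition, text detection, Chinese character discrimination". Web data acquisition techniques are used to collect georeferenced street view images of several Southeast Asian capital cities. In light of the characteristics of street view images and the technical difficulties of text detection, three methods (the Connectionist Text Proposal Network, an improved maximally stable extremal regions method, and a stroke-width-based method) are applied to text line detection and their results compared. The detection results of the algorithm that meets the requirements on precision and recall are selected as the data source for Chinese character discrimination. Finally, by analyzing the characteristics of Chinese characters and how they differ from other scripts, a discrimination method based on character segmentation and character feature computation is proposed, yielding point data on the spatial distribution of Chinese characters in the studied cities.
(2) Analysis of the spatial distribution characteristics of Chinese characters. Based on the point data interpreted from street view images, statistical analysis is used to compare the number, density, and per-capita number of Chinese characters across the cities. Spatial analysis is used to explore the distribution within each city, including principal direction, spatial aggregation, and spatial evenness. The correlation between Chinese character distribution and road network centrality is analyzed to study the locational advantage of Chinese characters in different cities. Central place theory is introduced to compute and evaluate the spatial radiation range and capacity of Chinese character signs within each city. Together these give a systematic picture of the spatial distribution of Chinese characters in the studied cities and a cross-city comparison of its characteristics.
(3) Analysis of temporal change in the distribution of Chinese characters in Singapore. Changes in the spatial distribution of Chinese characters in Singapore across 2008, 2013, and 2015 are analyzed, with statistics on changes in number and density for the central urban area and for each district. Spatial analysis techniques are used to study, over two time periods and three time nodes, the spatio-temporal change in terms of principal direction and center of gravity, aggregation, evenness, locational advantage, and spatial radiation, revealing the patterns of change and the regional differences.

The results show that:
(1) Among the 7 Southeast Asian capital cities, Kuala Lumpur has the largest number and highest density of Chinese characters, and Jakarta has the fewest. Chinese characters are clustered in all 7 cities; clustering is highest in the central urban area of Kuala Lumpur and lowest in the central urban area of Bangkok. Spatial evenness is best in Phnom Penh and worst in Manila. In all 7 cities, Chinese characters are located mainly on resident-service roads and are positively correlated with road network centrality; locational advantage is strongest in Phnom Penh and lowest in Jakarta. In terms of spatial radiation, Chinese characters cover almost the entire central urban area of Phnom Penh, while coverage is lowest in the central urban area of Jakarta; the influence of Chinese characters on central-area residents is strongest in Phnom Penh and weakest in Jakarta.
(2) From 2008 to 2015, the number of Chinese character signs in Singapore increased year by year. They are located mainly in the Central Area and Kallang; areal density increased in every district, the high-value zone of kernel density moved toward the Central Area, and the overall center of gravity of the distribution moved southwest. The spatial aggregation of Chinese character signs in Singapore remained essentially unchanged, and spatial evenness decreased slightly. New Chinese character signs are located mainly on residential roads, and the overall spatial radiation range expanded substantially, so residents have more opportunities to encounter Chinese characters and the influence of Chinese characters in Singapore has grown.

The thesis analyzes the spatial distribution of Chinese characters in several Southeast Asian capital cities from multiple perspectives and obtains good results, but it also has shortcomings. Discrimination based on character features cannot completely exclude Japanese text or alphabetic text whose character features are indistinct, and the text content itself is not recognized; improving the effectiveness of Chinese character discrimination and recognizing text content require further research. In addition, the thesis analyzes the distribution only from the perspectives of space and the road network, and does not sufficiently study the underlying causes of the spatial differences. Follow-up research could introduce factors such as Chinatowns, commercial centers, Chinese outbound investment, policy influence, and changes in the local ethnic Chinese population to study the mechanisms behind the differences in Chinese character distribution across Southeast Asia.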
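The detector comparison described in research content (1) rests on precision and recall. The following is a minimal Python sketch of how those two figures could be computed for text line boxes. The abstract does not state the matching rule used in the thesis, so the greedy one-to-one matching, the IoU threshold of 0.5, the box format, and the function names `iou` and `precision_recall` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); this format is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    detected: List[Box], truth: List[Box], iou_threshold: float = 0.5
) -> Tuple[float, float]:
    """Greedy one-to-one matching of detections to ground-truth boxes.

    A detection is a true positive if its best unmatched ground-truth
    box overlaps it with IoU >= iou_threshold (hypothetical rule).
    """
    matched = set()  # indices of ground-truth boxes already claimed
    true_positives = 0
    for det in detected:
        best_index, best_iou = -1, 0.0
        for j, gt in enumerate(truth):
            if j in matched:
                continue
            overlap = iou(det, gt)
            if overlap > best_iou:
                best_index, best_iou = j, overlap
        if best_index >= 0 and best_iou >= iou_threshold:
            matched.add(best_index)
            true_positives += 1
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up boxes for illustration only; not data from the thesis.
    truth_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
    detected_boxes = [(0, 0, 10, 10), (50, 50, 60, 60), (21, 21, 30, 30)]
    p, r = precision_recall(detected_boxes, truth_boxes)
    print(f"precision={p:.3f} recall={r:.3f}")
```

Running the same function over the output of each candidate detector on a labeled sample would give comparable precision and recall pairs, which is the kind of comparison the abstract describes; the actual evaluation protocol of the thesis may differ.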
[Abstract]: The core of Belt and Road construction is achieving the "five connectivities" among the countries along its routes: policy coordination, facilities connectivity, unimpeded trade, financial integration, and people-to-people bonds. Evaluating the current state of the "five connectivities" systematically, effectively, and quantitatively provides important information and data support for scientific decision-making and regional cooperation. The basis of the "five connectivities" is language connectivity, and writing is an important part of language. The use of Chinese characters in countries along the Belt and Road effectively reflects the real state of exchange between those countries and China, and gives a direct view of people-to-people and cultural connectivity in particular. Southeast Asia is a key region of the "Road", and a quantitative spatial study of Chinese character distribution there can serve as an application demonstration for the Belt and Road countries as a whole. Traditional data acquisition can hardly provide large-area, spatially referenced information on Chinese character distribution. Street view maps show street facade details, including the text used in a city, and are georeferenced, wide in coverage, and free to access, so they provide data support for acquiring this information. Text detection in natural images is fairly mature, but detecting Chinese characters in multilingual natural images has received little study. Because of the data acquisition constraint, research on the spatial distribution of Chinese characters is still a blank, and how to analyze and evaluate it scientifically and systematically is worth studying.
To address the difficulty of acquiring spatial distribution information on Chinese characters and the lack of related research, a technical workflow for acquiring this information from street view images is established, and a system for analyzing and evaluating the spatio-temporal distribution of Chinese characters is constructed. The main research contents are: (1) Chinese character detection in street view images. Based on street view maps, a detection workflow of "data acquisition - text detection - Chinese character discrimination" is proposed. Web data acquisition is used to collect georeferenced street view images of several Southeast Asian capital cities. Given the characteristics of street view images and the technical difficulties of text detection, three methods (the Connectionist Text Proposal Network, improved maximally stable extremal regions, and a stroke-width-based method) are used for text line detection and their results compared; the results of the algorithm that meets the requirements on precision and recall are used as the data source for Chinese character discrimination. Finally, by analyzing the characteristics of Chinese characters and their differences from other scripts, a discrimination method based on character segmentation and character feature computation is proposed to obtain point data on the spatial distribution of Chinese characters in these cities. (2) Analysis of the spatial distribution characteristics of Chinese characters.
Based on the point data interpreted from street view images, statistical analysis is used to compare the number, density, and per-capita number of Chinese characters across the cities, and spatial analysis is used to explore the distribution within each city, including principal direction, spatial aggregation, and spatial evenness. The correlation between Chinese character distribution and road network centrality is analyzed to study the locational advantage of Chinese characters in different cities. Central place theory is introduced to compute and evaluate the spatial radiation range and capacity of Chinese character signs in each city. This gives a systematic picture of the distribution in these cities and a cross-city comparison of its characteristics. (3) Analysis of temporal change in the distribution of Chinese characters in Singapore. Changes in the spatial distribution of Chinese characters in Singapore across 2008, 2013, and 2015 are analyzed, with statistics on changes in number and density for the central urban area and each district. Spatial analysis is used to study the spatio-temporal change over two periods and three time nodes in terms of principal direction and center of gravity, aggregation, evenness, locational advantage, and spatial radiation, revealing the patterns of change and regional differences. The results show that: (1) Among the 7 Southeast Asian capital cities, Kuala Lumpur has the largest number and highest density of Chinese characters, and Jakarta has the fewest.
Chinese characters are clustered in all 7 cities. Clustering is highest in the central urban area of Kuala Lumpur and lowest in the central urban area of Bangkok. Spatial evenness is best in Phnom Penh and worst in Manila. In all 7 cities, Chinese characters are located mainly on resident-service roads and are positively correlated with road network centrality; locational advantage is strongest in Phnom Penh and lowest in Jakarta. In terms of spatial radiation, Chinese characters cover almost the entire central urban area of Phnom Penh, while coverage is lowest in the central urban area of Jakarta; the influence of Chinese characters on residents is strongest in Phnom Penh and weakest in Jakarta. (2) From 2008 to 2015, the number of Chinese character signs in Singapore increased year by year, mainly in the Central Area and Kallang. Areal density increased in every district, the high-value zone of kernel density moved toward the Central Area, and the overall center of gravity moved southwest. The spatial aggregation of Chinese character signs in Singapore is essentially unchanged, and spatial evenness decreased slightly. New Chinese character signs are located mainly on residential roads. The overall spatial radiation range expanded substantially, residents have more opportunities to encounter Chinese characters, and the influence of Chinese characters in Singapore has increased. The thesis analyzes the spatial distribution of Chinese characters in several Southeast Asian capital cities from multiple perspectives and obtains good results, but it also has shortcomings.
Discrimination based on character features cannot completely exclude Japanese text or alphabetic text whose character features are indistinct, and text content is not recognized. How to improve the effectiveness of Chinese character discrimination and recognize text content needs further research. In addition, the thesis analyzes the distribution only from the perspectives of space and the road network and does not sufficiently study the underlying causes of the spatial differences; follow-up research could introduce factors such as Chinatowns, commercial centers, Chinese outbound investment, policy influence, and changes in the local ethnic Chinese population to study the mechanisms behind the differences.
[Degree-granting institution]: Nanjing University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: P208;H12
Article ID: 1340024
Article link: http://sikaile.net/shoufeilunwen/benkebiyelunwen/1340024.html