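Research on Real-Time Search and Semantic Understanding of Highly Dynamic Traffic Information Contained in Internet Web Pages
Published: 2018-02-26 01:32
Keywords: real-time search; traffic information; web crawler; natural language understanding. Source: Zhejiang University of Technology, 2014 master's thesis. Document type: degree thesis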
[Abstract]: With the rapid development of the Internet, people urgently want a way to search the massive body of web content efficiently and quickly, according to their own needs, for valuable real-time traffic information. However, a large amount of traffic information is described in natural language and, because of the limits of natural language understanding technology, is difficult for current computer systems to use directly.
This thesis focuses on the dynamic traffic information contained in Internet web pages and studies methods for its real-time search and semantic understanding. Web crawler technology is first used to capture real-time traffic information from web pages (including microblogs) as it appears. The captured text is then segmented into words using a pre-built dictionary, and the segmentation results are matched against a pre-built rule base to achieve semantic understanding. Finally, the approach is validated on concrete examples through experiments. The main work and results are as follows:
1. A real-time search method for dynamic traffic information contained in Internet web pages is studied. Traffic information is captured from three kinds of sources (official websites, forums, and microblogs), using a different capture method for each, and is saved to a database to provide the data basis for subsequent natural language understanding.
2. A natural language understanding method for traffic information is proposed. Based on the characteristics of traffic information and the application requirements of semantic understanding, an improved maximum matching word segmentation algorithm is adopted, and traffic information with qualitative and fuzzy characteristics is formalized into a standard reference template. Real-time traffic information is then matched against the existing template rules at the semantic level, which addresses the problem that traffic information described in natural language is difficult for existing computer systems to understand and use directly.
3. The real-time search method and the semantic understanding method for traffic information are each implemented. The correctness and effectiveness of the proposed methods are verified on the different sources of traffic information.
This thesis presents an in-depth theoretical study of search and semantic understanding techniques for traffic information and verifies the correctness and efficiency of the methods through experiments. It addresses the problem that traffic information described in natural language cannot be directly understood and used by current computers, has practical application value, and provides important data support for dynamic navigation and location-based services.
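The segmentation and template-matching steps described in the abstract can be illustrated with a minimal sketch. It uses plain forward maximum matching, not the improved variant developed in the thesis, whose details are not given here. The dictionary entries, the semantic tags, the template, and the names `LEXICON`, `forward_max_match`, and `match_template` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical dictionary mapping words to semantic tags (not from the thesis).
LEXICON: Dict[str, str] = {
    "文一路": "ROAD",
    "教工路": "ROAD",
    "由东向西": "DIRECTION",
    "拥堵": "STATUS",
    "缓慢": "STATUS",
    "畅通": "STATUS",
}
MAX_WORD_LEN = max(len(word) for word in LEXICON)

# Hypothetical reference template: the slots a traffic report may fill.
TEMPLATE = ("ROAD", "DIRECTION", "STATUS")
REQUIRED = ("ROAD", "STATUS")


def forward_max_match(text: str) -> List[str]:
    """Segment text by taking the longest dictionary word at each position."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is emitted as-is when nothing longer matches.
            if size == 1 or candidate in LEXICON:
                tokens.append(candidate)
                i += size
                break
    return tokens


def match_template(tokens: List[str]) -> Optional[Dict[str, str]]:
    """Fill template slots from tagged tokens; None if a required slot is empty."""
    slots: Dict[str, str] = {}
    for token in tokens:
        tag = LEXICON.get(token)
        # Keep the first token seen for each slot; ignore untagged tokens.
        if tag in TEMPLATE and tag not in slots:
            slots[tag] = token
    if all(name in slots for name in REQUIRED):
        return slots
    return None


if __name__ == "__main__":
    report = "文一路由东向西拥堵"  # "Wenyi Road, east to west, congested"
    words = forward_max_match(report)
    print(words)
    print(match_template(words))
```

In this sketch the direction slot is optional, so a report that names only a road and a status still matches, while text with no road or no status returns `None`. A real rule base would need many templates and a far larger dictionary.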
[Degree-granting institution]: Zhejiang University of Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.092
Article number: 1536005
Article link: http://sikaile.net/guanlilunwen/ydhl/1536005.html