Research on a Method for Extracting Hail Observation Information from Sina Weibo

Published: 2018-09-04 17:39
[Abstract]: Hail is a highly destructive type of weather that causes heavy losses, so research on hail is of great importance. Studies on hail identification and forecasting already exist, but the accuracy of their predictions has to be verified against hail events that actually occurred. Traditionally, such hail observation records have been collected solely by specialized meteorological staff, an approach that is limited in both time and geographic coverage. To gather hail observation information more conveniently and quickly, we turn to the Internet. Sina Weibo is the microblogging platform with the largest user base and the highest activity in China, and because hail is a rare, extreme kind of weather, people tend to post about it on Weibo. We therefore chose Sina Weibo as the source of the required information.

Existing approaches to collecting Sina Weibo data fall into three groups: methods based on third-party software or third-party Weibo datasets, methods based on Sina's public API, and web crawling. This work needs Sina Weibo's advanced search interface, which Sina does not expose publicly, so a web crawler is used to fetch the result pages for the configured search conditions and then to extract the Weibo posts containing the keyword "冰雹" (hail).

Not all collected posts describe an actual hail occurrence. On inspection, some posts describe hail events that happened, some are weather forecasts stating that hail may occur, and the rest do not involve any hail event. To identify the posts that actually report hail, this thesis uses text classification. Before classification, a sample set covering the three categories was built by manual annotation. The key to text classification is text feature extraction: this thesis describes the main current feature-extraction methods, adjusts them, and then uses them in combination; experiments confirm that the combination performs better than any single method. The traditional word-only features are then extended so that phrases also serve as classification features. Three classifiers are used (Bayesian, K-nearest neighbor, and support vector machine), and a combined classification scheme based on the three classifiers is given. Test results show that the method extracts 89.5% of the hail events contained in Sina Weibo posts, with misidentified information below 13.4%. Finally, rule-based template matching is applied to the posts identified as containing hail events to extract, at the sentence level, the time, location, and size of the hail.
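The last two stages described in the abstract, combining three classifiers and then applying rule-based templates, could look roughly like the sketch below. It is only an illustration under stated assumptions: the abstract does not say how the three classifiers are combined, so simple majority voting with a fallback classifier is assumed here, and the class labels, the function names (`combine_votes`, `extract_hail_facts`), the regular-expression templates, and the sample sentence are all hypothetical rather than taken from the thesis.

```python
import re
from collections import Counter

# Hypothetical labels for the three kinds of posts named in the abstract.
OCCURRENCE, FORECAST, OTHER = "occurrence", "forecast", "other"


def combine_votes(predictions, fallback_index=0):
    """Majority vote over the labels predicted by three classifiers.

    Assumption: if all three disagree, trust one designated classifier.
    The abstract does not state how such ties are resolved.
    """
    label, count = Counter(predictions).most_common(1)[0]
    return label if count >= 2 else predictions[fallback_index]


# Hypothetical sentence-level templates; a real rule set would be much richer.
TEMPLATES = {
    "time": re.compile(
        r"(?:今天|昨天|刚才|\d{1,2}月\d{1,2}日)"
        r"(?:凌晨|早上|上午|中午|下午|傍晚|晚上)?"
        r"(?:\d{1,2}点(?:半|\d{1,2}分)?)?"
    ),
    # One or more administrative units (province/city/district/county) after "在".
    "place": re.compile(r"在((?:[\u4e00-\u9fff]{2,3}(?:省|市|区|县))+)"),
    "size": re.compile(
        r"(?:鸡蛋|乒乓球|黄豆|核桃)(?:那么|一样|般)?大小?"
        r"|直径约?\d+(?:\.\d+)?(?:厘米|毫米)"
    ),
}


def extract_hail_facts(sentence):
    """Return whichever of time, place, and size the templates find."""
    facts = {}
    for field, pattern in TEMPLATES.items():
        match = pattern.search(sentence)
        if match:
            # Use the capture group when the template defines one.
            facts[field] = match.group(1) if pattern.groups else match.group(0)
    return facts


if __name__ == "__main__":
    post = "今天下午3点在北京市海淀区下冰雹了,冰雹有鸡蛋那么大"
    # Pretend outputs of the Bayesian, KNN, and SVM classifiers for this post.
    votes = [OCCURRENCE, OCCURRENCE, OTHER]
    if combine_votes(votes) == OCCURRENCE:
        print(extract_hail_facts(post))
```

Only posts voted into the "occurrence" class are passed to the templates, which mirrors the order of steps in the abstract: classification first, sentence-level extraction second.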
[Degree-granting institution]: Tianjin University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP393.092; TP391.1



Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2222876.html

