Research on Key Techniques for Keyword Extraction from Chinese Civil Aviation Safety Information
Topics: civil aviation safety information; keyword extraction. Source: master's thesis, Civil Aviation University of China, 2017
[Abstract]: The rapid development of the civil aviation industry has led to explosive growth in civil aviation safety information, and practitioners are now overstretched in mining and processing it. Manual processing of this information is inefficient and easily biased by individual subjective judgment, so applying data-mining techniques such as automatic keyword extraction to civil aviation has become an urgent need. To support safety-information work, a civil aviation safety information processing system capable of effective text mining needs to be designed and developed. For text preprocessing, a more advanced word segmentation technique is adopted, and a dedicated civil aviation dictionary is built to assist segmentation; this greatly improves segmentation accuracy and reduces the interference of the segmentation step with downstream information processing. Building on a study of keyword extraction and related techniques, the thesis analyzes and compares approaches for the specific case of civil aviation safety information and, incorporating the features of civil aviation domain terms, proposes a keyword extraction model based on the naive Bayes model. In performance experiments, the proposed method significantly improves both accuracy and the recognition rate of civil aviation terms compared with traditional algorithms. In experiments on the number of extracted keywords, setting the number to 5 yields better extraction results than setting it to 3. The thesis then studies applications of keyword extraction to the classification of civil aviation safety information and to topic similarity computation. In these experiments, using keywords as feature items effectively reduces the dimensionality of the feature space, simplifying feature computation while maintaining extraction performance.
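The abstract states only that the keyword model is built on naive Bayes and incorporates civil aviation term features; it does not list the features. The sketch below therefore borrows the well-known KEA-style setup (each distinct word is a candidate, described by a discretized TF-IDF value and a discretized first-occurrence position) and adds a third, assumed feature: whether the word appears in a domain dictionary. The dictionary contents, bin edges, class name, and method names are all hypothetical, and the input is assumed to be already segmented into tokens.

```python
import math
from collections import Counter

# Hypothetical mini civil-aviation dictionary; the thesis's actual dictionary is not given.
DOMAIN_DICT = {"跑道侵入", "鸟击", "复飞", "风切变", "返航"}


def bucket(value, edges):
    """Return the index of the first edge that `value` is below (fixed-bin discretization)."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


class NaiveBayesKeywordExtractor:
    """Hypothetical KEA-style classifier: each distinct word is a candidate described by
    (TF-IDF bin, first-occurrence bin, in-domain-dictionary flag)."""

    def __init__(self, domain_dict, tfidf_edges=(0.05, 0.15, 0.3), pos_edges=(0.1, 0.3, 0.6)):
        self.domain_dict = set(domain_dict)
        self.tfidf_edges = tfidf_edges
        self.pos_edges = pos_edges
        # Number of distinct values each feature can take (used for Laplace smoothing).
        self.n_values = (len(tfidf_edges) + 1, len(pos_edges) + 1, 2)
        self.df = Counter()
        self.n_docs = 0
        # counts[label][feature_index][value]; label True = keyword, False = not a keyword.
        self.counts = {label: [Counter() for _ in range(3)] for label in (True, False)}
        self.class_totals = Counter()

    def _features(self, tokens):
        """Yield (word, feature tuple) for every distinct word in a segmented document."""
        tf = Counter(tokens)
        first = {}
        for i, w in enumerate(tokens):
            first.setdefault(w, i)
        for w, n in tf.items():
            idf = math.log((1 + self.n_docs) / (1 + self.df[w]))
            yield w, (
                bucket(n / len(tokens) * idf, self.tfidf_edges),
                bucket(first[w] / len(tokens), self.pos_edges),
                w in self.domain_dict,
            )

    def fit(self, docs, keyword_sets):
        """docs: pre-segmented token lists; keyword_sets: gold keywords for each doc."""
        self.n_docs = len(docs)
        for tokens in docs:
            self.df.update(set(tokens))
        for tokens, gold in zip(docs, keyword_sets):
            for w, feats in self._features(tokens):
                label = w in gold
                self.class_totals[label] += 1
                for j, v in enumerate(feats):
                    self.counts[label][j][v] += 1
        return self

    def _log_joint(self, label, feats):
        """Laplace-smoothed log prior plus the log likelihood of each feature value."""
        total = self.class_totals[label]
        score = math.log((total + 1) / (sum(self.class_totals.values()) + 2))
        for j, v in enumerate(feats):
            score += math.log((self.counts[label][j][v] + 1) / (total + self.n_values[j]))
        return score

    def extract(self, tokens, k=5):
        """Rank candidates by log-odds of being a keyword and return the top k words."""
        scored = [
            (self._log_joint(True, f) - self._log_joint(False, f), w)
            for w, f in self._features(tokens)
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [w for _, w in scored[:k]]
```

Train with `fit(token_lists, gold_keyword_sets)` and call `extract(tokens, k=5)`; the default `k=5` mirrors the setting the abstract reports as working better than 3. Ranking by log-odds rather than by the posterior probability gives the same order while avoiding an exponential.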
Compared with traditional algorithms, the proposed improved term-weighting method greatly improves classification performance and classifies every category of civil aviation safety information well. Finally, an improved method for computing the topic similarity of civil aviation safety information based on the vector space model (VSM) is proposed. It avoids the drawbacks of traditional methods, such as a large number of feature items, complex computation, and redundant feature information, and computes the topic similarity of civil aviation safety information quickly and efficiently, offering new ideas and methods for civil aviation safety information management.
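In a vector space model, each report becomes a weighted term vector and topic similarity is the cosine of the angle between two vectors. The abstract does not describe the thesis's improved weighting or its improved VSM method, so the sketch below assumes a plain stand-in: the vector space is restricted to each document's extracted keywords (the dimensionality reduction the abstract describes), and weights use the smoothed IDF form `ln((1 + n) / (1 + df)) + 1` that scikit-learn's `TfidfVectorizer` uses by default. The function names and the `df`/`n_docs` inputs are hypothetical.

```python
import math
from collections import Counter


def keyword_vector(tokens, keywords, df, n_docs):
    """Assumed TF-IDF weights restricted to a document's extracted keywords.

    tokens: segmented document; keywords: its extracted keywords;
    df: {term: document frequency}; n_docs: corpus size.
    """
    tf = Counter(t for t in tokens if t in keywords)
    return {
        w: (n / len(tokens)) * (math.log((1 + n_docs) / (1 + df.get(w, 0))) + 1)
        for w, n in tf.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector shares no topic with anything
    return dot / (norm_u * norm_v)
```

Because only keywords enter the vectors, each document contributes a handful of dimensions instead of its whole vocabulary, so the dot product and norms stay cheap; swapping in a different weighting only requires changing `keyword_vector`.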
[Degree-Granting Institution]: Civil Aviation University of China
[Degree Level]: Master's
[Year Conferred]: 2017
[Classification Number]: V328; TP391.1