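Recognition of Complex Organization Names in Khmer Based on Markov Logic Networks
Published: 2017-12-27 19:14
Keywords for this article: Recognition of Complex Organization Names in Khmer Based on Markov Logic Networks. Source: Kunming University of Science and Technology, 2017 master's thesis. Document type: degree thesis
More related articles: Khmer, Tri-training, feature selection, Markov logic network, first-order logic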
[Abstract]: As exchanges and cooperation between China and Cambodia grow more frequent, natural language processing for Khmer has become increasingly important. Because languages differ substantially from one another, named entity recognition methods developed for other languages cannot be transferred directly to Khmer. To improve the accuracy of Khmer organization name recognition, this thesis studies two key problems, building a recognition model for Khmer organization names and expanding the organization name corpus, and reports the following results:
(1) A Tri-training-based method for recognizing Khmer organization names is proposed. The method first uses an improved Tri-training algorithm to combine three different classifiers, based on conditional random fields, support vector machines, and a maximum entropy model, into a single classification system. Starting from a small amount of annotated corpus, it then selects newly added samples according to an optimized sample selection strategy, and the experiments take the linguistic characteristics of Khmer into account. The results show that the method can recognize Khmer organization names using only a small amount of annotated corpus.
(2) A Markov logic network-based method for recognizing complex Khmer organization names is proposed. The method first uses a conditional random field model to recognize simple organization names, then derives first-order logic rules from the linguistic characteristics of Khmer, incorporates these rules into a Markov logic network, and uses the LazySAT inference algorithm to recognize complex organization names. The results show that the method achieves better recognition of complex Khmer organization names.
(3) A prototype system for Khmer organization name recognition is designed and implemented, providing strong support for research on Khmer named entity recognition.
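The second contribution combines weighted first-order rules with the output of a first-stage recognizer. The abstract does not list the rules, weights, or data, so the following is only a minimal sketch of how a Markov logic network scores candidate labelings. Everything in it is an assumption made for illustration: the span names `s1` to `s3`, the predicates `HasOrgKeyword`, `CrfSimpleOrg`, `Contains` and `IsOrg`, and all four weights. It also replaces LazySAT with brute-force enumeration, which is only feasible for toy inputs like this one.

```python
from itertools import product
from math import exp

# Hypothetical candidate spans. IsOrg(span) is the query predicate.
SPANS = ["s1", "s2", "s3"]

# Hypothetical evidence (fixed, observed facts about the spans).
EVIDENCE = {
    "HasOrgKeyword": {"s1"},     # span contains an organization head word
    "CrfSimpleOrg": {"s2"},      # a first-stage model tagged it as a simple org
    "Contains": {("s3", "s2")},  # span s3 contains span s2
}


def implies(a, b):
    """Truth value of the implication a => b."""
    return (not a) or b


def world_score(world):
    """Sum the weights of all satisfied ground formulas in this world.

    `world` maps each span to the truth value of IsOrg(span).
    The weights are illustrative assumptions, not values from the thesis.
    """
    score = 0.0
    for s in SPANS:
        # weight 2.0: HasOrgKeyword(s) => IsOrg(s)
        if implies(s in EVIDENCE["HasOrgKeyword"], world[s]):
            score += 2.0
        # weight 1.5: CrfSimpleOrg(s) => IsOrg(s)
        if implies(s in EVIDENCE["CrfSimpleOrg"], world[s]):
            score += 1.5
        # weight 0.5: !IsOrg(s), a weak prior against labeling spans
        if not world[s]:
            score += 0.5
    for outer, inner in product(SPANS, repeat=2):
        # weight 1.0: Contains(outer, inner) ^ IsOrg(inner) => IsOrg(outer)
        body = (outer, inner) in EVIDENCE["Contains"] and world[inner]
        if implies(body, world[outer]):
            score += 1.0
    return score


def all_worlds():
    """Enumerate every truth assignment to IsOrg over the spans."""
    for values in product([False, True], repeat=len(SPANS)):
        yield dict(zip(SPANS, values))


def map_world():
    """Most probable world, found by exhaustive search (not LazySAT)."""
    return max(all_worlds(), key=world_score)


def marginal(span):
    """P(IsOrg(span)) under P(world) proportional to exp(world_score)."""
    z = sum(exp(world_score(w)) for w in all_worlds())
    return sum(exp(world_score(w)) for w in all_worlds() if w[span]) / z


if __name__ == "__main__":
    best = map_world()
    print(best, world_score(best))
    print(marginal("s3"))
```

In this toy setup the containment rule is what pulls `s3` into the organization class: it has no keyword or first-stage evidence of its own, and the weak prior alone would leave it unlabeled. A real system would ground far more formulas than can be enumerated, which is the situation lazy, sampling-based solvers such as LazySAT are designed for.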
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1; H613
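[Similar literature]
Related major newspaper articles (top 2)
1 Staff reporter Yang Ling; Yin Yuexi: An Envoy Carrying Folk Songs to ASEAN [N]; Nanning Daily; 2008
2 Reporter Li Xinxiong, interns Wei Jinxing and Huang Zhenghe; Introducing the Ten ASEAN Countries, Learning ASEAN Languages [N]; Guangxi Daily; 2004
Related master's theses (top 4)
1 Wang Ruolan; Recognition of Complex Organization Names in Khmer Based on Markov Logic Networks [D]; Kunming University of Science and Technology; 2017
2 Li Xiaolong (TRY RATANAK); Research on Sentiment Classification of Khmer News Comment Texts [D]; Kunming University of Science and Technology; 2017
3 Yang Ying; A Study of Khmer Affixes [D]; Yunnan Minzu University; 2013
4 Pan Huashan; Research on Khmer Lexical Analysis Methods Based on Conditional Random Fields [D]; Kunming University of Science and Technology; 2014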
Article ID: 1342840
Article link: http://sikaile.net/shoufeilunwen/zaizhiboshi/1342840.html