Research on Automatic Recognition of Kazakh Basic Verb Phrases
Topic: Kazakh basic verb phrases  Focus: phrase analysis  Source: Xinjiang University, master's thesis, 2013
[Abstract]: Natural language processing for Kazakh has made progress in text processing, lexical analysis and text proofreading, so work can now move to the sentence level: automatically analyzing phrase structure, delimiting phrase boundaries, and identifying the syntactic relations inside a phrase and the semantic relations among its constituents. With the volume of information now available online, a growing number of applications need deeper analysis of natural language, including machine translation, search engines, text classification and information extraction.
First, this study defines Kazakh basic verb phrases and describes their properties, classification and structure. It establishes a framework for classifying them by syntactic function, and gives a preliminary description of the syntactic system and a fairly complete functional classification system needed to describe phrase structure in modern Kazakh. Second, it compiles statistics on the structure of Kazakh basic verb phrases and analyzes them, then defines boundary-determination rules and uses them to identify the phrases. Because the rule set does not cover every case, and because basic verb phrases are ambiguous with some other phrase types, the accuracy of the rule-based method is not high. Third, it identifies Kazakh basic verb phrases with a maximum entropy method: feature templates for the maximum entropy model are designed from contextual information such as words, parts of speech and affixes, the parameters of the feature set are estimated with the GIS (Generalized Iterative Scaling) algorithm, and the best verb phrase recognition result is output. The statistical method achieves high accuracy in closed tests but does not perform well in open tests, and it requires a large training corpus. Fourth, it analyzes in detail the types of structural ambiguity in Kazakh basic verb phrases and the strategies for resolving them: unambiguous patterns are identified with rules, and on that basis statistical methods are applied to some typical ambiguous patterns.
The system identifies Kazakh basic verb phrases (KzBaseVP) in 30 days of text (20 MB) drawn from the laboratory's existing Xinjiang Daily corpus. The experimental results indicate that all three methods are feasible for identifying Kazakh basic verb phrases, that the choice of collocation rules and feature templates is sound, and that satisfactory results can be obtained under both closed and open test conditions.
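The maximum entropy step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. The B/I/O tagging scheme, the five-slot feature template, the placeholder tokens and tags, and all function names below are assumptions made for illustration. The sketch also leaves out the GIS correction feature, a common simplification, and uses the number of active predicates per event as the constant C.

```python
import math
from collections import defaultdict

LABELS = ["B", "I", "O"]  # assumed scheme: begin / inside / outside a basic verb phrase

def context_predicates(sent, i):
    """Hypothetical feature template: current word, POS and affix, plus neighbouring POS."""
    word, pos, affix = sent[i]
    prev_pos = sent[i - 1][1] if i > 0 else "<s>"
    next_pos = sent[i + 1][1] if i + 1 < len(sent) else "</s>"
    return [f"w0={word}", f"p0={pos}", f"af0={affix}",
            f"p-1={prev_pos}", f"p+1={next_pos}"]

def predict(weights, preds):
    """Conditional distribution p(label | context) under the log-linear model."""
    scores = {y: sum(weights.get((p, y), 0.0) for p in preds) for y in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train_gis(events, iterations=100):
    """Generalized Iterative Scaling over (predicates, label) events."""
    c = max(len(preds) for preds, _ in events)  # bound on active features per event
    empirical = defaultdict(float)
    for preds, y in events:
        for p in preds:
            empirical[(p, y)] += 1.0
    weights = {f: 0.0 for f in empirical}  # only features observed in training
    for _ in range(iterations):
        expected = defaultdict(float)
        for preds, _ in events:
            for y, prob in predict(weights, preds).items():
                for p in preds:
                    if (p, y) in weights:
                        expected[(p, y)] += prob
        for f in weights:
            # GIS update: move each weight toward matching the empirical count
            weights[f] += math.log(empirical[f] / expected[f]) / c
    return weights

def tag(weights, sent):
    """Label each token with its most probable B/I/O tag."""
    result = []
    for i in range(len(sent)):
        dist = predict(weights, context_predicates(sent, i))
        result.append(max(dist, key=dist.get))
    return result

# Hypothetical toy corpus: (word, POS, affix) placeholders, not real Kazakh data.
TRAIN = [
    ([("tok_pron", "PRON", "none"), ("tok_noun", "N", "acc"),
      ("tok_verb", "V", "past")], ["O", "O", "B"]),
    ([("tok_noun2", "N", "acc"), ("tok_conv", "V", "converb"),
      ("tok_aux", "AUX", "past")], ["O", "B", "I"]),
    ([("tok_pron", "PRON", "none"), ("tok_conv", "V", "converb"),
      ("tok_aux", "AUX", "past")], ["O", "B", "I"]),
]
events = [(context_predicates(s, i), y) for s, gold in TRAIN for i, y in enumerate(gold)]
weights = train_gis(events)
```

Calling `tag(weights, sent)` on a training sentence corresponds to a closed test in the abstract's sense. An open test would tag sentences held out from `TRAIN`, where unseen words and affixes leave fewer active features, which is one reason a statistical model of this kind needs a large training corpus.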
[Degree-granting institution]: Xinjiang University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.1
Article ID: 1658524
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1658524.html