Research and Implementation of Chinese-Character Transliteration of Uyghur Personal Names Based on Syllable Segmentation
Published: 2018-05-17 05:05
Topics: Uyghur language + syllable segmentation; Source: Xinjiang Normal University, master's thesis, 2014
[Abstract]: Transliterating Uyghur personal names into Chinese characters is an important problem in minority-language information processing, and it plays an important role in applications such as machine translation and information retrieval. In recent years, because there has been no unified standard for transliterating the names of Xinjiang's ethnic minorities into Chinese characters, the same Uyghur name may be written one way in the household register (hukou), another way on the identity card, differently again in the passport, and in yet another way on documents such as money orders. To address these problems, this thesis gives a fairly comprehensive analysis of the glyph-based DOM transliteration framework and of Uyghur syllable decomposition, and on that basis studies the transliteration of Uyghur names into Chinese characters. The main contents of the thesis are as follows:
1. The thesis first introduces the glyph-based DOM transliteration framework and discusses the feasibility of applying it to the transliteration of Uyghur names into Chinese characters. The framework matches characters in the source language directly to characters in the target language, and transliterating a Uyghur name is in essence a process of matching Uyghur letters or syllables directly to the corresponding Chinese characters. The framework is therefore used to implement the mapping from Uyghur letters and syllables to Chinese characters.
2. Building on a study of the theory and key techniques of Uyghur syllable segmentation, the thesis summarizes the principles of Uyghur syllable decomposition and implements a syllable-decomposition statistics system. Statistics computed over 5,000 personal names give the distribution of commonly used syllables in Uyghur names and identify 20 syllables that commonly form Uyghur names.
3. Within the glyph-based framework, the thesis designs the basic idea and overall architecture of syllable-segmentation-based transliteration of Uyghur names. Based on an analysis of the structure of the sound-correspondence table between Uyghur names and Chinese characters, it proposes a matrix-based method for mapping the letters or syllables of Uyghur names to Chinese characters, which it describes as the fastest and most effective approach. A syllable-segmentation-based transliteration system is implemented and tested; a transliteration experiment on 5,000 randomly selected names yields an accuracy of only 52%.
4. To improve transliteration accuracy, the thesis surveys a large number of Uyghur names, identifies 106 affixes that form Uyghur names, and constructs supplementary rules based on these name affixes, which also make it possible to distinguish the gender of a Uyghur name. When the rules are applied to the transliteration system and it is tested a second time, transliteration accuracy improves by 30% and ultimately reaches 86%, demonstrating the feasibility and effectiveness of the proposed method and rules.
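As a rough illustration of the pipeline the abstract describes (segment a name into syllables, then look each syllable up in a sound-correspondence table), the following Python sketch may help. Everything in it is an assumption made for illustration: the input is taken to be a Latin-script transcription with single-letter phonemes, the boundary rule is a simplified one-vowel-per-syllable heuristic, the table entries are hypothetical, and a plain dictionary lookup stands in for the matrix-based mapping mentioned in item 3. It is not the thesis's algorithm or data.

```python
from __future__ import annotations

import unicodedata

# Vowels of the assumed Latin-script transcription (not taken from the thesis).
VOWELS = set("aeéioöuü")

# Hypothetical syllable-to-Chinese-character table: a tiny stand-in for a full
# sound-correspondence table.
SYLLABLE_TABLE = {
    "a": "阿", "lim": "力木",
    "tur": "吐尔", "sun": "逊",
    "gül": "古丽", "nar": "娜尔",
}


def split_syllables(name: str) -> list[str]:
    """Split a name into syllables, one vowel per syllable.

    Simplified rule: a boundary falls just before the last consonant that
    precedes each non-initial vowel, or at the vowel itself when two vowels
    are adjacent. Digraphs such as "sh" or "ch" are not handled.
    """
    name = unicodedata.normalize("NFC", name).lower()
    vowel_positions = [i for i, ch in enumerate(name) if ch in VOWELS]
    if not vowel_positions:
        return [name] if name else []
    boundaries = [0]
    for prev, cur in zip(vowel_positions, vowel_positions[1:]):
        # Adjacent vowels: cut at the second vowel. Otherwise give the
        # following syllable exactly one onset consonant.
        boundaries.append(cur if cur - prev == 1 else cur - 1)
    boundaries.append(len(name))
    return [name[a:b] for a, b in zip(boundaries, boundaries[1:])]


def transliterate(name: str, table: dict[str, str] = SYLLABLE_TABLE) -> str:
    """Map each syllable through the table; unknown syllables become '?'."""
    return "".join(table.get(syl, "?") for syl in split_syllables(name))
```

With this toy table, `split_syllables("Tursun")` gives `["tur", "sun"]` and `transliterate("Tursun")` gives `吐尔逊`. A syllable missing from the table comes back as `?`, which is roughly where letter-level fallback or affix-based supplementary rules like those in item 4 would have to take over.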
[Degree-granting institution]: Xinjiang Normal University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: H215
Article ID: 1900024
Article link: http://sikaile.net/wenyilunwen/hanyulw/1900024.html