Research on the Construction of a Tibetan Proverb Corpus for Information Processing

Published: 2018-11-24 08:22

[Abstract]: This thesis first collects, collates, and enters Tibetan proverbs from the three major dialects (Amdo, Kham, and Ü-Tsang), together with the Gesar Proverbs, and uses them to build a Tibetan proverb corpus. The corpus is segmented automatically and then proofread by hand, and principles for segmenting the words of proverbs are established, yielding a Tibetan proverb corpus with an accompanying segmented-word database. By content, Tibetan proverbs are subdivided into twelve types, building on the classifications in the relevant literature. By form, the number of categories grew to thirty-two in the course of collection. The collected Tibetan proverbs are studied from three angles: the distribution of the number of entries, the frequency count of words, and their relative frequency. Finally, entries are sorted and retrieved in three ways: by the three dialect regions with Tibetan-Chinese parallel text, by alphabetical order, and by content category.
The corpus serves two main purposes. First, it is a proverb corpus for computer-based Tibetan information processing systems. Second, it is a reference work for learning Tibetan and a basic resource for research on the vocabulary of Tibetan proverbs, for use by learners and researchers. The thesis aims to lay groundwork for future Tibetan information processing, including syntactic classification and tagging, automatic word segmentation, syntactic research, phrase research, machine translation, search engines, and electronic dictionary compilation, and to offer a new method and tool for future research on Tibetan literature.
The contributions are fourfold: first, a large number of scattered Tibetan proverbs are collected and collated, which the author describes as the largest such compilation to date; second, the proverbs are classified and tagged for computer information processing; third, a bilingual parallel corpus of Tibetan proverbs is established; fourth, a retrieval program for Tibetan proverbs is built, which facilitates future study and research in bilingual teaching. The next step is to translate the collected proverb entries. A further task is to make the annotations for content, form, paragraph, and syllable pause all appear within an entry when it is clicked in the mixed sort view. The thesis argues that a high-quality Tibetan proverb database not only helps preserve and make use of Tibetan proverbs as a cultural treasure and supplies indispensable language material for the study of Tibetan language and literature, but also expands the text resources available for Tibetan natural language processing.
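The statistics and retrieval modes described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Proverb` record layout, the function names, the category labels, and the Latin placeholder strings standing in for corpus entries are all hypothetical and are not taken from the thesis or its retrieval program.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Proverb:
    """Hypothetical record layout for one corpus entry."""
    text: str                # proverb text (placeholder strings below)
    tokens: Tuple[str, ...]  # word segmentation, assumed already proofread
    dialect: str             # "Amdo", "Kham", or "U-Tsang"
    category: str            # content type; labels here are made up


# Placeholder entries only. These are NOT real proverbs or corpus data.
CORPUS: List[Proverb] = [
    Proverb("ka kha ga", ("ka", "kha", "ga"), "Amdo", "type-1"),
    Proverb("kha nga", ("kha", "nga"), "Kham", "type-2"),
    Proverb("ga kha", ("ga", "kha"), "U-Tsang", "type-1"),
]


def word_counts(corpus: List[Proverb]) -> Counter:
    """Frequency count: how many times each token occurs."""
    return Counter(tok for p in corpus for tok in p.tokens)


def word_rates(corpus: List[Proverb]) -> Dict[str, float]:
    """Relative frequency: token count divided by total token count."""
    counts = word_counts(corpus)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def by_dialect(corpus: List[Proverb], dialect: str) -> List[Proverb]:
    """Retrieve the entries belonging to one dialect region."""
    return [p for p in corpus if p.dialect == dialect]


def by_category(corpus: List[Proverb], category: str) -> List[Proverb]:
    """Retrieve the entries belonging to one content category."""
    return [p for p in corpus if p.category == category]


def alphabetical(corpus: List[Proverb]) -> List[Proverb]:
    """Sort by plain code-point order of the text.

    Assumption: this is only a stand-in. Proper Tibetan dictionary
    order needs a dedicated collation, which this sketch omits.
    """
    return sorted(corpus, key=lambda p: p.text)
```

Keeping the segmented tokens alongside the raw text means the frequency statistics and the three retrieval views all read from the same records, so a correction made during proofreading would propagate to every view.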
[Degree-granting institution]: Northwest Minzu University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: H214

Article ID: 2352964

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2352964.html
