Research on the Construction of a Tibetan Proverb Corpus for Information Processing
[Abstract]: Drawing on Tibetan proverbs and Gesar proverbs in the Amdo, Kham, and Ü-Tsang dialects, this thesis builds a corpus of Tibetan proverbs, which is segmented automatically and then proofread manually. Principles for the lexical segmentation of proverbs are established, and both a corpus and a lexicon of Tibetan proverbs are constructed. On the basis of the relevant literature, Tibetan proverbs are subdivided into twelve types; in the course of collection and collation, the inventory of proverb forms was extended to 32. The thesis studies Tibetan proverbs from three angles: the numerical distribution of proverbs, word counts, and word frequencies. Finally, the proverbs can be sorted and retrieved in three ways: by Tibetan-Chinese comparison across Tibetan dialects, by alphabetical order, and by content category. The corpus serves two main purposes. First, it functions as a component of computer-based Tibetan information processing systems. Second, it serves Tibetan language learners and researchers as a reference work for language learning and as a basic resource for the study of Tibetan proverbs. The thesis aims to lay groundwork for tasks in Tibetan information processing such as syntactic classification and tagging, automatic word segmentation, syntactic research, phrase research, machine translation, search engines, and electronic dictionary compilation. It also offers a new method and tool for future research on Tibetan literature.
The contributions are as follows. First, a large number of scattered Tibetan proverbs are collected and collated, forming the largest such collection to date. Second, the proverbs are classified and tagged for computer information processing. Third, a bilingual comparative corpus of Tibetan proverbs is established. Fourth, a retrieval program for Tibetan proverbs is built, which will make future study and research in bilingual teaching more convenient. Two tasks remain for further work: translating the Tibetan proverbs, and, in mixed sorting, displaying the content, form, paragraph, and syllable-pause tags together when the relevant entry is clicked. The thesis holds that a high-quality Tibetan proverb database not only makes it easier to understand and use the treasury of Tibetan proverbs, but also supplies indispensable language material for the study of Tibetan language and literature, and thereby expands the text resources available for Tibetan natural language processing.
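The frequency statistics and the three retrieval modes described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the thesis's actual program: the `Proverb` record, its field names, the category labels, and the function names are hypothetical; the sample entries are placeholder tokens, not real proverbs; and alphabetical sorting uses plain code-point order, which is not true Tibetan dictionary order.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Proverb:
    """Hypothetical corpus record; field names are illustrative only."""
    tibetan: str   # segmented text, words separated by single spaces
    chinese: str   # Chinese rendering for Tibetan-Chinese comparison
    dialect: str   # "Amdo", "Kham", or "U-Tsang"
    category: str  # content category (hypothetical label)


def word_frequencies(corpus):
    """Return (absolute counts, relative frequencies) over all words."""
    counts = Counter(word for p in corpus for word in p.tibetan.split())
    total = sum(counts.values())
    relative = {w: c / total for w, c in counts.items()} if total else {}
    return counts, relative


def by_dialect(corpus, dialect):
    """Retrieval mode 1: Tibetan-Chinese pairs for one dialect."""
    return [(p.tibetan, p.chinese) for p in corpus if p.dialect == dialect]


def alphabetical(corpus):
    """Retrieval mode 2: alphabetical order.

    Assumption: plain code-point order stands in for Tibetan
    dictionary order, which needs language-specific collation.
    """
    return sorted(corpus, key=lambda p: p.tibetan)


def by_category(corpus, category):
    """Retrieval mode 3: all proverbs in one content category."""
    return [p for p in corpus if p.category == category]


# Placeholder entries (romanized letter names), not real proverbs.
SAMPLE = [
    Proverb("kha ga", "translation-3", "U-Tsang", "morality"),
    Proverb("ga ka kha", "translation-1", "Amdo", "morality"),
    Proverb("ka kha ka", "translation-2", "Kham", "nature"),
]

if __name__ == "__main__":
    counts, relative = word_frequencies(SAMPLE)
    print(counts.most_common())
    print(by_dialect(SAMPLE, "Kham"))
    print([p.tibetan for p in alphabetical(SAMPLE)])
    print([p.tibetan for p in by_category(SAMPLE, "morality")])
```

A real system would replace the whitespace split with the output of the segmentation step and the code-point sort with a Tibetan-aware collation.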
【Degree-granting institution】: Northwest Minzu University
【Degree level】: Master's
【Year degree awarded】: 2016
【Classification number】: H214
Article ID: 2352964
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2352964.html