An Analysis of Corpus Annotation from the Perspective of Computational Linguistics
[Abstract]: The emergence of corpora and the birth of corpus linguistics were epoch-making for linguistic research. Since corpora first appeared, their size and functions have expanded rapidly, and so has the scope of their research and application. Corpus annotation has played an important role in this process: it is a core part of corpus construction and has become a hot topic in corpus research. By revealing deeper linguistic information and extending what a corpus can do, annotation is a precondition for using corpus resources in computational linguistic research. At present, however, there is no comprehensive discussion of corpus annotation. Previous work has focused on building practical annotation systems or on isolated studies of a single annotation type, and it is scattered across the technical specifications of large corpora, with little reflection on the underlying theory. From the perspective of computational linguistics, this thesis discusses a series of issues in corpus annotation, including its concept, significance, and principles, structural annotation, and semantic annotation, and it proposes a structural annotation model and a semantic annotation model. The introduction summarizes the current state of corpus annotation research in China and the content of this study, explains the research method, and points out the key points of the thesis. Chapter two derives the concept of corpus annotation from the features of corpora and expounds the significance of annotation from two aspects.
After explaining the corpus annotation principles proposed by the corpus linguist Leech, the thesis adds four principles to meet the needs of new types of corpora: (1) design a practical annotation system based on the main uses of the corpus; (2) pay attention to compatibility among the different levels of annotation; (3) attach importance to software support for corpus annotation; (4) design annotation that is easy to share. Chapter three introduces the annotation schemes of older and newer corpora and explains a series of concepts surrounding the TEI annotation scheme. It introduces the Standard Generalized Markup Language (SGML), which is closely related to TEI, and summarizes several annotation types. Chapter four analyzes the grammatical annotation of corpora, focusing on structural annotation, and introduces the two main kinds of structurally annotated corpora: phrase-structure treebanks and dependency treebanks. Based on the features of Chinese grammatical structure, it proposes a minimal annotation model for syntactic structure. The model takes immediate constituent analysis as its theoretical basis, describes the grammatical structure of a sentence with a simple symbol system, and realizes structural annotation in a form similar to part-of-speech tagging, which gives it some reference value for Chinese structural annotation. Chapter five takes semantic annotation as its main topic and, building on previous research, proposes a model for annotating sentence meaning. The model draws in part on case grammar and combines part-of-speech tagging, structural annotation, and sentence-meaning annotation. It carries a large amount of information and is easy to implement on a computer, offering a new model for Chinese sentence-meaning annotation.
Chapter six generalizes the features of Chinese corpus annotation from two aspects: grammatical annotation and semantic annotation. Chapter seven is the conclusion; it reviews the whole thesis and points out where further improvement is needed.
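As a rough illustration of the two treebank types named in the abstract, the sketch below encodes one toy sentence both as a phrase-structure tree and as a dependency analysis. The sentence, the category labels (PN, VV, NN, NP, VP, S), the relation names (nsubj, root, obj), and the helper functions are all assumptions chosen for illustration; they are not the annotation scheme proposed in the thesis.

```python
# Toy sentence (hypothetical example): "我 喜欢 语言学" ("I like linguistics").
TOKENS = ["我", "喜欢", "语言学"]

# Phrase-structure annotation: nested (label, child, ...) tuples.
# A preterminal node has exactly one child, the word itself.
PHRASE_TREE = (
    "S",
    ("NP", ("PN", "我")),
    ("VP", ("VV", "喜欢"), ("NP", ("NN", "语言学"))),
)

# Dependency annotation: one (head, relation) pair per token.
# Heads are 1-based token positions; head 0 marks the root of the sentence.
DEPENDENCIES = [(2, "nsubj"), (0, "root"), (2, "obj")]


def is_preterminal(tree):
    """True if the node directly dominates a single word."""
    return len(tree) == 2 and isinstance(tree[1], str)


def leaves(tree):
    """Return the words of a phrase-structure tree, left to right."""
    if is_preterminal(tree):
        return [tree[1]]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def to_brackets(tree):
    """Serialize a phrase-structure tree as a bracketed string."""
    if is_preterminal(tree):
        return f"({tree[0]} {tree[1]})"
    inner = " ".join(to_brackets(child) for child in tree[1:])
    return f"({tree[0]} {inner})"


def dependents_of(head, deps):
    """Return the 1-based positions of the tokens governed by `head`."""
    return [i for i, (h, _) in enumerate(deps, start=1) if h == head]


if __name__ == "__main__":
    print(to_brackets(PHRASE_TREE))
    for position, (head, relation) in enumerate(DEPENDENCIES, start=1):
        governor = "ROOT" if head == 0 else TOKENS[head - 1]
        print(f"{TOKENS[position - 1]} <-{relation}- {governor}")
```

The phrase-structure form records how words group into nested constituents, while the dependency form records only which word governs which; both cover the same tokens, which is why the same sentence can be stored in either kind of treebank.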
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: H087; H146; H136