An Analysis of Corpus Annotation from the Perspective of Computational Linguistics

Published: 2019-01-24 11:08

[Abstract]: The emergence of corpora and the birth of corpus linguistics were epoch-making events in linguistic research. Since their emergence, corpora have developed rapidly: their capacity has kept growing, their functions have kept improving, and the scope of their research and application has kept widening. Corpus annotation has played a major role in this process. It is an important component of a corpus and has become a focus of corpus research. Annotation reveals deep linguistic information, extends what a corpus can do, and is a precondition for using corpus resources in computational linguistic research. No existing work discusses corpus annotation comprehensively. Earlier studies concentrated on building practical annotation systems and examined individual annotation types in isolation; their results are scattered across the technical specifications of large corpora and lack reflection on the underlying theory. From the perspective of computational linguistics, this thesis discusses the concept, significance, principles, and types of corpus annotation, concentrates on two types, structural annotation and semantic annotation, and proposes a structural annotation model and a semantic annotation model. The introduction summarizes the current state of research on corpus annotation in China, explains the content and methods of the study, and states the focus of the thesis. Chapter 2 derives the concept of corpus annotation from the characteristics of corpora and explains its significance from two angles. After explaining the annotation principles proposed by the corpus linguist Leech, it adds four principles that address the annotation needs of newer corpora: (1) design a practical annotation system guided by the main purpose of the corpus; (2) attend to compatibility between annotation at different levels; (3) make sure the annotation supports the relevant software; (4) design annotation that is easy to share. Chapter 3 introduces the older and newer corpus annotation schemes, explains a series of concepts related to the TEI annotation scheme, introduces the Standard Generalized Markup Language (SGML), which is closely tied to TEI, and summarizes several annotation types. Chapter 4 analyzes grammatical annotation, concentrating on structural annotation. It introduces the two main kinds of structurally annotated corpora, phrase-structure treebanks and dependency treebanks, and proposes a minimal annotation model for syntactic structure suited to the characteristics of Chinese grammar. The model takes immediate constituent analysis as its theoretical basis, describes the grammatical structure of a sentence with a simple symbol system, and realizes structural annotation in a form similar to part-of-speech tagging; it offers some reference value for the structural annotation of Chinese. Chapter 5 deals with semantic annotation and, building on earlier research, proposes a sentence-meaning annotation model. The tag set for its sentence-meaning layer is drawn up with reference to case grammar, and the model covers part-of-speech annotation, structural annotation, and sentence-meaning annotation. It carries a large amount of information and is easy to implement on a computer, giving Chinese sentence-meaning annotation a new model to draw on. Chapter 6 summarizes the characteristics of Chinese corpus annotation in terms of grammatical annotation and semantic annotation. Chapter 7 is the conclusion; it reviews the thesis and points out what needs further improvement.
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification numbers]: H087;H146;H136


Article ID: 2414425


Article link: http://sikaile.net/wenyilunwen/hanyulw/2414425.html
