

Construction of a Chinese Dependency Graph Bank

Published: 2018-02-11 02:26

Keywords: syntax and semantics; dependency grammar; graph structure; annotation; graph bank. Source: Nanjing Normal University, 2015 master's thesis. Document type: degree thesis.


[Abstract]: Natural language processing by computer requires recovering the semantic relations between words from linear sentences. Tree-shaped syntactic structures allow the main semantic relations between sentence constituents to be derived, and so play an important role in natural language processing. As corpora have grown in recent years, however, researchers have found that projective trees cannot describe syntactic structure completely, and that corpora contain a considerable number of non-projective tree structures and graph structures. In addition, owing to the characteristics of Chinese itself, the accuracy of Chinese parsing has long been low, and existing parsing techniques are ill-suited to certain special Chinese constructions (serial-verb sentences, pivotal sentences, verb copying, long sentences, and so on), so new technical means are urgently needed. Some researchers have proposed AMR (Abstract Meaning Representation), a graph-based representation of sentence meaning, for analyzing English. Drawing on that approach, this thesis explores integrated syntactic-semantic annotation of Chinese based on dependency grammar (dependency graph annotation, for short) and goes on to build a Chinese dependency graph bank.

The thesis proceeds in four steps. First, it reviews the development of syntactic theory and of representations of syntactic structure, and finds that syntactic analysis and argument analysis frequently yield structures that go beyond a tree, which is an important reason for introducing graph structures. A statistical analysis of the Chinese data of the CoNLL2009 evaluation then shows that not all semantic structures can be derived from the tree structure, which calls for a new graph-based scheme for integrated syntactic-semantic annotation of Chinese sentences. Second, on this theoretical basis, a tag set and detailed annotation guidelines for dependency graph annotation are built up step by step through hands-on annotation and repeated verification and revision; this is also one of the contributions of the study. Third, the tag set and guidelines from the second step are applied to part of the CoNLL2009 Chinese data: 1,230 sentences are annotated, and the problems encountered during annotation are recorded. Fourth, the annotation results are counted and analyzed: of the 1,230 annotated sentences, 795 (64.6%) form graph structures. This part mainly analyzes the special linguistic phenomena that give rise to graph structures, such as pivotal sentences, serial-verb sentences, and bivalent nouns. Handling these special sentences is precisely where dependency graphs have the advantage over dependency trees, and it is the key to building the dependency graph bank.

The contributions of the thesis are, first, the proposal to represent the results of Chinese syntactic-semantic analysis as graph structures; second, a new tag set and annotation guidelines for integrated syntactic-semantic annotation of Chinese; and third, the combination of dependency grammar with frame semantics in the analysis of Chinese. The study finds that Chinese contains a certain number of sentences whose syntactic-semantic relations can be fully revealed only by a graph structure, and that such sentences are often the key factor limiting Chinese parsing accuracy. The annotation practice and the statistical results also show that graph structures have a clear advantage over tree structures in revealing the syntactic-semantic relations of a sentence.
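To make the tree/graph distinction concrete, the following is a minimal Python sketch and not the thesis's own tooling. It assumes arcs are stored as (head, dependent, label) triples with index 0 as an artificial root. The labels `root`, `subj`, `obj`, and `comp`, the function names, and the example analysis of the pivotal sentence are all hypothetical illustrations and do not reproduce the thesis's tag set. A sentence counts as graph-structured here when some word has more than one head, which is what happens when the pivot word is both the object of the first verb and the subject of the second.

```python
from collections import defaultdict

# An arc is (head_index, dependent_index, label); index 0 is an artificial root.
# The labels used below are hypothetical, not the tag set defined in the thesis.

def heads_of(arcs):
    """Map each dependent index to the list of its head indices."""
    heads = defaultdict(list)
    for head, dep, _label in arcs:
        heads[dep].append(head)
    return heads

def reentrant_tokens(arcs):
    """Return the sorted indices of tokens that have more than one head."""
    return sorted(dep for dep, hs in heads_of(arcs).items() if len(hs) > 1)

def is_graph_structured(arcs):
    """True if some token has several heads, so no single tree holds all arcs."""
    return bool(reentrant_tokens(arcs))

# Pivotal sentence: 我(1) 请(2) 他(3) 吃饭(4), "I invite him to eat".
tokens = ["ROOT", "我", "请", "他", "吃饭"]
tree_arcs = [(0, 2, "root"), (2, 1, "subj"), (2, 3, "obj"), (2, 4, "comp")]
# Assumed extra arc: 他 is also the subject of 吃饭, giving it a second head.
graph_arcs = tree_arcs + [(4, 3, "subj")]

print(is_graph_structured(tree_arcs))                      # False
print([tokens[i] for i in reentrant_tokens(graph_arcs)])   # ['他']

# Share of graph-structured sentences reported in the abstract.
graph_share = 795 / 1230
print(f"{graph_share:.1%}")                                # 64.6%
```

The multi-head test is only one possible criterion; a fuller check would also look for cycles and unattached words, which this sketch leaves out.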
[Degree-granting institution]: Nanjing Normal University
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification number]: H146





