Construction and Statistical Analysis of a Chinese Semantic Role Labeling Corpus
Title: Construction and Statistical Analysis of a Chinese Semantic Role Labeling Corpus. Source: Ludong University, 2017 master's thesis. Document type: degree thesis
Keywords: semantic role; corpus; case frame; sentence model; annotation rules
[Abstract]: With the rapid development of information technology, natural language processing has an ever greater influence on daily life. Within natural language processing, enabling computers to understand human language, and thereby to support human-computer interaction, is an important problem that urgently needs solving. Automatic Chinese word segmentation and part-of-speech tagging already achieve high accuracy using relatively low-level linguistic knowledge and statistical methods, but some ambiguities cannot be handled at that level and can only be fully resolved at the syntactic and semantic analysis stages. For natural language understanding, syntactic analysis is only one means; semantic analysis is the key and the difficult part, and without its support automatic syntactic analysis also struggles. In the pursuit of artificial intelligence, semantic analysis has taken on unprecedented importance and urgency: for a natural language processing system to combine the speed of a computer with human intelligence, semantic analysis of some depth is indispensable.

Building on an existing syntactic treebank, this thesis constructs a semantic role labeling corpus of a certain scale. First, the corpus was annotated with semantic roles according to the HowNet case frame dictionary and the "Specification for the Modern Chinese Predicate Semantic Role Labeling Corpus", in two main stages: manual annotation and manual proofreading. Second, on the basis of the manual annotation, the annotation scheme was revised and improved, and a set of semantic role labeling rules was induced and tested for effectiveness. Finally, the research content and results were summarized. The thesis has six parts, outlined below.

Part one, introduction. This part presents the theoretical background, the state of research, the research methods, and the significance of the work. The theoretical background mainly covers valency theory, argument theory, and semantic roles. The state of research is discussed in terms of semantic role relation types, the construction of semantic role corpora, and semantic role labeling schemes. Methodologically, the thesis relies mainly on corpus-based methods, human-machine collaboration, a combination of rule-based and statistical methods, and a combination of qualitative and quantitative analysis. The aim is to give consistent annotations to sentences that differ in syntactic structure but share the same basic logical semantics, and to build a semantic role labeling corpus of a certain scale, thereby contributing to semantic analysis and natural language understanding.

Chapter 1, the semantic role labeling corpus. This chapter describes the groundwork: the source and size of the corpus, the construction of the earlier syntactic treebank, the semantic role relation types and the HowNet case frame dictionary, the annotation platform, and the annotation scheme. The corpus is drawn from the People's Daily and contains 40,000 sentences in total. The semantic role labeling corpus is built on the earlier dependency treebank and represents a further level of processing of the text. The annotation platform was adapted from the earlier treebank annotation platform and can switch between syntactic annotation and semantic role annotation. The relation types and the annotation scheme follow the Specification named above; unlike the Specification, however, this thesis uses the HowNet case frame dictionary to assist annotation, which helps safeguard the objectivity and accuracy of the labels.

Chapter 2, common problems in semantic role annotation and how they are handled. This chapter summarizes the problems encountered during manual annotation and proposes a solution for each. The problems fall into three kinds: missed labels, extra labels, and wrong labels. Each kind is analyzed from two angles: the annotation of predicative constituents and the annotation of predicate arguments. The proposed solutions include correctly attaching a word to a synonym and selecting the verb sense according to context.

Chapter 3, problems in the case frame dictionary and countermeasures. On the basis of the manual annotation, this chapter summarizes the problems found in the verb senses and case frames of the dictionary, analyzes their causes, and proposes countermeasures. There are four main problems: the case frame of a verb's semantic class is incorrect; the semantic class assigned to a verb is incorrect; the semantic classes assigned to a verb are incomplete; and the verb is an out-of-vocabulary word. An incorrect case frame means either that the frame lacks semantic roles or that its obligatory roles are wrongly specified. Incomplete semantic classes mean either that the verb's semantic classes are not fully enumerated or that the case frame of a semantic class does not fit every sense assigned to it. The causes are discussed in detail in terms of the design of the case frame dictionary, semantic change, differences among verb senses within one semantic class, and the emergence of new words. Two remedies are proposed: checking case frames by sentence-pattern transformation, and attaching words to near-synonyms. Where the case frame of a semantic class is incorrect, or does not fit every verb in the class, the thesis verifies the verb's case frame by sentence-pattern transformation; the remaining problems are corrected by attaching the word to a near-synonym.

Chapter 4, the correspondence between sentence patterns and sentence models, and the semantic role labeling rules. From the results of manual annotation and proofreading, typical sentence models for each sentence pattern are induced by introspection. The patterns are mainly subject-predicate sentences: verbal-predicate, nominal-predicate, and adjectival-predicate sentences. The verbal-predicate sentences include general verbal-predicate sentences, "把" (ba) sentences, "被" (bei) sentences, pivotal sentences, serial-verb sentences, double-object sentences, and "比" (bi) comparative sentences. Next, according to whether a marker is present, the typical sentence models are regularized into a set of semantic role labeling rules. Finally, the rules are tested for effectiveness on a test set, the cases outside their coverage are summarized, and strategies for those cases are proposed. Provided the rules prove effective, they can be applied to later semantic role labeling work. On the one hand, this exploits the high precision of rule-based methods and reduces the manual annotation workload; on the other hand, the rules can automatically detect errors made in purely manual annotation, improving labeling accuracy.

Final part, conclusion. This part summarizes the main research content and results, states the significance of the work for Chinese information processing and for research on Chinese grammar and semantics, and closes by analyzing the shortcomings of the study and planning the next steps.
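The use of labeling rules to check manual annotation, and the three error kinds named for Chapter 2 (missed, extra, and wrong labels), can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the role names `agent` and `patient`, the two marker rules, the single-token arguments, and the function names `label_by_rule` and `compare` are hypothetical and are not the thesis's tag set, rule set, or tooling.

```python
# Hypothetical marker rules, for illustration only (not the thesis's rules).
# Each marker maps to (role of the token before it, role of the token after it).
MARKER_RULES = {
    "把": ("agent", "patient"),   # X 把 Y V: X acts on Y
    "被": ("patient", "agent"),   # X 被 Y V: X is acted on by Y
}


def label_by_rule(tokens, predicate_index):
    """Return {token_index: role} for a marked sentence, or {} if no rule applies.

    Simplifying assumption: each argument is the single token directly
    adjacent to the marker, and the marker precedes the predicate.
    """
    for i in range(1, predicate_index - 1):
        if tokens[i] in MARKER_RULES:
            before_role, after_role = MARKER_RULES[tokens[i]]
            return {i - 1: before_role, i + 1: after_role}
    return {}


def compare(manual, predicted):
    """Sort disagreements into the three error kinds, treating the rule
    output as the reference (an assumption; a reviewer would confirm each)."""
    missed = sorted(k for k in predicted if k not in manual)
    extra = sorted(k for k in manual if k not in predicted)
    wrong = sorted(k for k in predicted
                   if k in manual and manual[k] != predicted[k])
    return {"missed": missed, "extra": extra, "wrong": wrong}


tokens = ["他", "把", "水", "喝", "了"]          # "He drank the water."
predicted = label_by_rule(tokens, predicate_index=3)
manual = {0: "patient", 4: "agent"}              # deliberately flawed annotation
report = compare(manual, predicted)
print(report)  # {'missed': [2], 'extra': [4], 'wrong': [0]}
```

In this toy setting a disagreement only flags a token for review; it does not decide which side is right, since the rules themselves have limited coverage.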
[Degree-granting institution]: Ludong University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: H146.3