
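Construction and Statistical Analysis of a Chinese Semantic Role Labeling Corpus

Published: 2018-01-11 08:11

  Keywords for this article: Construction and Statistical Analysis of a Chinese Semantic Role Labeling Corpus. Source: Ludong University, 2017 master's thesis. Type: degree thesis


  More related articles: semantic role, corpus, case frame, sentence model, annotation rules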


[Abstract]: With the rapid development of information technology, natural language processing (NLP) has an ever greater influence on daily life. A central open problem in NLP is how to make computers understand human language so that people and machines can interact. Automatic Chinese word segmentation and part-of-speech tagging already reach fairly high accuracy using low-level linguistic knowledge and statistical methods, but some ambiguities cannot be handled at that level and can only be fully resolved during syntactic and semantic analysis. For natural language understanding, syntactic analysis is only one means; semantic analysis is both the key and the hard part, and without its support automatic syntactic analysis also struggles. In the pursuit of artificial intelligence, semantic analysis has become more important and urgent than ever: for an NLP system to combine the speed of a computer with human intelligence, it must perform semantic analysis of some depth.

Building on an existing syntactic treebank, this thesis constructs a semantic role labeling corpus of a certain scale. First, the corpus is annotated with semantic roles according to the HowNet case frame dictionary and the "Specification for the Semantic Role Labeling Corpus of Modern Chinese Predicates", in two stages: manual annotation and manual proofreading. Second, on the basis of the manual annotation, the annotation scheme is revised and improved, a set of semantic role labeling rules is induced, and the validity of those rules is tested. Finally, the research content and results are summarized. The thesis has six parts, as follows.

Part one, introduction. This part presents the theoretical background, the state of research, the research methods, and the significance of the work. The theoretical background covers valency theory, argument theory, and semantic roles. The review of existing research covers the relation types of semantic roles, the construction of semantic role corpora, and semantic role annotation schemes. Methodologically, the thesis uses corpus methods, human-machine collaboration, a combination of rule-based and statistical methods, and a combination of qualitative and quantitative analysis. The aim is to give consistent annotations to sentences that differ in syntactic structure but share the same basic logical semantics, and to build a semantic role labeling corpus of a certain scale, thereby contributing to semantic analysis and natural language understanding.

Chapter 1, the semantic role labeling corpus. This chapter describes the groundwork: the source and size of the corpus, the earlier construction of the syntactic treebank, the semantic role relation types and the HowNet case frame dictionary, the annotation platform, and the annotation scheme. The corpus is drawn from the People's Daily and contains 40,000 sentences. It is built on top of an earlier dependency treebank and represents a further stage of processing. The annotation platform is adapted from the treebank annotation platform and can switch between syntactic annotation and semantic role annotation. The relation types and annotation scheme follow the Specification named above; unlike the Specification, however, this thesis uses the HowNet case frame dictionary to assist annotation, which helps ensure objectivity and accuracy.

Chapter 2, common problems in semantic role annotation and how to handle them. This chapter summarizes the problems found during manual annotation and proposes solutions. The problems fall into three groups: omitted labels, superfluous labels, and wrong labels. Each group is analyzed separately for the annotation of predicate constituents and for the annotation of predicate arguments. The proposed solutions include correctly attaching a word to a synonym and selecting the verb sense according to context.

Chapter 3, problems in the case frame dictionary and countermeasures. Based on the manual annotation, this chapter summarizes problems with the verb senses and case frames in the dictionary, analyzes their causes, and proposes countermeasures. There are four main problems: the case frame of a verb's semantic class is incorrect; the semantic class assigned to a verb is incorrect; the semantic classes assigned to a verb are incomplete; and some words are missing from the dictionary (out-of-vocabulary words). An incorrect case frame may lack semantic roles or may mark the wrong roles as obligatory. Incomplete semantic classes arise either because a verb's semantic classes were not fully enumerated or because the case frame of a semantic class does not fit every sense in that class. The causes are discussed in terms of the design of the case frame dictionary, the evolution of word meaning, differences among verb senses within one semantic class, and the emergence of new words. Two remedies are proposed: testing case frames by sentence-pattern transformation, and attaching words to near-synonyms. Where a semantic class has an incorrect case frame, or where a class's case frame does not apply to all of its verbs, the case frame is verified by sentence-pattern transformation; the remaining problems are corrected by attaching the word to a near-synonym.

Chapter 4, the correspondence between sentence patterns and sentence models, and the semantic role labeling rules. From the results of manual annotation and proofreading, the typical sentence models of each sentence pattern are induced by introspection. The patterns are mainly subject-predicate sentences: verbal-predicate, nominal-predicate, and adjectival-predicate sentences. Verbal-predicate sentences include ordinary verbal-predicate sentences, "把" (ba) sentences, "被" (bei) sentences, pivotal sentences, serial-verb sentences, double-object sentences, and "比" (bi) sentences. Next, the typical sentence models are regularized according to whether a marker is present, yielding a set of semantic role labeling rules. Finally, the validity of the rules is tested on a test set, the cases outside their coverage are summarized, and strategies for those cases are proposed. Provided that validity is good, the rules can be applied to later semantic role labeling. On the one hand, this exploits the high precision of rule-based methods and reduces the manual annotation workload; on the other, the rules can automatically detect errors made in purely manual annotation, improving labeling accuracy.

Final part, conclusion. This part summarizes the main research content and results, discusses their significance for Chinese information processing and for research on Chinese grammar and semantics, and then analyzes the shortcomings of the work and plans the next steps.
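The abstract does not list the rules themselves, so the following Python sketch is only an illustration of the two ideas it describes: a marker-based rule for "把" and "被" sentences, and a check of the labeled roles against a case frame to flag possible omitted or superfluous labels. Every name here (`CaseFrame`, `TOY_FRAMES`, `label_marked_sentence`, `check_against_frame`), the role inventory, and the toy entry for "打破" are assumptions made for the example; none of them come from the thesis or from the HowNet case frame dictionary. The sketch also assumes pre-segmented input in which each argument is a single token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseFrame:
    """Hypothetical case frame: obligatory and optional role names."""
    required: frozenset
    optional: frozenset = frozenset()


# Toy dictionary for illustration only; not real HowNet data.
TOY_FRAMES = {
    "打破": CaseFrame(
        required=frozenset({"agent", "patient"}),
        optional=frozenset({"instrument", "time", "location"}),
    ),
}

# Marker -> (role of the token left of the marker, role of the token right of it).
# "X 把 Y V": X is the agent, Y the patient.
# "Y 被 X V": Y is the patient, X the agent.
MARKER_RULES = {
    "把": ("agent", "patient"),
    "被": ("patient", "agent"),
}


def label_marked_sentence(tokens, predicate):
    """Label agent/patient in a simple marked sentence.

    Returns a dict {role: token}. Returns an empty dict when the predicate
    is absent or no marker precedes it (the rule does not cover the sentence).
    """
    if predicate not in tokens:
        return {}
    v = tokens.index(predicate)
    for marker, (left_role, right_role) in MARKER_RULES.items():
        if marker in tokens[:v]:
            m = tokens.index(marker)
            roles = {}
            if m >= 1:
                roles[left_role] = tokens[m - 1]
            # In an agentless passive the marker is directly followed by the
            # verb, so there is no token to take the right-hand role.
            if m + 1 < v:
                roles[right_role] = tokens[m + 1]
            return roles
    return {}


def check_against_frame(predicate, roles, frames):
    """Compare labeled roles with the predicate's case frame.

    Reports obligatory roles that were not labeled ("missing") and labeled
    roles the frame does not license ("extra"). These are candidates for
    human review, not confirmed errors.
    """
    frame = frames.get(predicate)
    if frame is None:
        # Out-of-vocabulary predicate: nothing to compare against.
        return {"unknown_predicate": True, "missing": [], "extra": []}
    labeled = set(roles)
    return {
        "unknown_predicate": False,
        "missing": sorted(frame.required - labeled),
        "extra": sorted(labeled - frame.required - frame.optional),
    }


if __name__ == "__main__":
    sentence = ["他", "把", "杯子", "打破", "了"]
    labeled = label_marked_sentence(sentence, "打破")
    print(labeled)
    print(check_against_frame("打破", labeled, TOY_FRAMES))
```

A real rule set would operate on the dependency tree rather than on adjacent tokens, and an unfilled obligatory role (as in an agentless "被" sentence) is not necessarily an annotation error, which is why the check only reports candidates.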

[Degree-granting institution]: Ludong University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: H146.3





Article link: http://sikaile.net/shoufeilunwen/zaizhiboshi/1408756.html

