Research on Hierarchical Structure Analysis of Complex Sentences Based on Marker Presence/Absence Rules and Association Features
Topics: complex-sentence hierarchical structure + marker presence/absence patterns; Source: Central China Normal University, master's thesis, 2017
[Abstract]: Chinese information processing, a branch of computational linguistics, is becoming increasingly important as artificial intelligence, search engines and other Internet technologies develop rapidly. The use of Chinese is also spreading as China's international influence grows. Complex sentences are a major component of the Chinese language; they have become a core object of computational processing and remain one of the hard problems in Chinese information processing. Current research on complex sentences mainly covers the automatic identification of relation markers, distinguishing clauses from non-clauses, the automatic division of complex-sentence hierarchy, and the recognition of inter-clause relations. Automatic marker identification and clause division have been studied extensively, whereas hierarchy division and relation recognition have received comparatively little attention.
Automatic identification of relation markers is now largely mature, and the markers themselves signal both the hierarchical structure of a complex sentence and the logical-semantic relations between its clauses, so hierarchy analysis should build on relation markers as its key formal cue. However, because Chinese allows many ways of expressing the same relation, markers are frequently omitted from clauses; that is, a marker may be overt or covert (marker presence/absence). Relying on relation markers alone therefore makes hierarchy recognition difficult. To address this, the thesis adopts a "divide and conquer" strategy and splits its research object, three-clause marked complex sentences, into two types: full (filling-state) sentences and non-full (non-filling-state) sentences.
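Assigning a sentence to the full or non-full type presupposes that its clauses have been segmented and its relation markers located. The Python sketch here illustrates that pre-processing under simplifying assumptions, not the thesis's implementation: clauses are split on punctuation only (no dependency parsing or pseudo-clause filtering), markers come from a small hypothetical lexicon (`RELATION_MARKERS`), and "full" is assumed to mean that every clause carries an overt marker. All function names are illustrative.

```python
import re
from typing import List, Optional

# Hypothetical mini-lexicon of relation markers (illustration only; the
# thesis's actual marker inventory is not given in the abstract).
RELATION_MARKERS = ["因为", "所以", "虽然", "但是", "如果", "那么", "不但", "而且", "因此"]

# Clause boundaries: Chinese/ASCII commas and semicolons. Dependency-based
# checks such as pseudo-clause removal are omitted in this sketch.
CLAUSE_DELIMITERS = re.compile(r"[，,；;]")


def split_clauses(sentence: str) -> List[str]:
    """Split a complex sentence into clauses using punctuation only."""
    body = sentence.strip().rstrip("。！？!?")
    return [c.strip() for c in CLAUSE_DELIMITERS.split(body) if c.strip()]


def find_marker(clause: str) -> Optional[str]:
    """Return the earliest lexicon marker in the clause, or None if it is covert."""
    hits = [(clause.find(m), m) for m in RELATION_MARKERS if m in clause]
    return min(hits)[1] if hits else None


def marker_sequence(sentence: str) -> List[Optional[str]]:
    """Relation marker sequence: one entry per clause, None where no marker appears."""
    return [find_marker(c) for c in split_clauses(sentence)]


def presence_pattern(markers: List[Optional[str]]) -> str:
    """Encode each clause's marker as overt (+) or covert (-), e.g. '++-'."""
    return "".join("+" if m is not None else "-" for m in markers)


def is_full(markers: List[Optional[str]]) -> bool:
    """Assumed definition: 'full' means every clause carries an overt marker."""
    return len(markers) > 0 and all(m is not None for m in markers)


if __name__ == "__main__":
    # Expected output:
    # ['因为', '所以', '而且'] +++ full
    # ['虽然', '但是', None] ++- non-full
    for s in ["因为天下雨，所以比赛推迟，而且观众都回家了。",
              "虽然他很累，但是他还在工作，大家都很感动。"]:
        seq = marker_sequence(s)
        print(seq, presence_pattern(seq), "full" if is_full(seq) else "non-full")
```

A substring lexicon lookup is the crudest possible marker identifier; it stands in for the mature automatic marker identification the abstract refers to.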
To deal with omitted markers, the thesis constructs a marker coordination type table and a set of marker presence/absence rules, with which the presence/absence pattern of the markers in a complex sentence is extracted automatically. In addition, based on dependency parsing of the complex sentence, it proposes computing the degree of association between clauses from the recurrence of syntactic components. Combining these, it builds a hierarchy recognition model based on marker presence/absence rules and association features that divides the hierarchical structure of complex sentences automatically.
The work proceeds in the following steps. First, the clauses of a complex sentence are segmented using dependency syntax and punctuation. Second, pseudo-clauses are removed, and the relation markers in the remaining clauses are annotated and extracted to obtain the sentence's relation marker sequence. Third, a marker coordination type table is constructed, and on this basis an algorithm is proposed that determines the marker presence/absence pattern of a given complex sentence. In parallel, building on dependency parsing, a method is proposed that computes the degree of association between clauses from the recurrence of syntactic components. Finally, a model is built that judges hierarchical structure from marker presence/absence rules and association features: the rules identify and analyze the hierarchy of full three-clause marked complex sentences, while the association features determine the hierarchy of non-full ones. In experiments, the marker presence/absence pattern was determined with an accuracy of 91.5%, and hierarchical structure analysis reached an accuracy of 90.6%. These results indicate that the proposed method is effective for analyzing the hierarchical structure of complex sentences.
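The final model can be pictured as a two-branch decision for a three-clause sentence. The Python sketch here is a hypothetical illustration, not the thesis's model: the coordination table `COORDINATED_PAIRS` holds a few assumed marker pairs, the association score is a Jaccard overlap of core syntactic components that a dependency parser is assumed to have already supplied as word sets, and both ties and full sentences for which no rule fires fall back to the association comparison by assumption. The result `12|3` means clauses 1 and 2 form the inner unit; `1|23` means clauses 2 and 3 do.

```python
from typing import FrozenSet, Optional, Sequence, Set, Tuple

# Hypothetical coordination table: marker pairs assumed to bind two adjacent
# clauses into one inner unit. The thesis's real table is not reproduced here.
COORDINATED_PAIRS: FrozenSet[Tuple[str, str]] = frozenset({
    ("因为", "所以"),
    ("虽然", "但是"),
    ("如果", "那么"),
    ("不但", "而且"),
})


def relatedness(a: Set[str], b: Set[str]) -> float:
    """Association between two clauses as the Jaccard overlap of their core
    syntactic components (e.g. subject/object words from a dependency parse).
    Jaccard is a stand-in; the thesis's exact formula is not given."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def decide_hierarchy(markers: Sequence[Optional[str]],
                     components: Sequence[Set[str]]) -> str:
    """Return '12|3' or '1|23' for a three-clause marked complex sentence."""
    if len(markers) != 3 or len(components) != 3:
        raise ValueError("expected exactly three clauses")
    if all(m is not None for m in markers):
        # Full sentence: apply marker rules via the coordination table.
        if (markers[1], markers[2]) in COORDINATED_PAIRS:
            return "1|23"
        if (markers[0], markers[1]) in COORDINATED_PAIRS:
            return "12|3"
        # No rule fired: fall back to association features (an assumption).
    # Non-full sentence: group the more strongly associated adjacent pair.
    r12 = relatedness(components[0], components[1])
    r23 = relatedness(components[1], components[2])
    return "12|3" if r12 >= r23 else "1|23"  # ties go to '12|3' (assumption)


if __name__ == "__main__":
    # 如果明天下雨，不但比赛取消，而且训练也暂停 -> 不但/而且 pair clauses 2 and 3
    print(decide_hierarchy(["如果", "不但", "而且"], [set(), set(), set()]))  # 1|23
    # 虽然他很累，但是他还在工作，大家都很感动 -> non-full; "他" recurs in clauses 1 and 2
    print(decide_hierarchy(["虽然", "但是", None],
                           [{"他", "累"}, {"他", "工作"}, {"大家", "感动"}]))  # 12|3
```

Placing the rule branch before the association comparison mirrors the division of labor stated in the abstract: markers decide when every clause has one, and inter-clause association decides otherwise.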
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: TP391.1