Research on Computational Models of Children's Language Acquisition (兒童語言習得的計算模型研究)

Published: 2018-12-13 03:09
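【Abstract】: Computational modeling of language acquisition, that is, acquiring linguistic knowledge by computational means, is an indispensable part of high-quality natural language processing applications. Childhood is the critical period for acquiring linguistic knowledge: humans acquire basic linguistic knowledge as children. Developing computational models of child language acquisition is therefore of great value for research on computational models of language knowledge acquisition in general. At the same time, developing such models, especially models that can effectively incorporate various cognitive processes, is a highly effective way to study and evaluate cognitive hypotheses about how children acquire language, and is of great value for revealing the mechanisms of children's language development. For these reasons, extensive research on computational models of child language acquisition has been carried out in computational linguistics, cognitive psychology, developmental linguistics and other fields.

However, existing computational models of child language acquisition still have shortcomings. For example, models of lexical category acquisition lack a unified evaluation method and require the number of categories to be preset; syntax acquisition models are weak at describing long-distance dependencies; and findings from cognitive research on child language acquisition are not yet sufficiently incorporated into the models.

To address these shortcomings, this thesis carries out research on building a child language corpus and on computational models of children's lexical category acquisition and syntax acquisition. The main work and results are as follows:

(1) A spoken Chinese-character corpus of child speech and child-directed speech was built, and child language, child-directed speech (CDS, the language adults address to children) and adult language were counted, compared and analyzed at three levels: character, word and sentence. Children's language ability is reflected both in the language they produce and in their comprehension of child-directed speech, and both differ considerably from adult language, so building a corpus of child and child-directed language is the foundation for studying child language acquisition. Computational models of child language acquisition should be built on such a corpus, and should also be trained and evaluated on it. As the first step of this research, the thesis built a corpus of child and child-directed spoken language from the Chinese data in CHILDES, currently the largest corpus of children's spoken language in the world, through transcription, annotation and correction.

(2) For computational models of children's lexical category acquisition, the thesis studies both evaluation methods and the models themselves. A new measure called Cohesivity is proposed to evaluate lexical category acquisition; it jointly considers three evaluation criteria, informativeness, diversity and accuracy, and experiments demonstrate its feasibility and effectiveness. Dirichlet Process Mixture Models (DPMMs) and Affinity Propagation (AP) are used for lexical category acquisition, which avoids the need, common in previous studies, to predefine the number of categories. Further, building on the cognitive observation that other cognitive channels can supply prior information for language acquisition, a semi-supervised AP algorithm is constructed that uses manually annotated seed words to simulate prior information from those channels; experimental results show that this prior information is effective.

(3) A staged syntax acquisition model is proposed that models the cognitive process by which children's syntax develops from simple to complex and from concrete to abstract; experimental results show the model's effectiveness. Syntax acquisition in the model proceeds in three stages. Stage 1 acquires contiguous concrete structures: only syntactic structures made of contiguous terminal symbols are considered. Stage 2 acquires long-distance dependency structures: still only terminals are considered, but non-contiguous structures can be acquired. Stage 3 acquires hierarchical structures: hierarchical syntactic structures mixing terminals and non-terminals are acquired, completing the acquisition of syntactic structure.

(4) The thesis models the cognitive process in which lexical categories and syntactic structures in child language grow incrementally in stages. The proposed lexical category acquisition model is trained in stages and integrated into the staged syntax acquisition model above, yielding a syntax acquisition framework based on lexical categories. The model is then applied to language generation: the generated language is compared with child language and child-directed speech and evaluated manually. Experiments show that incorporating lexical category information effectively improves syntax acquisition performance, and that the generated language is fairly fluent and comprehensible.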
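The abstract names Affinity Propagation as one way to induce lexical categories without fixing their number in advance. As a rough illustration only, here is a minimal pure-Python sketch of the standard Affinity Propagation algorithm of Frey and Dueck (2007); it is not the thesis's implementation. It assumes each word is already represented by a numeric context-feature vector (the toy vectors are hypothetical), uses negative squared Euclidean distance as similarity, and sets every point's preference to the median similarity, a common default. The semi-supervised seed-word variant described above is not shown.

```python
from statistics import median


def affinity_propagation(points, damping=0.5, iterations=200, preference=None):
    """Cluster `points` with standard Affinity Propagation.

    Returns (exemplars, labels): the indices chosen as exemplars and, for each
    point, the index of the exemplar it is assigned to. The number of clusters
    is not an input; it emerges from `preference`.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Similarity: negative squared Euclidean distance between feature vectors.
    s = [[-sum((x - y) ** 2 for x, y in zip(p, q)) for q in points] for p in points]
    if preference is None:
        # Common default: median of the off-diagonal similarities.
        preference = median(s[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        s[k][k] = preference  # higher self-similarity tends to give more exemplars

    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities  a(i, k)
    for _ in range(iterations):
        # r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                competitor = second if k == best else vals[best]
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - competitor)
        # a(k,k) <- sum_{i' != k} max(0, r(i',k))
        # a(i,k) <- min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new

    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:
        return [], [-1] * n
    # Each non-exemplar joins the exemplar it is most similar to.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
              for i in range(n)]
    return exemplars, labels


if __name__ == "__main__":
    # Hypothetical 2-D context-feature vectors for six words in two groups.
    print(affinity_propagation([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]))
```

On the two well-separated toy groups above, the sketch should settle on two exemplars, one per group, even though no cluster count is passed in; raising `preference` tends to produce more clusters and lowering it fewer.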
[Abstract]: Computational modeling of language acquisition, i.e., acquiring linguistic knowledge by computational means, is an integral part of high-quality natural language processing applications. Childhood is the critical period for acquiring linguistic knowledge, and basic linguistic knowledge is acquired during childhood, so developing computational models of child language acquisition is of great value for research on models of language knowledge acquisition. At the same time, developing such models, especially models that can effectively incorporate various cognitive processes, is a very effective way to study and evaluate cognitive hypotheses about children's language acquisition, and is of great value for revealing the mechanisms of children's language development. Accordingly, computational models of child language acquisition have been studied in fields such as computational linguistics, cognitive psychology and developmental linguistics.

However, existing computational models of child language acquisition still have shortcomings. For example, there is no unified evaluation method for lexical category acquisition models, and the number of categories must be preset; syntax acquisition models describe long-distance dependencies poorly; and the models do not yet sufficiently incorporate results from cognitive research on child language acquisition. This thesis addresses these shortcomings through work on building a child language corpus and on computational models of children's lexical category acquisition and syntax acquisition. The main work and results are as follows:

(1) A spoken Chinese corpus of child and child-directed speech is established, and child language, child-directed speech (CDS) and adult language are counted, compared and analyzed at the character, word and sentence levels. Children's language ability is reflected in the language they produce and in their understanding of child-directed speech, both of which differ considerably from adult language. Building a corpus of child and child-directed language is therefore the basis for studying child language acquisition: models of child language acquisition should be built on such a corpus and trained or evaluated on it. As a first step, this thesis builds such a corpus from the Chinese data in CHILDES, the largest corpus of children's spoken language in the world.

(2) For children's lexical category acquisition, this thesis studies both the evaluation method and the computational model. A new measure called Cohesivity is proposed to evaluate lexical category acquisition performance; it jointly considers three evaluation criteria, informativeness, diversity and accuracy, and experiments show its feasibility and effectiveness. Dirichlet Process Mixture Models (DPMMs) and Affinity Propagation (AP) are used for lexical category acquisition, avoiding the need in previous research to predefine the number of categories. Furthermore, since other cognitive channels can provide prior information for language acquisition, a semi-supervised AP algorithm is constructed that uses manually annotated seed words to simulate prior information from other channels. The experimental results show the validity of this prior information.

(3) A staged syntax acquisition model is proposed, which models the cognitive process by which children's syntax develops from simple to complex and from concrete to abstract. The experimental results show the validity of the model. Syntax acquisition in the model is divided into three stages. In the first stage, contiguous concrete structures are learned: only syntactic structures made of contiguous terminal symbols are considered. In the second stage, long-distance dependency structures are learned: only terminals are still considered, but non-contiguous structures can be learned. In the third stage, hierarchical structures are learned, namely hierarchical syntactic structures mixing terminals and non-terminals.

(4) The cognitive process by which lexical categories and syntactic structures in child language grow incrementally in stages is modeled: the proposed lexical category acquisition model is trained in stages and combined with the staged syntax acquisition model above, yielding a syntax acquisition framework based on lexical categories. The model is applied to language generation; the generated language is compared with child language and child-directed speech and evaluated manually. The experiments show that combining lexical category information effectively improves syntax acquisition performance, and that the generated language has good fluency and comprehensibility.
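The first two stages of the staged model can be illustrated, very loosely, with frequency counting over utterances. The sketch below is hypothetical and is not the thesis's model: it assumes pre-segmented utterances, treats a "contiguous concrete structure" as a frequent word n-gram (stage 1) and a "long-distance dependency" as a frequent pair of words separated by 1 to `max_gap` intervening words, written `A ... B` (stage 2). The function names and the `min_count` threshold are illustrative assumptions. Stage 3, which introduces non-terminals and hierarchy, is omitted.

```python
from collections import Counter


def stage1_contiguous(sentences, n=2, min_count=2):
    """Stage 1 sketch: frequent contiguous word n-grams (terminals only)."""
    counts = Counter()
    for words in sentences:
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


def stage2_discontinuous(sentences, max_gap=3, min_count=2):
    """Stage 2 sketch: frequent 'A ... B' word pairs with 1..max_gap words between.

    Still terminals only, but the pattern spans a non-contiguous stretch.
    """
    counts = Counter()
    for words in sentences:
        for i, left in enumerate(words):
            # j = i + 2 leaves one intervening word; j = i + 1 + max_gap leaves max_gap.
            for j in range(i + 2, min(len(words), i + 2 + max_gap)):
                counts[(left, "...", words[j])] += 1
    return {pattern: c for pattern, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    # Hypothetical toy utterances, already segmented into words.
    utterances = [
        ["我", "要", "吃", "苹果"],
        ["我", "要", "喝", "水"],
        ["把", "球", "放", "好"],
        ["把", "书", "放", "好"],
    ]
    print(stage1_contiguous(utterances))
    print(stage2_discontinuous(utterances))
```

On these four toy utterances, stage 1 keeps only 我 要 and 放 好, while stage 2 also finds the discontinuous frames 把 ... 放 and 把 ... 好, which no contiguous bigram captures.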
【Degree-granting institution】: Beijing University of Posts and Telecommunications
【Degree level】: Doctoral
【Year conferred】: 2012
【Classification number】: H193.1



