A Contrastive Study of Lexical Knowledge in Modern Chinese and Chinese Interlanguage Based on Whole-Sentence Co-occurrence
Published: 2018-11-26 17:02
[Abstract]: Word co-occurrence is an important component of lexical knowledge and has received considerable attention in earlier research. Most previous studies restricted the co-occurrence range to five words on either side of the target word. In view of the characteristics of Chinese and the needs of this study, this thesis widens the range to the whole sentence and develops a program, "Automatic Extraction of Whole-Sentence Co-occurrence of Chinese Words", that runs on a Modern Chinese corpus and a Chinese interlanguage corpus and supplies reliable material for describing lexical knowledge. From the Modern Chinese corpus, the program automatically extracts the co-occurring words of a specified target word, their distance from it, and their word senses. From the Chinese interlanguage corpus, it extracts the co-occurring words, their distance, and their parts of speech, together with the learner's Chinese proficiency level and native-language background. It can also count frequencies and sort the results as the researcher requires. The co-occurrence information obtained in this way can be used to record the lexical knowledge of a word, to serve as part of a word-meaning representation in computational simulation studies, and to support contrastive interlanguage analysis. This thesis compares the co-occurrence information of 看 (kàn, "to look") between the Modern Chinese corpus and the Chinese interlanguage corpus, and among the proficiency-level subcorpora of the interlanguage corpus. The co-occurring words of 看 in the two corpora are grouped into semantic categories according to the thesaurus 《同义词词林》 (Tongyici Cilin), and the degree to which each category is overused or underused in the interlanguage relative to Modern Chinese is examined.
Differences in co-occurrence usage among the interlanguage proficiency levels are also examined. In this way, research on the acquisition of Chinese as a second language is no longer confined to error analysis, but examines in depth the usage differences between interlanguage and Modern Chinese from the perspective of lexical co-occurrence. The main conclusions are as follows. In the semantic distribution of the co-occurring words of 看, the major category most heavily overused in the interlanguage relative to Modern Chinese is "auxiliary words", and the major category most heavily underused is "activities". The three most heavily overused middle categories are, in order, "abstract things / culture and education", "abstract things / society, politics and law", and "objects / landforms"; the three most heavily underused middle categories are, in order, "people / proper names", "objects / whole body", and "activities / administration". Across the four proficiency-level subcorpora of the interlanguage corpus, the distribution of the co-occurring words of 看 over the major semantic categories fluctuates. At one and a half to two years of study, the overall distribution differs most from that of Modern Chinese; thereafter, as proficiency rises, it converges toward Modern Chinese.
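The extraction and comparison steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis program: the function names (`sentence_cooccurrences`, `frequency_table`, `use_ratio`), the toy sentences, and the category labels and counts are all hypothetical, the input is assumed to be word-segmented already, and the ratio of category shares is only one plausible way to quantify overuse or underuse, since the abstract does not state which measure was used.

```python
from collections import Counter


def sentence_cooccurrences(sentences, target):
    """Return (word, offset) pairs for every word sharing a sentence with `target`.

    `sentences` is an iterable of already-segmented sentences (lists of words).
    `offset` is the signed distance in words; negative means left of the target.
    The range is the whole sentence, not a fixed window of five words.
    """
    pairs = []
    for words in sentences:
        target_positions = [i for i, w in enumerate(words) if w == target]
        for t in target_positions:
            for i, w in enumerate(words):
                if i != t:  # skip the target occurrence itself
                    pairs.append((w, i - t))
    return pairs


def frequency_table(pairs):
    """Count co-occurring words; sort by descending frequency, then by word."""
    counts = Counter(word for word, _ in pairs)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def use_ratio(learner_counts, native_counts):
    """Ratio of each category's share in learner data to its share in native data.

    Values above 1 suggest overuse, values below 1 suggest underuse.
    Assumed measure, not taken from the thesis. Both inputs must be non-empty;
    categories with a zero native count are skipped.
    """
    learner_total = sum(learner_counts.values())
    native_total = sum(native_counts.values())
    ratios = {}
    for category, native_n in native_counts.items():
        if native_n == 0:
            continue
        learner_share = learner_counts.get(category, 0) / learner_total
        native_share = native_n / native_total
        ratios[category] = learner_share / native_share
    return ratios


# Hypothetical toy data, for illustration only.
toy_sentences = [
    ["我", "喜欢", "看", "书"],
    ["去年", "夏天", "我", "和", "朋友", "一起", "去", "电影院", "看", "电影"],
]
pairs = sentence_cooccurrences(toy_sentences, "看")
print(frequency_table(pairs)[:3])
print(use_ratio({"cat_A": 50, "cat_B": 50}, {"cat_A": 25, "cat_B": 75}))
```

In the second toy sentence, 去年 sits eight words to the left of 看, so a window of five words on either side would miss it while the whole-sentence range keeps it.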
[Degree-granting institution]: Beijing Language and Culture University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: H136
Article ID: 2359126
Article link: http://sikaile.net/wenyilunwen/yuyanyishu/2359126.html