A Study of Corpus Structure and Its Application

Published: 2018-05-06 14:43

  Topic: corpus + structure; Source: master's thesis, Jiangnan University, 2012


[Abstract]: Grounded in authentic language data, corpus linguistics analyzes language from a macro perspective by probabilistic means, and it is increasingly favored by language researchers. The corpus is the foundation of corpus linguistics, so building a comprehensive and representative corpus matters greatly for the validity of research results. Corpus construction has to take many factors into account, such as the size of the corpus and the sources and types of its texts.
Whether a corpus is representative, that is, whether its texts fully represent the domain under study, reflects whether its structure is sound. Corpus structure mainly involves two aspects: the criteria used to stratify the texts and the proportion each stratum occupies in the corpus. This thesis begins by surveying the structure of the major Western corpora and, drawing on systemic functional linguistics, attempts to answer what underlying regularities govern corpus structure. Halliday, the founder of systemic functional linguistics, gave a systematic account of language. He holds that language as a whole is a continuum, with spoken and written language at its two ends. He points out in particular that registers in the middle of the continuum have features of both spoken and written language and extend toward the two ends, where they become typical spoken and typical written language. The continuum theory rejects claims that either written or spoken language comes first, and it describes language comprehensively and in dialectical unity in terms of register. With the help of this theory, the author finds that the structures of the SEU, Brown, LOB and ICE-GB corpora take register fully into account, the SEU corpus most clearly so. SEU adopts three main divisions, written origin, scripted to be spoken, and spoken origin, so that register progresses step by step from written to spoken language. The scripted to be spoken stratum includes interviews, play scripts, speech scripts and the like, and precisely embodies the continuity between spoken and written language. The Brown and LOB corpora contain no spoken material, and for exactly that reason their classification of written language serves as a model. Using the continuum diagram as a reference, the thesis maps these findings and the proportions of the main strata onto its coordinates one by one and obtains a fairly symmetrical figure, which indicates that these corpora are reasonably representative. Register, however, is not the only basis for stratification: corpora such as BNC, LLELC and MCLC are divided by discipline, for example applied science, social science and arts.
Further study shows that these two kinds of stratification criteria are not isolated from each other. The sub-branches of the "learned and the popular" category in ICE-GB use social sciences and natural sciences, which confirms that this corpus adopts both stratification models at once.
These two stratification patterns are the common strategies for arranging corpus structure. Going beyond them, the study takes the structure of a self-built corpus of knowledge related to the English major as an example and examines its construction from a practical starting point. It first analyzes internship logs of English majors to identify the industries the students worked in and the purposes for which they used English, and so to characterize society's demand for knowledge related to the English major. The study used the internship logs of 102 graduates of the class of 2006; the tally showed that 34 of them did not work in English-related occupations. Judging by the focus of each student's log, the internships of the remaining students mainly involved foreign trade English, English teaching, English translation, secretarial English, mechanical English and similar fields. The share of each field was calculated from the number of students actually working in it, which gives the weight of each stratum. Drawing on the discipline-based stratification model and these industry statistics, the thesis proposes a preliminary reference scheme with strata such as foreign trade, machinery, computing and teaching. Within each stratum, taking foreign trade English as the example, the thesis applies the results of the continuum-based analysis of corpus structure and tentatively explores how to subdivide the stratum and collect texts.
Focusing on the structural analysis of the major Western corpora, this thesis uses a concrete case to discuss how corpus structure can be divided. Because time and resources were limited, much remains to be improved. The logs of only 102 students cannot effectively represent the full range of knowledge related to the English major. For example, it may be that none of the students did English work related to law, but this does not show that legal English falls outside that range. Further research is therefore still needed. Nevertheless, the thesis mainly aims to open up a new line of thinking and to offer a constructive reference for self-built corpora, especially for the arrangement of their structure. As small corpora receive growing attention from language professionals, it is hoped that this thesis will contribute to the theory of corpus construction.
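
The proportion step described in the abstract (counting the students in each field and turning those counts into stratum weights) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract reports the totals (102 graduates, 34 outside English-related work) but not the per-field head counts, so the counts below are hypothetical placeholders. The target corpus size and the largest-remainder rounding used to turn weights into text quotas are also assumptions, not something the thesis is known to use.

```python
TOTAL_GRADUATES = 102      # reported in the abstract
NOT_ENGLISH_RELATED = 34   # reported in the abstract
ENGLISH_RELATED = TOTAL_GRADUATES - NOT_ENGLISH_RELATED

# HYPOTHETICAL head counts per field: the abstract names the fields
# but does not report how many students worked in each one.
hypothetical_counts = {
    "foreign trade English": 30,
    "English teaching": 16,
    "English translation": 10,
    "secretarial English": 7,
    "mechanical English": 5,
}


def stratum_proportions(counts):
    """Turn head counts per field into stratum weights that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("need at least one participant")
    return {name: n / total for name, n in counts.items()}


def allocate(corpus_size, proportions):
    """Split corpus_size texts across strata (largest-remainder rounding).

    The rounding rule is an assumption made for this sketch.
    """
    raw = {name: corpus_size * p for name, p in proportions.items()}
    quotas = {name: int(x) for name, x in raw.items()}  # round down first
    leftover = corpus_size - sum(quotas.values())
    # Hand the remaining texts to the strata with the largest fractions.
    by_remainder = sorted(raw, key=lambda name: raw[name] - quotas[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        quotas[name] += 1
    return quotas


proportions = stratum_proportions(hypothetical_counts)
quotas = allocate(1000, proportions)  # 1000 texts is an assumed target size
for name, share in proportions.items():
    print(f"{name}: {share:.1%} -> {quotas[name]} texts")
```

With real head counts substituted for the placeholders, the same two functions would give the weight of each stratum and a text quota that adds up exactly to the chosen corpus size.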

[Degree-granting institution]: Jiangnan University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: H08



Article link: http://sikaile.net/wenyilunwen/hanyulw/1852719.html

