Development of a Hot-Event News Corpus and a Study of Its Vocabulary
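Keywords: hot events; news register; corpus; word frequency statistics; initiation-persistence pattern. Source: Nanjing Normal University, 2012 master's thesis. Type: degree thesis.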
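[Abstract, translated from the Chinese]: Research on news language in China has produced a body of results, and articles and monographs on the subject have appeared steadily. Most of this work, however, starts from writing and rhetoric: it discusses how language can meet the demands of news writing and how to make news language more expressive. Studies of news language based on a corpus of social hot events remain rare. This study examines news language from a linguistic standpoint, using linguistic theory. First, it reviews the Modern Chinese Register Information Database. The completed everyday, legal, business, and sports register databases supply first-hand material for register-specific language research, and corpus-based register studies have already produced results; this study forms the news-register part of that database. Second, it builds the "Social Hot Event News Corpus". The study collects coverage of social hot events from the Yangtse Evening Post (《揚子晚報》) for all of 2009 and, applying its selection criteria, retains 489,000 characters of hot-event text. About 70% of the material was in PDF form and had to be converted to Word format with OCR software, with proofreading during conversion to ensure accuracy. To ease later lookup and proofreading, the texts are also classified and coded. The corpus covers 33 hot events in 365 files; each news item carries a code together with its headline, publication date, reporter, page, and character count. Once the attributes of the texts and the principles of corpus construction are fixed, the corpus is processed in depth following the corpus construction steps. Word segmentation and part-of-speech tagging are done automatically and then checked by hand. Problems that arise during segmentation and tagging are discussed and resolved to fit the characteristics of the news register, which lays the groundwork for corpus-based research on news language and yields a tagged corpus. Finally, the corpus is put to use. Word frequencies are counted to produce the "Hot Event News Vocabulary Frequency Table", and the "Hot Event News Basic Vocabulary List" is compiled. The hot-event word list (high-frequency words, secondary high-frequency words, and some mid-frequency words) is compared with a general-purpose word list; filtering yields 216 special words, which are classified by meaning and by distribution in the corpus. Every word is looked up again in the corpus and, following the way hot events unfold, the words fall into six classes: "time expressions", "event description", "online amplification", "media involvement", "judicial involvement", and "event impact". The classification is not a subjective judgment but corpus-based: a word's distribution in the corpus determines its class. On this basis the study then traces the initiation-persistence pattern of hot events. The study combines quantitative and qualitative methods. The "Social Hot Event News Corpus" and the "Hot Event News Basic Vocabulary List" derived from it offer a reference for journalism teaching, the compilation of news dictionaries, and the development of news linguistics. The reporting pattern identified for hot events also has some implications for news gathering, editing, and reporting.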
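The word-frequency step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the tooling used in the thesis: it assumes the segmented, part-of-speech-tagged corpus is stored as whitespace-separated `word/tag` tokens, and the function names, the punctuation tag `w`, and the two sample sentences are all hypothetical.

```python
from collections import Counter


def parse_tagged_line(line):
    """Split one 'word/tag word/tag ...' line into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # rpartition splits on the LAST slash, so a word containing '/' survives.
        word, sep, tag = token.rpartition("/")
        if sep and word:
            pairs.append((word, tag))
    return pairs


def frequency_table(lines, skip_tags=("w",)):
    """Count words, ignoring tokens whose tag is in skip_tags (assumed: 'w' = punctuation).

    Returns (word, count, relative_frequency) rows, most frequent first.
    """
    counts = Counter()
    for line in lines:
        for word, tag in parse_tagged_line(line):
            if tag not in skip_tags:
                counts[word] += 1
    total = sum(counts.values())
    return [(word, n, n / total) for word, n in counts.most_common()]


# Invented sample sentences in the assumed word/tag format.
sample = [
    "記者/n 昨日/t 獲悉/v ,/w 法院/n 已/d 受理/v 此/r 案/n 。/w",
    "網友/n 發帖/v 質疑/v ,/w 記者/n 隨後/d 調查/v 。/w",
]
table = frequency_table(sample)
for word, count, rel in table[:3]:
    print(word, count, round(rel, 3))
```

In a real corpus the same function would be run over all files, with the rows written out as the frequency table; the tag set to skip would depend on the annotation scheme actually used.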
[Abstract]: Research on news language in China has made some progress, and works on the subject have been published one after another. However, such research usually starts from writing and rhetoric, discussing how language can adapt to the requirements of news writing and how to enhance the expressive effect of news language; research on news language based on a corpus of social hot events is rare. This study examines news language from a linguistic point of view, applying linguistic theory. First, it reviews the Modern Chinese Register Information Database. The completed everyday, legal, business, and sports register databases provide first-hand material for register-based language research, and corpus-based research on register language has made some achievements; this study is the news-register part of the database. Second, it builds the "Social Hot Event News Corpus". Social hot events reported in the Yangtse Evening Post throughout 2009 were collected and, according to the screening criteria, 489,000 characters of hot-event text were selected. Of this material, 70% was in PDF form and had to be converted to Word format with OCR software and proofread during conversion to ensure the correctness of the corpus. For ease of later retrieval and proofreading, the texts were also classified and coded. The news register corpus contains 33 hot events in 365 files; each news item has a code and is accompanied by its headline, report date, reporter, page, and character count. With the attributes of the texts and the principles of corpus development determined, the corpus was further processed according to the corpus development steps. Automatic word segmentation and part-of-speech tagging were adopted, supplemented by manual proofreading. Problems arising during segmentation and tagging were discussed so that the annotation suits the language characteristics of the news register and lays a foundation for corpus-based research on news language; the result is a tagged corpus. Finally, using the "Social Hot Event News Corpus", word frequency statistics were computed to produce the "Hot Event News Vocabulary Frequency Table", and the "Hot Event News Basic Vocabulary List" was compiled. The hot-event news word list (high-frequency words, secondary high-frequency words, and some mid-frequency words) was compared with a general word list, and 216 special words were selected and classified by reference to semantics and corpus distribution. All the words were returned to the corpus for retrieval. According to the characteristics of hot events, they are divided into six categories: "time expressions", "event description", "online amplification", "media involvement", "judicial involvement", and "event impact". The classification of special words is not a subjective determination but is based on the corpus: the distribution of a word in the corpus determines its category. On the basis of this classification, the initiation-persistence pattern of hot events is then traced. This study combines quantitative and qualitative research. The "Social Hot Event News Corpus" and the "Hot Event News Basic Vocabulary List" provide references for news teaching, the compilation of news dictionaries, and the development of news linguistics. The hot-event reporting pattern identified here also offers some guidance for news gathering, editing, and reporting.
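The comparison between the hot-event word list and a general word list can be illustrated with a short sketch. It is only an approximation of the idea: the abstract does not give the frequency cut-offs or name the general word list, so the `min_count` threshold, the function name, and both word lists below are hypothetical stand-ins. The later classification of the 216 special words relied on checking each word against the corpus and is not automated here.

```python
def special_vocabulary(corpus_freq, general_words, min_count=2):
    """Return (word, count) pairs that are frequent in the corpus
    but absent from the general word list, most frequent first.

    min_count is an assumed stand-in for the thesis's frequency bands.
    """
    candidates = [
        (word, n)
        for word, n in corpus_freq.items()
        if n >= min_count and word not in general_words
    ]
    # Sort by descending count, then by word, so the order is deterministic.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


# Invented counts and an invented general word list, for illustration only.
corpus_freq = {"網友": 5, "法院": 4, "的": 30, "發帖": 3, "罕見": 1}
general_words = {"的", "法院"}

special = special_vocabulary(corpus_freq, general_words)
print(special)
```

Each candidate returned this way would still need to be looked up in the corpus before being assigned to one of the six categories.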
[Degree-granting institution]: Nanjing Normal University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: H136