
Rating Process and Rater Beliefs: A Study of the Internal Factors Behind Rater Variability

Published: 2018-05-31 00:30

  Topic: rater beliefs + rating process; Source: Guangdong University of Foreign Studies, doctoral dissertation, 2009


[Abstract]: In subjective tests, rater variability is one of the most important factors affecting test reliability, validity, and fairness. Unlike most studies, which describe rater error statistically, this study starts from the raters themselves and examines the internal causes of the differences in their ratings. By comparing better and poorer raters, it seeks to identify the internal factors that determine whether raters can rate accurately and consistently, in order to provide empirical evidence and useful feedback for improving rater training and rating procedures and for raising test reliability and validity. The study is set in the writing assessment of the national College English Test Band 4 (CET-4): all participants were raters who had taken part in operational CET-4 rating, and both the rating criteria and the essay prompts came from real CET-4 administrations. The empirical work comprised three data-collection stages: independent rating, think-aloud protocols, and open-ended semi-structured interviews. After analyzing the raters' scores statistically with the many-facet Rasch model, the author divided the raters into a better group and a poorer group according to how closely their scores agreed with expert scores. Using the verbal reports produced during the think-aloud sessions and the records of one-on-one interviews, the author then compared the two groups' rating thought processes and rating beliefs.

The analysis revealed differences between the two groups in many respects. First, during rating, different raters tended to attend to different features of the essays. Good raters attended to a more comprehensive range of language features, including content, overall organization, discourse features, sentence structure, and vocabulary; poorer raters attended more to isolated, scattered features such as lexical variety, sentence length and complexity, and the use of connectives. Second, the two groups processed the information they attended to differently. Good raters were better at classifying language errors, summarizing information, and drawing inferences, and they monitored their own rating process and rating accuracy more effectively. In addition, raters differed in their rating beliefs. The main difference lay in their knowledge and understanding of the object of rating and of the rating criteria. Compared with poorer raters, good raters defined writing ability more clearly and comprehensively. Correspondingly, their definitions of the language features that reflect writing ability were more comprehensive and systematic, and they applied systematic, consistent standards in weighting those features. Good raters' understanding and operational definitions of the abstract descriptors in the rating criteria covered a more comprehensive set of language features. The results also show that rating beliefs were more consistent among good raters, and closer to experts' expectations and to the construct definition in the test syllabus.

Through this comparison, the author attempts to link raters' rating results to their internal thought processes and beliefs, and finds that internal differences among raters, especially differences in beliefs, are the root of the differences in their rating behavior. The implication for rater training is that its purpose and focus should be to unify raters' understanding of the object of rating, the rating instrument, and their own responsibilities and tasks. Only when raters agree in their internal beliefs and reach a reasonably unified understanding can their ratings accurately reflect the intentions of test developers and administrators and capture the latent ability the test is meant to measure, forming, in a sense, an evaluation community.
[Degree-granting institution]: Guangdong University of Foreign Studies
[Degree level]: Doctoral
[Year degree granted]: 2009
[Classification number]: G424.74





Link to this article: http://sikaile.net/jiaoyulunwen/jsxd/1957512.html

