A Comparison of Concurrent and Separate Calibration under an IRT-Based Anchor-Item Design
Published: 2018-11-11 14:41
[Abstract]: Test equating is the systematic conversion of scores across multiple forms of a test that measure the same psychological trait, so that scores obtained on the different forms become comparable. Many contemporary large-scale examinations administer different forms of the same test. Comparing the scores of examinees who took different forms, describing the longitudinal development and trend of students' ability as they move up through the grades, and comparing achievement across ages, grade levels, or years all depend on equating techniques. This raises the question of which equating approach to choose, so comparisons among equating approaches are particularly important.

Within the framework of item response theory (IRT), there are two approaches to equating. The first is separate calibration: the item parameters of each test are first estimated separately. Because the location of the scale is indeterminate in IRT models, the item parameters obtained from different tests are linearly related, so a linear transformation is needed to place all item parameters on the same scale. The main transformation methods are the mean/mean method, the mean/sigma method, the Haebara (HA) method, and the Stocking-Lord (SL) method. The second is concurrent calibration, in which all item parameters are estimated in a single software run.

Previous research comparing concurrent and separate calibration has two shortcomings: (1) the criteria used for the comparison are not uniform, and differences in criteria may well affect the results; (2) studies judge which method is better only from the magnitude of the error, which cannot show statistically whether either method has a significant advantage.

To address these shortcomings, this study (1) uses absolute bias and bias to examine random error and systematic error, respectively, to ensure the precision of the results, and (2) uses statistical tests to examine whether concurrent and separate calibration differ significantly. The results show the following. (1) For the a parameter, concurrent calibration equates significantly better than separate calibration. For the b parameter, in terms of bias, the HA method equates significantly better than the other four methods, that is, separate calibration is significantly better than concurrent calibration; in terms of absolute bias, concurrent calibration, the SL method, and the HA method do not differ significantly. (2) The larger the sample size, the better the equating.
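The linear rescaling step in separate calibration, together with the bias and absolute bias criteria, can be illustrated with a minimal sketch. It covers only the two moment-based methods (mean/sigma and mean/mean); the Haebara and Stocking-Lord methods require numerical optimization and are not shown. The function names, the anchor-item values, and the convention theta_base = A * theta_new + B are assumptions made for illustration and are not taken from the thesis.

```python
from statistics import mean, pstdev

def mean_sigma(b_new, b_base):
    """Mean/sigma linking: coefficients from anchor-item difficulties only."""
    A = pstdev(b_base) / pstdev(b_new)
    B = mean(b_base) - A * mean(b_new)
    return A, B

def mean_mean(a_new, a_base, b_new, b_base):
    """Mean/mean linking: slope from mean discriminations, intercept from mean difficulties."""
    A = mean(a_new) / mean(a_base)
    B = mean(b_base) - A * mean(b_new)
    return A, B

def rescale(a, b, A, B):
    """Place new-form parameters on the base scale: a* = a / A, b* = A * b + B."""
    return [ai / A for ai in a], [A * bi + B for bi in b]

def bias(est, true):
    """Mean signed error (systematic error)."""
    return mean([e - t for e, t in zip(est, true)])

def abs_bias(est, true):
    """Mean absolute error."""
    return mean([abs(e - t) for e, t in zip(est, true)])

# Hypothetical anchor-item parameters on the base scale (illustrative values only).
a_base = [0.8, 1.0, 1.2, 1.5]
b_base = [-1.0, -0.2, 0.4, 1.1]

# The same items expressed on a new scale with a known relation A = 1.2, B = 0.5.
A_true, B_true = 1.2, 0.5
a_new = [ai * A_true for ai in a_base]
b_new = [(bi - B_true) / A_true for bi in b_base]

A_ms, B_ms = mean_sigma(b_new, b_base)
A_mm, B_mm = mean_mean(a_new, a_base, b_new, b_base)

a_linked, b_linked = rescale(a_new, b_new, A_ms, B_ms)
print(A_ms, B_ms, A_mm, B_mm)
print(bias(b_linked, b_base), abs_bias(b_linked, b_base))
```

Because the example parameters are error-free, both methods recover the same coefficients and the linked parameters match the base-scale values up to floating-point error. With estimated parameters the methods generally give different coefficients, which is the kind of difference that bias and absolute bias are meant to quantify.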
[Degree-granting institution]: Jiangxi Normal University
[Degree level]: Master's
[Year conferred]: 2014
[Classification number]: B841
Article ID: 2325151
Article link: http://sikaile.net/shekelunwen/xinlixingwei/2325151.html