Journal Article
An Objective Procedure for Determining Whether Items Have Good Discrimination

Chinese Journal of Psychology (中華心理學刊), 2002 (ROC year 91), Vol. 44, No. 2, pp. 253-262


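Wen-Chung Wang (王文中), Department of Psychology, National Chung Cheng University; Lai-Fa Hung (洪來發), Department of Psychology, National Chung Cheng University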

Abstract

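Item difficulty analysis and item discrimination analysis are the two principal kinds of item analysis. Traditionally, item discrimination is expressed by the index of discrimination or by the point-biserial or biserial correlation coefficient. Because these indices lack an objective statistical procedure for deciding whether an item discriminates well, subjective criteria are usually applied in practice. For example, an item whose index of discrimination or point-biserial correlation exceeds 0.3 is regarded as having good discrimination, and otherwise it is not. This study proposes an objective statistical procedure for testing whether an item has good discrimination. We first define an item with good discrimination as one that distinguishes all score points in the target population equally effectively. We then use a linear relationship between the item's passing rate and the total test score to make the meaning of good discrimination concrete, and from it derive the values that the regression coefficients of a logistic regression model should take to meet this requirement. We also explain how to obtain estimates of an item's logistic parameters and how to test whether the item has good discrimination. Computer simulations show that the linear model and the logistic model fit each other well, especially when total test scores follow a normal or chi-square distribution and the sample size is below 5000. An empirical analysis uses data on the 50 multiple-choice items of the English test of the college joint entrance examination, answered by 500 examinees. The results of the proposed method are broadly consistent with those of classical item analysis, but the proposed method has an objective statistical interpretation, whereas the classical methods do not.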
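For concreteness, the two classical indices mentioned above and the 0.3 rule of thumb can be sketched in Python. This is an illustrative sketch, not the authors' code. The 27% upper/lower grouping for the index of discrimination is a common convention we assume here, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def discrimination_index(item, total, fraction=0.27):
    """Upper-lower index D: proportion correct in the top-scoring group minus
    proportion correct in the bottom-scoring group (each `fraction` of examinees)."""
    n = len(item)
    k = max(1, round(n * fraction))
    # Rank examinees by total score; ties keep their original order.
    order = sorted(range(n), key=lambda i: total[i])
    lower = [item[i] for i in order[:k]]
    upper = [item[i] for i in order[-k:]]
    return mean(upper) - mean(lower)


def point_biserial(item, total):
    """Pearson correlation between a 0/1 item and the total score, in the
    point-biserial form (M1 - M0) / s * sqrt(p * q), s the population SD.
    Raises StatisticsError if everyone (or no one) answered correctly."""
    p = mean(item)
    m1 = mean(t for t, y in zip(total, item) if y == 1)
    m0 = mean(t for t, y in zip(total, item) if y == 0)
    return (m1 - m0) / pstdev(total) * (p * (1 - p)) ** 0.5


def flag_by_cutoff(value, cutoff=0.3):
    """The subjective rule of thumb: call the item 'good' if value > cutoff."""
    return value > cutoff
```

Given an item's 0/1 responses and the matching total scores, `flag_by_cutoff(point_biserial(item, total))` applies the rule of thumb. The cutoff has no sampling-based justification, which is the gap the study addresses.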

Keywords: item discrimination, logistic regression model, likelihood ratio test, Pearson residual chi-square test


AN OBJECTIVE PROCEDURE FOR DETERMINING IF ITEMS HAVE GOOD DISCRIMINATION

Wen-Chung Wang (Department of Psychology, National Chung Cheng University); Lai-Fa Hung (Department of Psychology, National Chung Cheng University)

Abstract

Item difficulty and discrimination analyses are the two most important item analyses. Several indices have been proposed to quantify item discrimination power, including the index of discrimination, the point-biserial correlation, the biserial correlation, and Fleiss's odds ratio. Although we can test whether these conventional indices differ significantly from zero, there is no objective criterion for how large they must be for an item to have good discrimination. In practice, test analysts usually use 0.3 or 0.4 as a cut-point. If the index of discrimination or the point-biserial correlation exceeds it, the item is flagged as exhibiting good discrimination power. Besides lacking an objective criterion, these indices depend on sample characteristics and item difficulty. For example, they yield higher values for items whose difficulty (i.e., passing rate) is close to 0.5 than for items at either extreme of difficulty. Although a cut-point may be a useful guideline, a statistical procedure is preferable. This study attempts to establish an objective statistical procedure for determining whether an item has good discrimination. To do so, we first define good discrimination: an item has good discrimination if it discriminates every score point equally well in the target population. We find this definition appropriate because it treats every score point as equally important. We use a linear relationship between the probability of passing an item and the test score to describe what the regression should look like when an item has good discrimination. Because the outcome variable is binary, the logistic function better describes the relationship between the probability of passing an item and the test score. To preserve the equal-discrimination requirement, we then derive the logistic regression curve that is closest to the ideal discrimination line.
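The ideal discrimination line and its closest logistic curve can be illustrated with a short sketch. We assume, purely for illustration, that the ideal line runs from a passing probability of 0 at the lowest score to 1 at the highest, so every score point adds the same increment. We also take "closest" to mean least squares over the distinct score points, each weighted equally. The paper's own specification of the line and of closeness may differ, and the function names are hypothetical.

```python
import math


def ideal_line(x, x_min, x_max):
    """Assumed ideal: P(pass) rises linearly from 0 at x_min to 1 at x_max,
    so every score point adds the same increment in passing probability."""
    return (x - x_min) / (x_max - x_min)


def logistic(x, b0, b1):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def sse(points, b0, b1, x_min, x_max):
    """Squared distance between the logistic curve and the ideal line."""
    return sum((logistic(x, b0, b1) - ideal_line(x, x_min, x_max)) ** 2
               for x in points)


def closest_logistic(points, max_iter=100, tol=1e-12):
    """Least-squares (b0, b1) of the logistic curve nearest the ideal line,
    via Gauss-Newton with step halving; each score point is weighted equally."""
    x_min, x_max = min(points), max(points)
    # Start from the logistic matching the line's midpoint and central slope
    # (a logistic's slope at p = 0.5 is b1 / 4).
    b1 = 4.0 / (x_max - x_min)
    b0 = -b1 * (x_min + x_max) / 2.0
    current = sse(points, b0, b1, x_min, x_max)
    for _ in range(max_iter):
        # Normal equations (J^T J) d = -J^T r, with dp/db0 = w, dp/db1 = w * x.
        a00 = a01 = a11 = g0 = g1 = 0.0
        for x in points:
            p = logistic(x, b0, b1)
            r = p - ideal_line(x, x_min, x_max)
            w = p * (1.0 - p)
            a00 += w * w
            a01 += w * w * x
            a11 += w * w * x * x
            g0 += w * r
            g1 += w * r * x
        det = a00 * a11 - a01 * a01
        d0 = (a01 * g1 - a11 * g0) / det
        d1 = (a01 * g0 - a00 * g1) / det
        # Halve the step until the squared distance actually decreases.
        step = 1.0
        while step > 1e-8:
            trial = sse(points, b0 + step * d0, b1 + step * d1, x_min, x_max)
            if trial < current:
                break
            step /= 2.0
        else:
            break  # no improving step: treat as converged
        b0, b1 = b0 + step * d0, b1 + step * d1
        improvement, current = current - trial, trial
        if improvement < tol:
            break
    return b0, b1
```

For a 50-point test, `closest_logistic(list(range(51)))` returns theoretical coefficients whose curve is centred at the middle of the score range, as the symmetry of the assumed line requires. Weighting score points by their observed frequencies instead of equally is a natural variant.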
Once this 'theoretical' logistic regression curve is derived, the observed logistic regression curve estimated from the test data can be compared with it. If the observed curve differs significantly from the theoretical one, the item is said not to have good discrimination. A simulation study compared the detection of item discrimination under the linear regression model and the logistic regression model. Test scores followed a normal, uniform, or chi-square distribution, and sample sizes were 40, 100, 500, 2000, or 5000. When the test scores follow the normal or chi-square distribution, the linear and logistic regression models yield almost identical results. The two models diverge only at extremely large sample sizes, such as 5000. A real data set of 50 multiple-choice items answered by 500 examinees was analyzed to illustrate the similarities and differences between the proposed logistic regression procedure and the conventional item discrimination indices. Five items were arbitrarily chosen for analysis. Their difficulties (proportions of correct responses) range from 0.50 to 0.81. Only one item is flagged as not exhibiting good discrimination (p < .001). The three conventional discrimination indices lead to almost identical results. This is expected, because all of these procedures are designed to describe item discrimination. However, only the proposed objective procedure is statistically sound.
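The comparison between the observed and theoretical curves might be sketched as below, using the likelihood ratio and Pearson chi-square statistics named in the keywords. The following are assumptions, not taken from the paper:

- The theoretical coefficients `b0_star` and `b1_star` are already given.
- The observed curve is fitted by Newton-Raphson.
- Both theoretical coefficients are fixed under the null hypothesis, so the likelihood-ratio statistic is referred to a chi-square distribution with 2 degrees of freedom.
- The Pearson statistic groups examinees by total score.

All function names are hypothetical.

```python
import math


def softplus(z):
    """Numerically stable log(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def prob(x, b0, b1):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def log_lik(scores, responses, b0, b1):
    """Bernoulli log-likelihood of 0/1 responses under logit(p) = b0 + b1 * score."""
    total = 0.0
    for x, y in zip(scores, responses):
        eta = b0 + b1 * x
        # log p = -softplus(-eta); log(1 - p) = -softplus(eta)
        total -= softplus(-eta) if y else softplus(eta)
    return total


def fit_logistic(scores, responses, max_iter=50, tol=1e-10):
    """Maximum-likelihood (b0, b1) by Newton-Raphson on centred scores.
    Does not converge if the scores perfectly separate 0s from 1s."""
    xbar = sum(scores) / len(scores)
    a0 = b1 = 0.0  # intercept on the centred scale, slope
    for _ in range(max_iter):
        h00 = h01 = h11 = g0 = g1 = 0.0
        for x, y in zip(scores, responses):
            xc = x - xbar
            p = prob(xc, a0, b1)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * xc
            h00 += w
            h01 += w * xc
            h11 += w * xc * xc
        det = h00 * h11 - h01 * h01
        d0 = (g0 * h11 - g1 * h01) / det
        d1 = (g1 * h00 - g0 * h01) / det
        a0, b1 = a0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return a0 - b1 * xbar, b1  # intercept back on the raw-score scale


def lr_test(scores, responses, b0_star, b1_star):
    """G^2 = 2[logL(fitted) - logL(theoretical)]. Both coefficients are fixed
    under H0, so df = 2 and the chi-square(2) upper tail is exp(-G^2 / 2)."""
    b0_hat, b1_hat = fit_logistic(scores, responses)
    g2 = 2.0 * (log_lik(scores, responses, b0_hat, b1_hat)
                - log_lik(scores, responses, b0_star, b1_star))
    g2 = max(g2, 0.0)  # guard against tiny negative values from rounding
    return g2, math.exp(-g2 / 2.0)


def pearson_chi2(scores, responses, b0_star, b1_star):
    """Pearson X^2 of observed vs theoretical passing counts, grouped by score.
    Returns (X^2, number of score groups); the df choice is left to the analyst."""
    groups = {}
    for x, y in zip(scores, responses):
        n, k = groups.get(x, (0, 0))
        groups[x] = (n + 1, k + y)
    x2 = 0.0
    for x, (n, k) in groups.items():
        p = prob(x, b0_star, b1_star)
        x2 += (k - n * p) ** 2 / (n * p * (1.0 - p))
    return x2, len(groups)
```

Call `lr_test(scores, responses, b0_star, b1_star)` with an item's 0/1 `responses` and the examinees' total `scores`. It returns the statistic and its p-value; a small p-value would flag the item as not having good discrimination. For the Pearson statistic, a p-value could be obtained with `scipy.stats.chi2.sf(x2, df)` once the degrees of freedom are settled.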

Keywords: item discrimination, logistic regression model, likelihood ratio test, Pearson chi-squared test
