Rasch Measurement in Language Education, Part 5:
Assumptions and requirements of Rasch measurement
James Sick, Ed.D. (International Christian University, Tokyo)
"[Unidimensionality, equal item discrimination, and low susceptibility to guessing] are not characteristics of a dataset that are assumed to be true . . . [they] are ideals that must be reasonably approximated . . . Real world data are not expected to match the [Rasch] model perfectly."
In Rasch terms, unidimensional measurement means simply that all of the non-random variance found in the data can be accounted for by a single dimension of difficulty and ability. Recall that the Rasch model predicts the likelihood of success at a task based on the gap between a person's ability and the task's difficulty. Improbable responses are predicted to occur, but infrequently and randomly. That is, we should not be able to predict unexpected responses from the responses to other items or by membership in a demographic group. If we can, we infer that there is another psychometric dimension that is influencing responses.
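The prediction described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the article; the function name is mine.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of success under the dichotomous Rasch model.

    Both values are in logits; the prediction depends only on the
    gap between the person's ability and the task's difficulty.
    """
    gap = ability - difficulty
    return math.exp(gap) / (1 + math.exp(gap))

# A person whose ability equals the item's difficulty succeeds half the time.
print(round(rasch_probability(1.0, 1.0), 2))  # 0.5
# A one-logit advantage raises the probability to about .73.
print(round(rasch_probability(1.0, 0.0), 2))  # 0.73
```

Because only the gap matters, the same pair of probabilities results wherever along the scale that gap occurs.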
To address the former point first, the Rasch model is an ideal. It is a standard intended to describe the response pattern that would be observed if all items were measuring the same construct, were independent of each other, and had no non-random measurement error. Real world data are not expected to match the model perfectly. A Rasch analysis seeks to determine whether the data approximate the model closely enough to be useful. The analysis produces various graphs and indices that allow us to quantify the degree of deviance from the model, identify sources of measurement disturbance and correct them, and then make informed decisions about whether the data are "good enough" to meet our purposes. The primary motivations of a Rasch analysis are evaluation, diagnosis, and fine-tuning. Generally, research and experience have shown that measures constructed from Rasch models are robust to minor deviations from the model's requirements (Henning, Hudson, & Turner, 1985; Smith, 1990).
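One example of the kind of index mentioned above is the outfit mean-square, which averages squared standardized residuals between observed and expected responses. The sketch below is a minimal illustration under assumed person and item estimates; the function names are mine, not from the article or any particular software package.

```python
import math

def rasch_p(ability, difficulty):
    """Expected probability of success under the dichotomous Rasch model."""
    gap = ability - difficulty
    return math.exp(gap) / (1 + math.exp(gap))

def outfit_mean_square(responses, abilities, difficulty):
    """Outfit mean-square for one item: the mean squared standardized
    residual across persons. Values near 1 indicate responses close
    to the model's predictions; large values flag misfit.
    """
    total = 0.0
    for x, theta in zip(responses, abilities):
        p = rasch_p(theta, difficulty)
        total += (x - p) ** 2 / (p * (1 - p))
    return total / len(responses)

# One success by a person whose ability matches the item's difficulty:
# the standardized residual squared is exactly 1.
print(outfit_mean_square([1], [0.0], 0.0))  # 1.0
```

In practice such indices are computed by Rasch software over the whole response matrix; the point here is only to show what "quantifying deviation from the model" looks like arithmetically.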
In Figure 2, however, each item has been rendered with an individualized slope, the procedure followed when using a 2-parameter IRT model. This allows the ICCs to cross each other at various points along the ability continuum. Now, the rather straightforward question of "which item is easiest" becomes ambiguous. For persons with abilities in the region of minus one logit, Item 1 is the easiest, with a probability of success of about .30, followed by Item 2 and then Item 3. At zero logits of ability, however, Item 3 is the easiest, followed by Item 1 and then Item 2. Finally, for a person with an ability of one logit, the order of difficulty is reversed: Item 3 is easiest, followed by Item 2, followed by Item 1. This ambiguous ordering of item difficulty destroys the Rasch concept of construct validity, which relies on the implicative hierarchy of task difficulty to define the latent variable. For a detailed discussion of the implications of allowing crossed ICCs on construct validity, including an intriguing example, see Wright (1992, 1999).
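The ambiguity can be reproduced numerically. The item parameters below are hypothetical, chosen by me rather than taken from Figure 2; they are picked only so that the item with the steepest slope, hardest for low-ability persons, becomes the easiest at high ability.

```python
import math

def icc_2pl(theta, difficulty, discrimination):
    """Item characteristic curve under a 2-parameter IRT model.

    The per-item discrimination gives each curve its own slope,
    which is what allows ICCs to cross.
    """
    z = discrimination * (theta - difficulty)
    return math.exp(z) / (1 + math.exp(z))

# Hypothetical (difficulty, discrimination) pairs, not from Figure 2.
items = {"Item 1": (-1.0, 0.4), "Item 2": (0.0, 1.0), "Item 3": (0.3, 2.5)}

def easiest_to_hardest(theta):
    """Rank items by probability of success at a given ability."""
    return sorted(items, key=lambda name: icc_2pl(theta, *items[name]),
                  reverse=True)

print(easiest_to_hardest(-1.0))  # ['Item 1', 'Item 2', 'Item 3']
print(easiest_to_hardest(1.0))   # ['Item 3', 'Item 2', 'Item 1']
```

Under the Rasch model, where every item shares one slope, the ranking returned at any two abilities would be identical; with free discriminations it depends on where along the continuum you ask.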
"Items that predict the total score more than other items are likely to be redundant, or in some way dependent on other items."
". . . most examinees do not engage in random guessing. Guessing behavior appears to be an individual attribute, related to risk-taking, cultural background, and test-wiseness . . ."
"When a test or questionnaire has been carefully designed, data deletion amounts to fine tuning: a few items or persons that did not function as expected are removed in order to make the constructed measures more efficient, reliable, and inferentially valid."
[ p. 29 ]