Differential item functioning is a necessary condition but not sufficient condition for bias because a test item that functions differently for two groups might do so because it advantages one group in a construct-irrelevant way, but there might also be a legitimate reason for differential functioning. (p. 83)

The authors then contrast a wide range of methods for detecting DIF, duly weighing the strengths and weaknesses of various parametric, non-parametric, and IRT approaches. Special attention is given to the merits and limitations of generalizability theory and FACETS-based multifaceted analyses in detecting DIF. The authors recommend multifaceted analysis for fine-grained investigations and generalizability studies when an overall impression of test dependability is sought. Noting that different DIF detection procedures tend to yield different results, they again stress caution when interpreting test results. To some extent, we also need to question the agenda of researchers themselves: few are immune from political, economic, or social pressures to present the topics they research in a certain light (Marco & Larkin, 2000).
"Though many of the socially oriented research studies cited in this text are still underdeveloped, the authors do succeed in offering persuasive reasons why the somewhat narrow field of language testing needs to expand its scope."
In a weak profession, like language testing, no professional association regulates the right to practice: membership in a professional organization is voluntary, it is not a precondition for practice, and, consequently, there are no serious sanctions against members who violate codes of ethics. The association might exclude them, but they cannot be stopped from continuing to practice, ethically or unethically. (p. 139)

Observing that language tests often serve as identity markers, the authors then show how test performance is used to indicate membership in a specific group. Citing examples of test use in intercultural conflict, McNamara and Roever concur with Foucault (1975/1997) that language tests are often forms of surveillance and, in some cases, persecution. They underscore how tests designed to ascertain a person's identity are fraught with reliability and validity problems, not to mention moral quandaries. Political uses and misuses of tests illustrate how frequently language testing serves as a form of social engineering. Indeed, the formal and seemingly scientific nature of most language tests masks their preeminently social agenda. Stressing the need for more critical awareness about test use, the authors state:
. . . we cannot afford to be merely naive players in the discursively constructed world in which language tests are located. Appropriate intellectual and analytical tools enable us to recognize the roles that tests will play in the operation of power and systems of social control. We will be less inclined to seek shelter in the impersonality and purely technical aspects of our work. We need critical self-awareness in order for us to first recognize and then to decide whether or not to accept or to resist our own subject position in the system of social control in which tests play such a part. (p. 198)
Language teachers will be particularly interested in Chapter 7 of this work, since it considers how tests are used and abused in schools. The chapter describes how political mandates often shape testing in various countries. In many cases, politicians create standards grounded not in social research but in a political agenda, and students, teachers, and entire school systems are then judged by those standards. The impact of such standards is often far-reaching and unforeseen. For example, some schools in the USA now encourage low-performing students to drop out to avoid lowering overall school scores (Balfanz & Legters, 2004). The authors remind us that political motives generally shape test constructs more than any formal academic research does.
the underlying construct [of current high school EFL tests in Japan] is not communicative proficiency in English . . . but, rather, diligence and hard work - attributes highly valued in Japanese society . . . the actual content of the test and its validity in terms of conformity to the curriculum guidelines . . . are not the central issue; what matters is that the test be difficult and play the role of selecting the character attributes of diligence and effort. (p. 208)

The authors go on to assert that entrance exams in Japan essentially measure character and/or intelligence rather than communicative ability. Though lip service is paid to the need to improve oral EFL proficiency among the Japanese, most Japanese administrators believe tests should be a "properly noncommunicative" (pp. 208-209) means of demonstrating the ability to memorize complex rules and exhibit evidence of academic skills. In other words, testing is a sort of ritualized performance that has little to do with authentic communication, a point McVeigh (2002) discusses at length.
the relatively narrow intellectual climate of language testing will need to be broadened, with openness to input from such diverse fields as sociology, policy analysis, philosophy, cultural theory, social theory, and the like, in addition to the traditional source fields. (p. 254)

Specifically, McNamara and Roever call for more investigations of test bias and of the learning potential of individuals in unfamiliar environments. They also argue that more research into alternatives to existing native-speaker norms is warranted, as are further studies in discourse analysis.
– Reviewed by Tim Newfields