Possible answers for the nine questions about testing/assessment
which were in the March 2011 issue of this newsletter appear below.
Because observed test scores are an imperfect measure of ability, high scores are typically an overestimate of ability and low scores are typically an underestimate – causing high and low scores to regress to the mean in subsequent tests. (p. 395)
If we understand regression toward the mean, it should be easy to guess what is meant by a regression fallacy. Ascribing random tugs toward the norm to non-random causes is known as a regression fallacy. In testing contexts, this occurs when score gains or drops are mistakenly attributed to a substantive cause such as "improved ability".
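The regression effect can be illustrated with a small simulation. The sketch below assumes the classical test theory model, in which each observed score is a fixed true score plus independent, normally distributed measurement error. The function name `simulate_regression` and every parameter value are hypothetical choices for illustration, not data from any study cited here. Nothing changes between the two administrations, yet the group selected for high scores on the first test should score lower, on average, on the second.

```python
import random
import statistics


def simulate_regression(n=10_000, true_mean=50.0, true_sd=10.0,
                        error_sd=5.0, top_fraction=0.1, seed=1):
    """Simulate two administrations of one test and follow the top scorers.

    Assumes the classical test theory model: each examinee has a fixed true
    score, and each observed score adds independent normal measurement error.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    test1 = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    # Same true scores on the retest: no real change in ability occurs.
    test2 = [t + rng.gauss(0.0, error_sd) for t in true_scores]

    # Select the "high-scoring" group using the first test only.
    k = max(1, int(n * top_fraction))
    top = sorted(range(n), key=lambda i: test1[i], reverse=True)[:k]

    return {
        "test1_mean_of_top": statistics.mean(test1[i] for i in top),
        "true_mean_of_top": statistics.mean(true_scores[i] for i in top),
        "test2_mean_of_top": statistics.mean(test2[i] for i in top),
    }


if __name__ == "__main__":
    for name, value in simulate_regression().items():
        print(f"{name}: {value:.1f}")
```

Attributing the top group's drop on the second test to, say, declining motivation would be a regression fallacy: the drop comes from selecting on scores that were inflated by random error. Setting `error_sd=0.0` removes measurement error, and the two means then coincide.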
How can regression fallacies be avoided? Poulton (1994, pp. 128, 134) underscores the need for researchers to be better educated about regression in general. It should be emphasized that regression itself is not problematic; incorrectly ascribing data shifts to non-random variables is the problem. It is also worth remembering that as the gap between a person's true score and observed score widens, so does the regression toward the mean. Regression effects are therefore attenuated when a test's observed scores closely approximate its true scores, which occurs when the test's measurement error is minimal. In other words, if a test measures reliably, with little error, for the sample it was designed for, regression effects will be attenuated; practically speaking, however, regression artifacts are a feature of all experiments.
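Classical test theory offers one way to quantify how measurement error drives regression. Kelley's formula, a standard result not taken from the sources listed here, estimates an examinee's true score from the observed score X, the group mean μ, and the test's reliability ρ_XX' (notation assumed for this sketch). Rearranging it shows that the expected pull toward the mean grows both with distance from the mean and with lower reliability.

```latex
\hat{T} = \rho_{XX'}\,X + \left(1 - \rho_{XX'}\right)\mu
\qquad\Longrightarrow\qquad
X - \hat{T} = \left(1 - \rho_{XX'}\right)\left(X - \mu\right)
```

With hypothetical values μ = 50 and X = 70, a reliability of .80 gives an estimated true score of 0.80(70) + 0.20(50) = 66, while a reliability of .95 gives 0.95(70) + 0.05(50) = 69. The expected regression shrinks from 4 points to 1 point as measurement error falls.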
Dallal, G. E. (2000). The regression effect - The regression fallacy. Retrieved March 11, 2011 from http://www.jerrydallal.com/LHSP/regeff.htm
Lohman, D. F. & Korb, K. A. (2006). Gifted today but not tomorrow? Longitudinal changes in ability and achievement during elementary school. Journal for the Education of the Gifted, 29(4), 451-484. doi: 10.4219/jeg-2006-245
Ostermann, T., Willich, S. N., & Lüdtke, R. (2008). Regression toward the mean - A detection method for unknown population mean based on Mee and Chua's algorithm. BMC Medical Research Methodology, 8(52) n.p. doi:10.1186/1471-2288-8-52
Poulton, E. C. (1994). Behavioral decision theory: A new approach. Cambridge, UK & New York, NY: Cambridge University Press.
Smith, G. & Smith, J. (2005). Regression to the mean in average test scores. Educational Assessment, 10(4), 377-399.
Trochim, W. (2006). Research methods knowledge base: Regression toward the mean. Retrieved March 10, 2011 from http://www.socialresearchmethods.net/kb/regrmean.php
This concept of "test maintenance" has also been described in terms of a test development cycle by Breen (1989) and subsequently by Wigglesworth and Elder (1996) as well as Weir and Milanovic (2003). Although the process is cyclic, a conceptual final step consists of "evaluation and revision", and many of those processes overlap with the procedures described in the previously mentioned "test maintenance" phase of a systems development life cycle.
Aline, D. & Churchill, E. (2006). Analyzing entrance exam item types with Rasch. Kanagawa Daigaku Gengo Kenkyuu, 28, 125-142. Retrieved on March 8, 2011 from http://hdl.handle.net/10487/3846
Breen, M. (1989). The evaluation cycle for language learning tasks. In R. K. Johnson (Ed.), The second language curriculum (pp. 187-206). Cambridge: Cambridge University Press.
Ito, A. (2005). Validation study on the English language test in a Japanese nationwide university entrance examination. Asian EFL Journal, 7(2), 6. Retrieved on March 8, 2011 from http://www.asian-efl-journal.com/June_05_ai.php
Japan Language Testing Association. (2007). JLTA code of good testing practice. Retrieved on March 8, 2011 from http://www.avis.ne.jp/~youichi/COP.html
Weir, C. J. & Milanovic, M. (Eds.) (2003). Continuity and innovation: The history of the Cambridge Proficiency Exam 1913-2002. Studies in Language Testing 15. Cambridge: Cambridge University Press/UCLES.
Wigglesworth, G. & Elder, C. (Eds.). (1996). The language testing cycle: From inception to washback. Canberra, Australia: Australian National University.
3 Q: What is questionnaire acquiescence? Why should it be of concern to survey designers? How can it be reduced?
O'Muircheartaigh, C., Krosnick, J. A., & Helic, A. (2000). Middle alternatives, acquiescence, and the quality of questionnaire data. The Harris School Working Papers Series, 1(3) n.p. Retrieved on March 13, 2011 from http://harrisschool.uchicago.edu/about/publications/working-papers/abstract.asp?paper_no=01.03+++
Ray, J. J. (1990). Acquiescence and problems with forced-choice scales. Journal of Social Psychology, 130(3), 397-399. Retrieved on March 4, 2011 from http://jonjayray.tripod.com/forcho.html
Saris, W. E., Krosnick, J. A., & Shaeffer, E. M. (2005). Comparing questions with agree/disagree response options to questions with construct-specific response options. Unpublished manuscript, Political, Social, Cultural Sciences, University of Amsterdam.
A: An illegal value is a response that falls outside the range of valid options available. The term is most widely used in computer programming, but it is also relevant to test analysis. In multiple-choice tests, the most common type of illegal value occurs when more than one response is selected on an item that permits only one. Illegal values are harder to judge with open-response test items. However, if an item asks respondents to describe in detail how they would respond to a specific situation, and an examinee produces lots of filler without indicating any clear response, that could be considered an illegal value.
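For machine-scored multiple-choice data, the check described above is easy to automate. The minimal sketch below assumes that each response is stored as a string of the option letters an examinee marked on a four-option, single-answer item. The function `classify_response` is hypothetical, and treating a blank as an omission rather than an illegal value is an assumption of this sketch, not a rule from the text.

```python
def classify_response(marked, valid_options=frozenset("ABCD")):
    """Classify the marks recorded for one single-answer multiple-choice item.

    Returns 'valid' for exactly one permitted option, 'blank' for no mark, and
    'illegal' for more than one option or any mark outside the option set.
    """
    # Repeated marks of the same letter count once; whitespace is ignored.
    options = {ch for ch in marked.upper() if not ch.isspace()}
    if not options:
        return "blank"  # an omission, kept separate from illegal values here
    if len(options) > 1 or not options <= valid_options:
        return "illegal"
    return "valid"


if __name__ == "__main__":
    responses = {"s01": "B", "s02": "", "s03": "AC", "s04": "E"}
    for student, marked in responses.items():
        print(student, classify_response(marked))
```

Here "AC" is illegal because two options were marked, and "E" is illegal because it lies outside the A-D range. Open-response items would still need human judgment.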
NIST/SEMATECH e-Handbook of Statistical Methods. (n.d.). What are outliers in the data? Retrieved on March 15, 2011 from http://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm
Osborne, J. W. & Overbay, A. (2004). The power of outliers (and why researchers should always check for them). Practical Assessment, Research & Evaluation, 9(6). n.p. Retrieved March 14, 2011 from http://PAREonline.net/getvn.asp?v=9&n=6
Renze, J. (1999). MathWorld, A Wolfram Web Resource created by E. Weisstein: Outlier. Retrieved on March 10, 2011 from http://mathworld.wolfram.com/Outlier.html
Rousseeuw, P. J. & Leroy, A. M. (2003). Robust regression and outlier detection (Wiley Series in Probability and Statistics). New York: Wiley-Interscience.
Taleb, N. N. (2007). The black swan: The impact of the highly improbable. New York, NY: Random House.
Taleb, N. N. (2010). The black swan: The impact of the highly improbable (2nd Edition). New York, NY: Random House & Penguin.
Gabriel, T. (2010, December 27). Cheaters find an adversary in technology. New York Times: Online Edition. Retrieved on March 10, 2011 from http://www.nytimes.com/2010/12/28/education/28cheat.html
Univ. entrance exam cheats go online. (2001, March 12). Japan Times Weekly. Retrieved on March 10, 2011 from http://weekly.japantimes.co.jp/nn/univ-entrance-exam-cheats-go-online
High-tech cheating in entrance exams. (2011, March 2). Asahi Shimbun: English Web Edition. Retrieved on March 10, 2011 from http://www.asahi.com/english/TKY201103010401.html
Taroni, F., Bozza, S., Biedermann, A., Garbolino, P., & Aitken, C. (2010). Data analysis in forensic science: A Bayesian decision perspective (Statistics in Practice). Chichester, West Sussex: John Wiley & Sons, Ltd.
Option (D) describes priming questions. Priming is a phenomenon in which an earlier survey or test item influences the response to a later item. Priming is sometimes described as a question-order effect, and according to Lasorsa (2003), it can be a source of significant context variance. For this reason, survey designers generally seek to reduce, or at least control for, priming effects. One strategy is to use several alternative forms of the same survey, each with the items in a randomized order. Another is to add "buffer questions" between core questions (Wänke & Schwarz, 1997). Yet another is to accept question-order effects as inevitable and simply to be consistent and explicit about the order.
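The first two strategies can be combined when preparing survey forms. The sketch below is one possible approach, not the procedure of Wänke and Schwarz (1997). It assumes a list of core questions and a pool of buffer questions, and it builds several forms in which the core items appear in independently shuffled orders with a distinct buffer item between each adjacent pair. The function `make_forms` and the item labels are hypothetical.

```python
import random


def make_forms(core_items, buffer_items, n_forms=3, seed=2011):
    """Build alternative survey forms that vary the order of the core items.

    Each form shuffles the core items independently and places one distinct
    buffer item between every pair of adjacent core items.
    """
    if not core_items:
        return [[] for _ in range(n_forms)]
    gaps = len(core_items) - 1
    if len(buffer_items) < gaps:
        raise ValueError("need at least one buffer item per gap between core items")
    rng = random.Random(seed)  # fixed seed so the same forms can be regenerated
    forms = []
    for _ in range(n_forms):
        core_order = rng.sample(core_items, len(core_items))
        buffers = rng.sample(buffer_items, gaps)
        form = [core_order[0]]
        for buffer, core in zip(buffers, core_order[1:]):
            form.extend([buffer, core])
        forms.append(form)
    return forms


if __name__ == "__main__":
    core = ["Core 1", "Core 2", "Core 3"]
    buffers = ["Buffer A", "Buffer B", "Buffer C"]
    for number, form in enumerate(make_forms(core, buffers), start=1):
        print(f"Form {number}:", " | ".join(form))
```

Recording which form each respondent received allows responses to be compared across orders later. If the forms yield similar results, question-order effects are probably small for that survey.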
Albrecht, S. A., Albrecht, C. C., Albrecht, C. O., & Zimberland, M. (2009). Fraud examination (3rd Edition). Mason, OH: South-Western Cengage Learning.
Henning, J. (2010). Sequential vs. grouped placement of filter questions. Retrieved March 15, 2011 from http://blog.vovici.com/blog/bid/28235/Sequential-vs-Grouped-Placement-of-Filter-Questions
Lasorsa, D. L. (2003). Question-order effects in surveys: The case of political interests, news attention, and knowledge. Journalism & Mass Communication Quarterly, 80(3), 499-512. Retrieved March 17, 2011 from http://www.aejmc.org/_scholarship/research_use/jmcq/03fall/lasorsa.pdf
Tsuda, S. (2003). Attitudes toward English language learning in higher education in Japan: Raising awareness of the notion of global English. Intercultural Communication Studies, 12(3), 61-75. Retrieved March 18, 2011 from http://www.uri.edu/iaics/content/2003v12n3/06%20Sanae%20Tsuda.pdf
Trochim, W. (2006). Research methods knowledge base: Types of questions. Retrieved March 15, 2011 from http://www.socialresearchmethods.net/kb/questype.php
Wänke, M. & Schwarz, N. (1997). Reducing question order effects: The operation of buffer items. In L. E. Lyberg et al. (Eds.), Survey measurement and process quality (Wiley Series in Probability and Statistics) (pp. 115-139). New York: John Wiley & Sons, Inc.
Yu, S. (1999). The Pragmatic Development of Hedging in EFL Learners. Unpublished Ph.D. Thesis. City University of Hong Kong. Retrieved March 15, 2011 from http://lbms03.cityu.edu.hk/theses/ftt/phd-en-b23749398f.pdf
Individual-environmental interactions are complex, and controversies regarding the extent to which behaviors should be ascribed to personality or to environmental conditions have been perennial. According to Ross (1977), the tendency of people to ascribe the behaviors of others to personality variables such as "character" rather than to situational variables such as "interlocutor power gaps" is known as the fundamental attribution error. The opposite tendency, to ascribe behaviors to environmental factors rather than to individual personality traits, represents a different type of cognitive error. Needless to say, different academic disciplines (and researchers) tend to focus on different parts of the individual-environmental spectrum.
Harper, M. (2009). Fundamental attribution error. Retrieved March 15, 2011 from http://www.knowledgerush.com/kr/encyclopedia/Fundamental_attribution_error/
Pettigrew, T. F. (1979). The ultimate attribution error: Extending Allport's cognitive analysis of prejudice. Personality and Social Psychology Bulletin, 5(4), 461-476. doi: 10.1177/014616727900500407
Ross, L. (1977). The intuitive psychologist and his shortcomings: Distortions in the attribution process. In L. (Ed.), Advances in experimental social psychology. (pp. 173-220). New York: Academic Press. doi: 10.1016/S0065-2601(08)60357-3
A: The last two options are data processing errors that can be ascribed to machine error.
Groves, R. M. (2004). Survey errors and survey costs (New edition). New York: Wiley-Interscience.
Statistics Canada - Statistique Canada. (2010). Non-sampling error. Retrieved March 16, 2011 from http://www.statcan.gc.ca/edu/power-pouvoir/ch6/nse-endae/5214806-eng.htm
Analyse-it Software, Ltd. (2008). Testing the assumption of normality. Retrieved March 18, 2011 from http://www.analyse-it.com/blog/2008/8/testing-the-assumption-of-normality.aspx
Drexel University Math Forum. (2008). Testing a set of data for normal distribution. Retrieved March 18, 2011 from http://mathforum.org/library/drmath/view/72065.html
Laerd Statistics. (n.d.). Testing for Normality using SPSS. Retrieved March 18, 2011 from http://statistics.laerd.com/spss-tutorials/testing-for-normality-using-spss-statistics.php
Motulsky, H. (2009). Normality tests – use with caution. Retrieved March 18, 2011 from http://www.graphpad.com/library/BiostatsSpecial/article_197.htm