Student Evaluation of Teachers: Professional Practice or Punitive Policy? (continued)

A self-fulfilling prophecy

A concern to address in the use of student evaluations is the impact that the act of evaluation has on students' perceptions of their teachers and on the teachers themselves.
There are biases in evaluating a person's personality, performance and competence – biases that can lead to flawed information-gathering strategies that are self-fulfilling (Harris, 1994). A self-fulfilling prophecy, as defined by Merton (1948), is an incorrect perception, belief or definition of a set of circumstances that evokes behaviour which makes the incorrect perception or belief come true.
In composing SETEs, administrators bring their own expectations about the teachers to the procedure. These expectations profoundly affect the way they design the SETEs and the information-gathering strategies they use.
In clinical psychology, in the study of interpersonal expectancy effects or behavioural confirmation, the problem of incorrect diagnoses supported by presumptive questioning strategies is a serious ethical issue that remains a central focus. Observers, no matter how well trained and how ethical, will carry out their evaluations based on incorrect hypotheses.
Snyder and Swann (1978), in a classic study, gave subjects a list (a personality profile) describing either an extroverted or an introverted personality and then asked them to choose, from a longer list, the 12 questions that would best allow them to test that profile as a hypothesis about a target person. Analysis demonstrated a heavy emphasis on hypothesis-confirming strategies.
The process of selecting questions and the process of applying those questions to the evaluation of a person's behaviour are difficult even for well-trained clinicians to perform objectively; the situation of untrained students, administrators and teachers is even more problematic.
When an administration or administrator has decided that teachers fit certain stereotypes or engage in certain types of behaviour – negative or constructive – the administrator will select hypothesis-confirming questions for the students to answer.
For example, students are asked whether the teacher is humorous, whether they like the teacher, whether the teacher stimulates or encourages them, and whether the teacher is enthusiastic and dynamic. An entire battery of subjective parameters appears on SETEs, leading students to believe that the teacher must conform to certain, possibly irrelevant, behavioural parameters that in fact appeal differently to each individual student.


When a student answers objective and subjective questions, what will the student rely on: what they feel confident they can answer, or what they are unsure about?
The nature of objective questions presents certain problems. How can a student know whether a teacher is well prepared; how do they assess preparedness? How can a student evaluate a teacher's expertise in the field; if they knew so much about the field, why would they be the student? Yet students will answer these types of questions, which shows that even when they do not have a defensible point of view, they will give an opinion. This is not the way to solicit informed opinions.
    Additionally, it is not the students' opinions that have necessarily been solicited; they will be answering someone else's questions without having given the matter any thought until the moment they are supposed to 'evaluate' the teacher.
The administrators' perceptions of the teachers can also profoundly affect the teachers' perceptions of their own effectiveness. Teachers who are told that they are teaching poorly because they do not appeal to the parameters the students are asked to rate on the SETEs may in fact be teaching at a competent level, but the administration's input from the tainted SETEs can be amplified by insisting that the results are accurate and show the teacher to be less than competent.

And through all of this runs the underlying belief that the process of education is predominantly the sole burden of the teacher. The assumption that the teacher is primarily responsible completely colours the students' attitudes and the evaluation designer's intent. In this scenario there is no room for a well-rounded evaluation of the students, the management, the facility, or the social pressures and inhibitions; a long list of variables is ignored.

In real classrooms

Students' subjective opinions can be so varied that the overall results are untrustworthy. Students who are specifically shown that certain SETE parameters have been fulfilled may still evaluate related criteria ambivalently. Students may pointedly refer to a teacher's physical characteristics or manner in very negative or positive terms and judge the teacher on the basis of these characteristics – as if teachers who are not aesthetically acceptable are rendered less capable of teaching.
The entire process of SETEs becomes a convenient matter of picking and choosing what serves to comply with the original hypothesis of the SETE designer/administrator rather than actually engaging in an honest evaluation. This means the evaluation is rather like a shopping list of potentially conforming characteristics that further the administrators' personal biases.


A proposed paradigm

The following paradigm, adapted from Arnoult and Anderson (1988), provides for better evaluation of teacher effectiveness in the academic environment by reducing an evaluator's biases: (a) gather as much evidence as possible; (b) employ multiple evaluators who have different viewpoints and interests; (c) vary the observational circumstances to provide different emphases in the environment; (d) review videotapes for greater accuracy; (e) compare the criteria on balance sheets to establish evidence for and against an evaluation; (f) solicit an explanation of the results and the subsequent conclusions made by evaluators to reveal gaps in reasoning. This paradigm constitutes constructive advice for the evaluations we make of others in a professional setting.
This type of evaluation is an example of a structured attempt at measuring professional competence with regard for the various facets of the evaluation process, and it is designed primarily to inform teachers rather than to judge them: a philosophy that serves better to encourage improvement than to punish.
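As a rough illustration only, the multiple-evaluator step (b) and the balance-sheet step (e) of the paradigm above could be tallied as follows. All data, names and criteria here are hypothetical; this is a minimal sketch of the bookkeeping, not part of the original paradigm.

```python
# A minimal sketch of the multi-evaluator, balance-sheet idea (hypothetical
# data and field names). Each evaluator observes under a different
# circumstance; a balance sheet tallies evidence for and against each
# criterion before any conclusion is drawn.
from collections import defaultdict

# (evaluator, circumstance, criterion, supports) -- supports=True means the
# observation counts as evidence FOR competence on that criterion.
observations = [
    ("peer A", "lecture", "preparedness", True),
    ("peer B", "seminar", "preparedness", True),
    ("admin",  "video",   "preparedness", False),
    ("peer A", "lecture", "clarity",      True),
    ("admin",  "video",   "clarity",      True),
]

def balance_sheet(obs):
    """Tally evidence for and against each criterion across all evaluators."""
    sheet = defaultdict(lambda: {"for": 0, "against": 0})
    for evaluator, circumstance, criterion, supports in obs:
        sheet[criterion]["for" if supports else "against"] += 1
    return dict(sheet)

sheet = balance_sheet(observations)
print(sheet["preparedness"])  # {'for': 2, 'against': 1}
```

The point of the tally is step (f): an evaluator who concludes "unprepared" from a 2-for, 1-against sheet must explain the reasoning, which exposes hypothesis-confirming selection of evidence.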


References

Arnoult, L., & Anderson, C. A. (1988). Identifying and reducing causal reasoning biases in clinical practice. In D. C. Turk & P. Salovey (Eds.), Reasoning, inference, and judgment in clinical psychology (pp. 209-232). New York: Free Press.

Basow, S. A. (1995). Student evaluations of college professors: When gender matters. Journal of Educational Psychology, 87, 656-665.

Darley, J. M., Fleming, J. H., Hilton, J. L., & Swann, W. B. (1988). Dispelling negative expectancies: The impact of interactional goals and target practices on the expectancy of the confirmation process. Journal of Experimental Social Psychology, 24, 19-36.

Feldman, K. A. (1978). Course characteristics and college students' ratings of their teachers: What we know and what we don't. Research in Higher Education, 9, 199-242.

Feldman, K. A. (1984). Class size and college students' evaluations of teachers and courses: A closer look. Research in Higher Education, 21, 45-116.

Harris, M. J. (1993). Information gathering strategies in social perception. Unpublished manuscript, University of Kentucky, Lexington. Cited in Harris, 1994.

Harris, M. J. (1994). Self-fulfilling prophecies in the clinical context: Review and implications for clinical practice. Applied and Preventive Psychology, 3(3), 145-158.

Kayne, N. T., & Alloy, L. B. (1988). Clinician and patient as aberrant actuaries: Expectation-based distortions in assessment of covariation. In L. Y. Abramson (Ed.), Social cognition and clinical psychology: A synthesis (pp. 295-365). New York: Guilford Press.

Kishor, N. (1995). The effect of implicit theories on raters' inference in performance judgement: Consequences for the validity of student ratings of instruction. Research in Higher Education, 36(2), 177-195.


Marsh, H. W., & Dunkin, M. J. (1992). Students' evaluations of university teaching: A multidimensional perspective. In J. C. Smart (Ed.), Higher education: Handbook of theory and research (Vol. 8, pp. 143-233). New York: Agathon Press.

Merton, R. K. (1948). The self-fulfilling prophecy. Antioch Review, 8, 193-210.

Nielsen, R. S. (1993). The impact of the 1985 reform legislation on the formative evaluation practices of one central Illinois school district. Doctoral Dissertation, University of Illinois at Urbana-Champaign (in Harris, 1994:148).

O'Connell, D. Q., & Dickinson, D. J. (1993). Student ratings of instruction as a function of testing conditions and perceptions of amount learned. Journal of Research and Development in Education, 27(1), 18-23.

Sackett, P. R. (1982). The interviewer as hypothesis tester: The effects of impressions of an applicant on interviewer questioning strategy. Personnel Psychology, 35, 789-804.

Seldin, P. (1993, July 21). The use and abuse of student ratings of professors. The Chronicle of Higher Education, p. A40.

Shiozawa, T. (1995). The change of the Monbusho guidelines and their impact on language education. Paper presented at JALT 95, Nagoya, Japan. Reprinted in PALE Newsletter (1996), 2, 1.

Smith, M. L., & Glass, G. V. (1980). Meta-analysis of research on class size and its relationship to attitudes and instruction. American Educational Research Journal, 17, 419-433.

Snyder, M., & Campbell, B. (1980). Testing hypotheses about other people: The role of the hypothesis. Personality and Social Psychology Bulletin, 6, 421-426.

Snyder, M., & Swann, W. B. (1978). Hypothesis-testing processes in social interaction. Journal of Personality and Social Psychology, 36, 1202-1212.

Snyder, M., & Thomasen, C. J. (1988). Interactions between therapists and clients: Hypothesis testing and behavioural confirmation. In D. C. Turk & P. Salovey (Eds.), Reasoning, inference and judgement in clinical psychology. New York: The Free Press.

Stedman, C. H. (1983). The reliability of teaching effectiveness rating scale for assessing faculty performance. Tennessee Education, 12(3), 25-32.

Sugeno, K. (1992). Japanese Labour Law (Leo Kanowitz, Trans.). Tokyo: University of Tokyo Press.

Swann, W. B., Jr., & Ely, R. J. (1984). A battle of wills: Self-verification versus behavioural confirmation. Journal of Personality and Social Psychology, 46, 1287-1302.

Swann, W. B., Jr., & Giuliano, T. (1987). Confirmatory search strategies in social interaction: How, when, why, and with what consequences. Journal of Social and Clinical Psychology, 5, 511-524.

Tagomori, H. T. (1993). A content analysis of instruments used for student evaluation of faculty in schools of education at universities and colleges accredited by the national council for accreditation of teacher education. Unpublished Ed. Doctorate dissertation. University of San Francisco.

Turk, D. C., & Salovey, P. (Eds.). (1988). Reasoning, inference and judgement in clinical psychology. New York: The Free Press.

Wigington, H., Tollefson, N., & Rodriguez, E. (1989). Students' ratings of instructors revisited: Interactions among class and instructor variables. Research in Higher Education, 30(3), 331-344.

Whitten, B. J., & Umble, M. M. (1980). The relationship of class size, class level and core vs. non-core classification for class to student ratings of faculty: Implications for validity. Educational and Psychological Measurement, 40, 419-423.

