On becoming a testing teacher: Preliminary notes (Part 1)
Greta J. Gorsuch
|"as social and demographic pressures push secondary and tertiary EFL teachers in Japan towards more diverse ways of teaching, they will also be forced to learn more varied ways of testing"|
[ p. 9 ]
Teaching Testing in West Texas
[ p. 10 ]
Course: Testing Language Skills
Date: Summer 1995
Text: Brown, J. D. (1996). Testing in language programs. Upper Saddle River, NJ: Prentice Hall Regents.
Course Purpose: (from syllabus) This course will provide students with a working knowledge of the basic principles for test construction and testing procedures with an emphasis on second language settings. Students will look critically at a variety of first and second language tests including standardized tests, integrative language tests, discrete-point tests and tests of communicative competence. No previous knowledge of statistics or higher mathematics is required. Students will learn the necessary statistical procedures to use in "testing the tests." This will enable them to read test manuals with understanding and construct their own examinations.
Course Requirements: Final exam, innovative test description, homework exercises, and participation. There are four homework exercises based on small datasets (N = 20, maximum) presented in the textbook: (1) item analysis; (2) descriptive statistics; (3) correlation; (4) reliability.
Mode of Class: Each three-hour class consists mainly of lectures, anecdotal example explanations, and extended question and answer periods. Students are expected to read chapters from the textbook. The teacher will go over the textbook material and demonstrate mechanical calculations on the board. The entire textbook will be covered, in the order of the chapters in the book. During the question and answer period, the teacher will relate many of the concepts to practical situations drawn from his extensive testing experience at the program level. Students are also expected to complete exercises at the end of each chapter. Going over the exercises will take up part of the Q and A sessions. The homework exercises will consist of mechanical calculations of scores, data interpretation, and self-guided examination of existing tests known to the students. Students are encouraged to use spreadsheet and statistical programs, but are not given actual instruction in doing so. The teacher will also give out many examples of analytical and holistic rating scales and talk about them.
Content Areas: * criterion- versus norm-referenced tests; * relationship of CRTs and NRTs to different types of decisions; * history of language testing; channel versus mode; * discrete point versus integrative tests; * psychological constructs; * test fairness; issues involved in adopting, adapting, and creating tests; guidelines for giving tests and maintaining records; developing and improving test items; * item types: multiple choice, receptive response, matching, etc.; norm-referenced test item statistics; * criterion-referenced test item statistics; * nominal, ordinal, and interval scales; reading and creating histograms; * central tendency statistics; * dispersion statistics; * normal distribution; * outliers; * standardized scores; skew, kurtosis; * Pearson Product-Moment correlation (calculation and interpretation); * significance, meaningfulness (shared variance); * Spearman Rank-Order correlation; * point biserial correlation; * measurement error; * types of NRT test reliability estimates (test-retest, equivalent forms, internal consistency); * Spearman-Brown prophecy formula; * Cronbach alpha; * K-R20, K-R21; * interrater, intrarater reliability; * standard error of measurement; * CRT consistency estimation (threshold loss agreement, squared error loss agreement, domain score dependability); * agreement coefficient; kappa coefficient; phi lambda dependability; phi dependability; * confidence intervals; content validity; construct validity; * standards setting; * relationship of testing to curriculum; * developing goals and objectives.
Bottom Line: Students will cover a wide variety of content topics focusing on programmatic level CRT and NRT creation, use, and CRT and NRT score interpretation. Students will develop skills in connection with many of the content topics, particularly in completing calculations and displaying numerical data as homework. An "exploring language testing with statistics" course. Post-positivism (realism) with a strong streak of humanism.
Hidden Curriculum: CRTs should be used in the majority of educational situations. Tests have social implications and effects. Tests also have personal impact on students. Tests, and test scores, are often used irresponsibly (e.g., Japanese university entrance exams). Listening to other students' questions in the testing class is helpful to students. Practical applications to testing concepts should always be found. Students should actively seek ways to use math and statistics by analyzing tests (data). Numerical data is valued. Math and statistics are not as hard as you think.
What Students did not get: Experience in creating criterion-referenced goals and objectives. Experience comparing tests with specific curricula. Exposure to "alternative forms" of assessment, including portfolios, etc., in which the assessment generates descriptive rather than numerical data. Experience in creating and revising CRTs for specified classroom situations. Experience explaining or presenting testing content topics.
Other Sources Recommended:
Cronbach, L. (1990). Essentials of psychological testing (Fifth Edition). New York: Harper Collins Publishers, Inc.
American Educational Research Association/American Psychological Association/National Council on Measurement in Education. (1990). Standards for educational and psychological testing. Washington, D.C.: Author.
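The four homework exercises listed under Course Requirements above (item analysis, descriptive statistics, correlation, reliability) can be illustrated with a minimal Python sketch. The dataset is invented for illustration and does not come from Brown (1996), and the function names are hypothetical. The sketch assumes dichotomously scored items and population (divide-by-N) variance, and it treats the point-biserial correlation as a Pearson correlation between a 0/1 item and the total scores.

```python
from math import sqrt

# Hypothetical data: 6 examinees (rows) x 5 items (columns), 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Population variance (divides by N, an assumption of this sketch)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def item_facility(data):
    """Item analysis: proportion of examinees answering each item correctly."""
    n = len(data)
    return [sum(column) / n for column in zip(*data)]


def kr20(data):
    """Reliability: Kuder-Richardson formula 20 for dichotomous items."""
    k = len(data[0])
    totals = [sum(row) for row in data]
    sum_pq = sum(p * (1 - p) for p in item_facility(data))
    return (k / (k - 1)) * (1 - sum_pq / variance(totals))


totals = [sum(row) for row in responses]
item3 = [row[2] for row in responses]

print("Item facility:", item_facility(responses))             # (1) item analysis
print("Mean, SD:", mean(totals), sqrt(variance(totals)))      # (2) descriptive statistics
print("Item 3 point-biserial:", pearson(item3, totals))       # (3) correlation
print("K-R20:", kr20(responses))                              # (4) reliability
```

For this invented dataset the item facility values fall from 5/6 to 1/6 across the five items, and K-R20 works out to 5/6 (about 0.83). Real classroom data would rarely be this tidy.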
[ p. 11 ]
Course: Doctoral Seminar-Advanced Topics in Language Testing
Date: Fall 1997
Text: Schumacker, R. & Lomax, R. (1996). A beginner's guide to structural equation modeling. Mahwah, NJ: Lawrence Erlbaum Associates. Many handouts and additional readings.
Course Purpose: Introduce students to advanced concepts in testing, particularly linked issues of reliability/dependability/generalizability, and convergent/concurrent/content/construct validity. To give students practice using computer programs designed to explore these issues using large, authentic datasets (N = 500, minimum). [Not quite right. Not all of the datasets were large. Some from Tabachnick and Fidell were not > 500, if I recall.]
Course Requirements: Write a paper that demonstrates knowledge of concepts covered in class, preferably on a topic relating directly to students' dissertations. Homework assignments which involved statistical analysis and discussion of large datasets. [Not a paper. Rather, it was to conduct a pilot study with special attention given to reliability and validity.]
Mode of Class: Some lecture with extensive Q and A periods. Extended periods of small group, hands-on, guided use of statistical computer programs, including SPSS, EQS, GENOVA.
Content Areas Covered: *Uses of factor analysis; *factor analysis rotations; *exploratory and confirmatory factor analysis (2 homework assignments); *theta and omega reliability estimates (homework assignment); *generalizability theory (G and D study homework assignment); *path analysis (homework assignment + in-class work); structural equation modeling (homework assignment + many in-class tasks); *univariate and multivariate outliers; *issues of theory building (is theory imposed on the data, or does data make the theory?); multitrait/multimethod analyses; item response theory (in-class tasks); *creating competency tasks (homework assignment); *a priori hypothesis testing versus "data snooping"
Bottom Line: The students learned the rudiments of discovering dimensionality in testing instruments, by analyzing large datasets using a variety of sophisticated statistical analyses. Post-positivist with a focus on the data itself, not the students. [What is this, an assessment? Critical realist, I would say. This was said explicitly.]
Hidden Curriculum: [This implies some intention. I'm not sure what this means.] Computer program copyrights must be respected. Datasets need to be screened and put in proper condition to use with computer programs. Students should be conversant with different types of computers. Students need to actually learn to use computer programs to transform their statistical knowledge and their attitudes about statistics. It's OK to force students a bit beyond their level of understanding. Not everyone learns at the same speed. Students should be asked to figure things out for themselves. Give hints, students should do the rest. People who have doctorates should be running language programs. [Did I say this? I recall telling XXXX that Ed.D.s should be able to do educational research.] Program administrators should make responsible decisions based on numerical data. [At least not ignore, if because of ignorance.] Data can be used for many policy-forming decisions, such as using path analysis to discover which students may be "at risk" in a program. Strong understanding of dimensionality is the basis of good testing. [Fair testing.] Students need to learn how to interpret data logically. Test and questionnaire construction are very similar. Large numerical datasets are valued. [Because they are stable.]
[ p. 12 ]
What Students did not get: In-depth experience with any one of the content topics covered. Experience working with small datasets. [48 hours ain't much. This would entail checking assumptions more rigorously I would guess.] Experience explaining or presenting testing content topics.
Other Sources Recommended:
Mulaik, S. & James, L. (1995). Objectivity and reasoning in science and structural equation modeling. In R. Hoyle (Ed.). Structural equation modeling: Concepts, issues, and applications. Thousand Oaks, CA: Sage.
Carmines, E. G. & Zeller, R. A. (1979). Reliability and validity assessment. Newbury Park, CA: Sage.
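The theta reliability estimate listed under Content Areas Covered above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the seminar, which relied on statistical packages. It assumes the formula given in Carmines and Zeller (1979), theta = (k / (k - 1)) * (1 - 1 / lambda1), where k is the number of items and lambda1 is the largest eigenvalue of the inter-item correlation matrix. The correlation matrix, the function names, and the use of power iteration to find the eigenvalue are all choices made for this example.

```python
from math import sqrt


def largest_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a symmetric matrix by power iteration.

    Starts from a vector of equal weights, which works when the items are
    mostly positively correlated (an assumption of this sketch).
    """
    n = len(matrix)
    v = [1.0 / sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiply the matrix by the current unit vector.
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        # Rayleigh quotient: eigenvalue estimate for the current vector.
        eigenvalue = sum(vi * wi for vi, wi in zip(v, w))
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return eigenvalue


def theta_reliability(corr):
    """Theta coefficient from an inter-item correlation matrix."""
    k = len(corr)
    return (k / (k - 1)) * (1 - 1 / largest_eigenvalue(corr))


# Hypothetical correlation matrix: three items, every pair correlated at 0.5.
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]

print(round(theta_reliability(corr), 3))
```

With equal inter-item correlations of 0.5 the largest eigenvalue is 2, so theta comes to about 0.75. When the items are uncorrelated the largest eigenvalue is 1 and theta falls to zero.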
|"While some elements of the hidden curriculum are imparted without conscious plan, some aspects emerge intentionally as the teachers talk through the concepts of the course in a way that makes sense to them."|
[ p. 13 ]
Initial conception of my testing course
Course: Second Language Testing, LING 5345
Instructor: Greta Gorsuch, Ed.D.
Class Meeting Times: Monday and Wednesday, 4:30-5:50 PM
Office Hours: Tuesday 2-4 PM, Thursday 9:30-11:00 AM, Friday 11-12 noon
[ p. 14 ]
Welcome to the world of second language testing and assessment! In this course, I want you to get a working knowledge of basic principles of testing procedures which can be applied to second language programs and classrooms. Note what I said about "working knowledge." This means that we will be looking at actual tests and testing procedures, working with actual data, and creating tests and testing procedures that best fit your teaching situation. You might be relieved to know that no previous math or statistical courses are required for this course. We will, however, be using some basic math and statistics in the course, and I hope you will get a taste for the usefulness of statistics when looking at data of all sorts.
There are six course goals:
Class Format: There will be lectures, pair- and group-work, and student presentations. A lot of the reading you will be doing will present quite different content than what you have had in other teaching and language courses. I have two pieces of advice: First, keep up with the reading; and second, do the homework assignments I give you. The homework really does help. Your answers on the homework will also give me an idea of what topics I need to review in class, and whether I need to slow down, or speed up.
Assigned Reading: The main text is Brown, J. D. (1996). Testing in Language Programs. Upper Saddle River, NJ: Prentice-Hall Regents. Other assigned readings are:
Brown, J. D., & Hudson, T. (1998). The alternatives in language assessment. TESOL Quarterly, 32(4), 653-675. This has been placed on electronic reserve.
Grading
Final Examination: 40%
The final examination will take about one class period (80 minutes). Most of the items will be objective (only one answer is correct), but some will be open ended and I will grade the quality and comprehensiveness of your answer. I may ask you to interpret data, or do some calculations, or critique a test or testing procedure. Rest assured, however, you will not be asked to do anything we haven't covered and digested thoroughly.
The student presentation will be a 10-minute presentation, made by each student, describing a classroom test and testing procedure you would like to use, or have used, in a specific teaching situation. The format of your presentations may vary, but you should be sure to cover the following points: (1) give an adequate description of your teaching situation; (2) adequately articulate the construct you wish to capture in the test (tell us what it is you think you are testing: what skills, what knowledge, etc.); (3) give a comprehensive description of the development of your test instrument; (4) adequately describe your testing procedure with a focus on maintaining test reliability and test validity. You should also be prepared to respond to classmates' questions and comments. This should be a time of sharing and positive growth for everyone.
[ p. 15 ]
There were many similarities between my course syllabus and my experiences as a learner of second language testing at the basic level. For example, adopting Brown's Testing in Language Programs practically guaranteed that the content my own students would be exposed to would be the same content I had experienced as a learner. Note also that lectures and homework are mentioned in the class format: this is what I experienced as a learner in the basic testing course. Finally, note that one overall goal shared by my course and my learning experiences is to have hands-on experience working with data.
[ p. 16 ]