Part 1: Terminology 
Part 2: Procedures 
Part 3: Test Interpretation 
Part 4: Assessment Ethics 
INSTRUCTIONS: Below is a list of possible items for three different foreign language assessment literacy tests: one for professional test validators (Level A), one for language teachers with bachelor's degrees in education (Level B), and one for first-year undergraduate education majors (Level C). If you think that an item represents something that professional language test validators should know, mark "Level A". If you believe that an item should be known by foreign language teachers with B.A. degrees, mark "Level B". If you think that an item is something an education major should know before entering college, mark "Level C". If you believe it's not necessary for any of these three populations to know a given item, leave it blank. Please remember that Levels A, B, and C represent, in your view, the minimal competency levels for each of these three populations. If an item is beyond what you believe a member of a group ought to know, then leave it blank.
It's not necessary to answer any of the items below, but you're welcome to do so if you wish. When clicking the boxes for Levels A, B, and C, remember that you may click more than one box if it seems appropriate, or leave all boxes blank.
When you have completed this document, please email a copy to timothy*at*toyonet*dot*toyo*dot*ac*dot*jp. Thank you for your cooperation.
(1) df  _13_ (A) chi-square  
(2) F  ___ (B) coefficient of determination  
(3) H_{o}  ___ (C) degrees of freedom  
(4) k  ___ (D) F-value, variance ratio  
(5) N  ___ (E) null hypothesis  
(6) n  ___ (F) number of cases in a population  
(7) ρ  ___ (G) number of cases in a sample  
(8) r  ___ (H) number of items in a test  
(9) r^{2}  ___ (I) Pearson's correlation coefficient  
(10) r_{2}  ___ (J) probability of a Type I error  
(11) σ, SD, S_{x}  ___ (K) sample mean  
(12) s^{2}  ___ (L) sample variance  
(13) χ^{2}, c^{2}  ___ (M) standard deviation  
(14) X̄, M  ___ (N) frequency  
(15) v  ___ (O) x-value  
___ (P) (1) level of significance, (2) the proportion of responses to an item that are correct 
(B) Multiple choice questions
Note that some items have more than one "correct" possible response.
16. Gender, occupation, or nationality are considered ___ variables in most language studies. 

17. If a test only seems to measure what it claims to, then it is said to have ___ validity. 

18. A ___ error occurs when a researcher thinks there is no relationship between two variables, but there actually is. 

19. The cutoff point for a criterion-referenced test should be when the ___ is equal to or greater than 1. 

20. Exams used to determine a student's progress toward mastery of a content area are known as ___ tests. 

21. How many standard deviations a score is from the mean is revealed by a test's ___. 

22. The test excerpt below is an example of a ___ test. 

23. To find out how well a particular item in a test correlates with the total test score, a ___ should be ascertained. 

24. Any variable that is not part of a research study, but still has an effect on its results, is said to ___ that study. 

25. In a 3-parameter IRT test model, the point on an ability scale at which the probability of a correct response for a given item is .5 is known as the ___. 

26. To predict how many more items need to be added to a given test to increase its reliability to a desired value, the ___ should be calculated. 

27. If a test is unidimensional, then it should automatically show a high degree of ___. 

28. The tendency of examinee expectations to contaminate test results is known as ___. 

29. A test administration procedure in which a large set of test items is organized into shorter subsets, each of which is randomly assigned to a subsample, hence avoiding the need to administer all items to all examinees, is known as ___ sampling. 

30. To compare the mean of a particular subgroup to the mean of a larger group that is within the same population, a ___ should be performed. 

31. Briefly explain the difference between the standard error of estimate (SEE) and standard error of measurement (SEM) in the space below, mentioning when each of these statistics should be used. 

32. If you want to see how closely "masters" who scored high on a particular CRT test differed from "non-masters" who scored closer to the bottom, which technique(s) might you use? 

33. What's the difference between a predictive and a concurrent validation study? When should each type of study be used? 

34. How do the Kuder-Richardson Formula 20 and Formula 21 differ? When should each be used? 

35. What does the central limit theorem tell us? 