## Appendix A:

### Foreign Language Assessment Literacy Test - Preliminary Item Screening

- Part 1: Terminology
- Part 2: Procedures
- Part 3: Test Interpretation
- Part 4: Assessment Ethics

### PART II. Procedures

(A) Exercise 1
INSTRUCTIONS: Specify the mean and standard deviation for the following types of norm-referenced tests assuming that the curve has a normal distribution:
| Item | Score type | Mean | Standard deviation | Level A | Level B | Level C |
|---|---|---|---|---|---|---|
| 36 | quartile score | | | | | |
| 37 | percentile score | | | | | |
| 38 | stanine score | | | | | |
| 39 | T score | | | | | |
| 40 | z score | | | | | |

(B) Exercise 2
INSTRUCTIONS: Look at the data from the test below, then answer Questions 41-45 using any electronic device or software program that you know how to operate.
Raw scores for four sections of a norm-referenced language test of general English ability. (Correct number of items for each section of the test appears below.)

| Student | Section 1 | Section 2 | Section 3 | Section 4 | Total |
|---|---|---|---|---|---|
| k (# of items) | 10 | 30 | 20 | 30 | 90 |
| 1. Diana | 8 | 28 | 15 | 14 | 65 |
| 2. Cindy | 7 | 22 | 10 | 15 | 54 |
| 3. Marilyn | 4 | 11 | 9 | 8 | 32 |
| 4. Jack | 10 | 26 | 19 | 26 | 81 |
| 5. Chris | 5 | 15 | 10 | 16 | 46 |
| 6. Faith | 7 | 18 | 15 | 22 | 62 |
| 7. Doug | 9 | 10 | 12 | 21 | 52 |
| 8. James | 3 | 10 | 5 | 11 | 29 |
| 9. Emiko | 8 | 23 | 16 | 25 | 72 |
| 10. Eric | 6 | 19 | 12 | 18 | 55 |
| etc. . . | | | | | |

| Item | Question | Level A | Level B | Level C |
|---|---|---|---|---|
| 41 | What is the mean of the total test? | | | |
| 42 | What is the standard deviation? | | | |
| 43 | Which student(s) is/are more than one standard deviation from the mean? | | | |
| 44 | Do any sections of this test correlate closely in a way that's statistically significant at a p<.05 level? (If so, mention which.) | | | |
| 45 | What sort of distribution curve does this test have so far? | | | |
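As an illustration of the kind of calculation Questions 41-43 call for, the following minimal sketch uses only the ten students listed in the table; the rows hidden behind "etc." are not available, so the figures describe that partial sample only. The function name `summarize` is hypothetical, and the choice of the sample standard deviation (n - 1 denominator) is an assumption, since the instructions do not say which form is expected.

```python
from statistics import mean, stdev

# Total scores for the ten students shown in the table.
# Rows after "etc." are not available, so this is a partial sample.
totals = {
    "Diana": 65, "Cindy": 54, "Marilyn": 32, "Jack": 81, "Chris": 46,
    "Faith": 62, "Doug": 52, "James": 29, "Emiko": 72, "Eric": 55,
}


def summarize(scores):
    """Return (mean, sample SD, names more than one SD from the mean)."""
    values = list(scores.values())
    m = mean(values)
    s = stdev(values)  # assumption: sample SD with an n - 1 denominator
    outside = [name for name, x in scores.items() if abs(x - m) > s]
    return m, s, outside


m, s, outside = summarize(totals)
print(f"mean = {m:.1f}, SD = {s:.2f}, beyond one SD: {outside}")
```

Swapping `stdev` for `statistics.pstdev` gives the population form (n denominator) if that is the convention a respondent prefers.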

(C) Exercise 3
INSTRUCTIONS: The table below shows hypothetical data for a 50-item test that was given to two different population samples. Look at the data, then calculate the statistics mentioned in Questions 46-50:
| | Population A | Population B |
|---|---|---|
| sample size: | 20 | 80 |
| mean score: | 32 | 25 |
| standard deviation: | 7.5 | 6 |
| low-high: | 14 - 48 | 12 - 50 |
| alpha reliability estimate: | .7 | .8 |

| Item | Statistic | Level A | Level B | Level C |
|---|---|---|---|---|
| 46 | ANOVA: | | | |
| 47 | F-ratio: | | | |
| 48 | Chi-square distribution: | | | |
| 49 | effect size: | | | |
| 50 | standard error of measurement: | | | |
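Two of the statistics named above can be computed directly from the summary figures given for Populations A and B. The sketch below is offered only as an illustration: it uses the classical formula SEM = SD × √(1 − reliability) for Question 50 and, as an assumption, Cohen's d with a pooled standard deviation for Question 49, since the item does not say which effect-size index is intended. The function names `sem` and `cohens_d` are hypothetical.

```python
from math import sqrt


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1.0 - reliability)


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d using a pooled SD (assumption: this is the intended index)."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / sqrt(pooled_var)


# Summary figures taken from the Population A / Population B table.
sem_a = sem(7.5, 0.7)
sem_b = sem(6.0, 0.8)
d = cohens_d(32, 7.5, 20, 25, 6.0, 80)

print(f"SEM A = {sem_a:.2f}, SEM B = {sem_b:.2f}, Cohen's d = {d:.2f}")
```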

(D) Exercise 4
INSTRUCTIONS: Compare the oral interview ratings below by two raters of the same student, then calculate the statistics mentioned in Questions 51-55. Note that all ratings are in terms of 5-point bands, with 5 representing the highest possible rating.
| Category | Rater A | Rater B |
|---|---|---|
| Grammar | 3.5 | 3 |
| Fluency | 4 | 4 |
| Pronunciation | 4 | 3.5 |
| Cohesion | 4 | 3.5 |
| Vocabulary | 4.5 | 4 |
| Total | 20 | 18 |

| Item | Statement | Level A | Level B | Level C |
|---|---|---|---|---|
| 51 | The inter-rater reliability coefficient for A and B is ____. | | | |
| 52 | The Pearson correlation index for the two raters is ____. | | | |
| 53 | The index of concordance among the two raters is ____. | | | |
| 54 | The chi-square test of independence for these two raters is ____. | | | |
| 55 | The kappa coefficient of the combined rating is ____. | | | |
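For Question 52, a Pearson correlation can be computed across the five category ratings. The sketch below does this by hand and also reports the proportion of categories on which the two raters gave identical bands, which is one simple agreement measure and is not necessarily the index intended by Question 53. The function names `pearson_r` and `exact_agreement` are hypothetical, and with only five rating pairs any coefficient should be read with caution.

```python
from math import sqrt

# Ratings in table order: Grammar, Fluency, Pronunciation, Cohesion, Vocabulary.
rater_a = [3.5, 4.0, 4.0, 4.0, 4.5]
rater_b = [3.0, 4.0, 3.5, 3.5, 4.0]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def exact_agreement(x, y):
    """Proportion of categories where both raters gave the same band."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


r = pearson_r(rater_a, rater_b)
agreement = exact_agreement(rater_a, rater_b)
print(f"Pearson r = {r:.3f}, exact agreement = {agreement:.2f}")
```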

(E) Exercise 5
INSTRUCTIONS: Read this hypothetical data comparing a 60-item classroom pretest/posttest, then complete the sentences below. Note that following the pretest, the top one-third of students were classified into an "upper group" and the lower one-third were classified into a "bottom group":
| Category | Pretest | Posttest |
|---|---|---|
| sample size: | 48 | 42 |
| total mean: | 30 | 33 |
| total range: | 7-44 | 12-52 |
| total standard deviation: | 3.6 | 4.3 |
| upper group mean score: | 45 | 50 |
| upper group standard deviation: | 4.0 | 3.9 |
| bottom group mean: | 20 | 20 |
| bottom group standard deviation: | 4.2 | 5.8 |

| Item | Question | Level A | Level B | Level C |
|---|---|---|---|---|
| 56 | How did the upper group perform differently from the bottom group? ____ | | | |
| 57 | What sort of distribution curve would this posttest likely have? ____ | | | |
| 58 | Which type of ANOVA, if any, would be suitable for measuring the pretest/posttest gains made by this sample group? ____ | | | |
| 59 | What sort of claims could validly be made about the "progress" of this class? ____ | | | |

