Shiken: JALT Testing & Evaluation SIG Newsletter
Vol. 11 No. 1. Mar. 2007. (Supplement) [ISSN 1881-5537]

Suggested Answers for Assessment Literacy Self-Study Quiz #2
by Tim Newfields

Possible answers for the nine questions about testing/assessment which were in the March 2007 issue of this newsletter appear below.

Part I: Open Questions

1 Q: Mention at least one advantage and disadvantage of using scaled scores in a classroom context.

   A: First of all, that depends on what is meant by "scaled scores". Some people associate this term with any standardized score, which is generally a z-score or T-score. Others associate this term with any type of weighted standardized score, generally one in which items with good discrimination indices carry more weight. For simplicity, let's consider just the first case.

Depending on the context, one advantage of scaled scores is that students can know where they stand relative to others in the same group. A second advantage of scaled scores is that they make grading easier: the performance of any group member with respect to his/her peers is obvious.

On the other hand, calculating scaled scores does involve more work for teachers. Moreover, not all students will know how to interpret their scores: teachers may need to spend more time explaining the score system to students. Another disadvantage might be too much emphasis on scoring rather than learning: is all the energy spent on ranking students actually needed? What does scaling do for the classroom atmosphere? Teachers need to reflect on such questions carefully.
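
As a rough illustration of the first case, the sketch below converts a set of raw class scores into z-scores and T-scores. The raw scores are hypothetical, the function names are ours, and the population standard deviation is assumed (some teachers prefer the sample standard deviation, which gives slightly different values).

```python
from statistics import mean, pstdev

def z_scores(raw_scores):
    """Return z-scores: distance of each score from the mean, in SD units."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # assumption: population SD of the class
    return [(x - m) / sd for x in raw_scores]

def t_scores(raw_scores):
    """Return T-scores: z-scores rescaled to a mean of 50 and an SD of 10."""
    return [50 + 10 * z for z in z_scores(raw_scores)]

# Hypothetical raw scores for a class of five students
raw = [52, 61, 70, 79, 88]
for score, z, t in zip(raw, z_scores(raw), t_scores(raw)):
    print(f"raw={score}  z={z:+.2f}  T={t:.1f}")
```

A student with a T-score of 50 is exactly at the class mean, which is the kind of "where do I stand" information mentioned above.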

Further reading:

Bodner, G. (n.d.). Statistical Analysis of Multiple Choice Exams. Retrieved April 5, 2007 from

2 Q: According to classical testing theory, what are three requirements that alternative forms of a test must demonstrate?

   A: First, and perhaps too obviously, both forms should have the same format and length and also purport to measure the same construct. Second, both forms should have similar score means and variances. Moreover, the items in both forms should have similar item-total correlations. It is not a requirement, but it is common practice for both forms to share a certain number of identical "anchor items" to make it easier to ascertain any possible differences between the two samples.

Further reading:

Mousavi, S.A. (2002). An Encyclopedic Dictionary of Language Testing. (3rd Ed.). Taipei: Tung Hua Book Company. p. 476-477.

3 Q: Briefly describe what the coefficient of variation (CV) supposedly measures and when it should be used.

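   A: The coefficient of variation is the ratio of the standard deviation to the mean. It is one way of describing the relative dispersion of data. Since it has no units, it is usually reported as a percentage. The smaller the CV is, the less scattered the data are relative to the mean. It is most useful for comparing the spread of data sets that have different means or different units, and it is meaningful only for ratio-scale data with a non-zero mean.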
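
A minimal sketch of the calculation follows. The two score sets are hypothetical, the function name is ours, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Return the CV as a percentage: (standard deviation / mean) * 100."""
    m = mean(data)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(data) / m * 100  # assumption: sample SD

# Hypothetical scores: a 12-point quiz and a 100-point exam
quiz = [8, 9, 10, 11, 12]
exam = [60, 70, 80, 90, 100]
print(f"quiz CV = {coefficient_of_variation(quiz):.1f}%")
print(f"exam CV = {coefficient_of_variation(exam):.1f}%")
```

Because the CV is unit-free, the two results can be compared directly even though the tests use different point scales.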

Further reading:

Lohninger, H. (2006). Fundamentals of Statistics: Coefficient of Variation. Retrieved April 5, 2007 from

Wikipedia. (2007). Coefficient of Variation. Retrieved April 5, 2007 from

4 Q: How does Pearson's coefficient of skewness differ from Bowley's coefficient of skewness?

   A: The three most common measures of skewness are (1) standard skewness, (2) Bowley's measure, and (3) the Pearson Skewness Coefficient. All three measures describe the asymmetry of a distribution curve in somewhat different ways.

Standard skewness, which is sometimes called the third standardized moment, is defined for univariate data as follows:

Standard skewness = [(1/n) Σ (x_i - mean)^3] / (standard deviation)^3

Bowley skewness, which is also known as the quartile skewness coefficient, measures skewness using the quartiles of a distribution and is defined by this formula:

Bowley skewness = (Q3 - 2 × Q2 + Q1) / (Q3 - Q1)

where Q1, Q2 (the median), and Q3 are the first, second, and third quartiles.

Since only the middle two quartiles of the distribution are considered, and the outer two quartiles are ignored, this adds robustness to the measure.

The Pearson Skewness Coefficient is sometimes known as "relevant skewness" and is defined by this formula: (mean - median) / standard deviation. This is certainly easier to calculate than standard skewness. One disadvantage of this formula is that outliers can have a strong influence on the mean and standard deviation.
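
To compare the three measures side by side, here is a small sketch. The data are hypothetical and the function names are ours. It assumes the population standard deviation and the default quartile method of Python's statistics.quantiles; other quartile conventions give slightly different Bowley values. The Pearson function follows the formula given above; some textbooks multiply it by 3.

```python
from statistics import mean, median, pstdev, quantiles

def standard_skewness(data):
    """Third standardized moment: mean cubed deviation divided by SD cubed."""
    m, sd, n = mean(data), pstdev(data), len(data)
    return sum((x - m) ** 3 for x in data) / (n * sd ** 3)

def bowley_skewness(data):
    """Quartile skewness: (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    q1, q2, q3 = quantiles(data, n=4)  # assumption: default 'exclusive' method
    return (q3 - 2 * q2 + q1) / (q3 - q1)

def pearson_skewness(data):
    """(mean - median) / standard deviation, as defined in the text."""
    return (mean(data) - median(data)) / pstdev(data)

# Hypothetical scores with a long right tail
skewed = [1, 2, 3, 4, 5, 7, 10, 20]
symmetric = [1, 2, 3, 4, 5]
for name, f in [("standard", standard_skewness),
                ("Bowley", bowley_skewness),
                ("Pearson", pearson_skewness)]:
    print(f"{name:8s} skewed={f(skewed):+.3f}  symmetric={f(symmetric):+.3f}")
```

All three measures are zero for the symmetric set and positive for the right-skewed one, but their magnitudes differ, so values from different formulas should not be compared directly.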

Further reading:

Steiner, A. (2005). Investment Performance Analysis - Skewness. Retrieved April 6, 2007 from

Weisstein, E. W. (n.d.). Bowley Skewness. Retrieved April 6, 2007 from

Part II: Multiple Choice Questions

1 Q: In a perfect bell-shaped curve, what percentage of a sample should theoretically be over two standard deviations from the mean?

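   A: The correct answer is (C). About 0.13% of the data would likely be more than three standard deviations above the mean (about 0.27% counting both tails) if the curve were completely Gaussian. For comparison, about 2.3% would be more than two standard deviations above the mean (about 4.6% counting both tails).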
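
Readers who want to check such tail percentages can do so with the sketch below, which uses the complementary error function from Python's standard library. The function names are ours.

```python
from math import erfc, sqrt

def upper_tail_fraction(k):
    """Fraction of a normal distribution more than k SDs above the mean."""
    return 0.5 * erfc(k / sqrt(2))

def two_tailed_fraction(k):
    """Fraction more than k SDs from the mean in either direction."""
    return erfc(k / sqrt(2))

for k in (1, 2, 3):
    print(f"k={k}: one tail {upper_tail_fraction(k):.4%}, "
          f"both tails {two_tailed_fraction(k):.4%}")
```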

Further reading:

Calkins, K. G. (2005). An Introduction to Statistics - Lesson 6: The Bell-shaped, Normal, Gaussian Distribution. Retrieved April 5, 2007 from

2 Q: In a normal bell-shaped curve, what percentage of a sample should be within the 5th stanine?

   A: The correct answer is (D). A stanine (a contraction of "standard nine") is a normalized standard score with nine bands, with 9 representing the highest band, 1 the lowest, and 5 the midpoint. The score distributions are as follows:

Table 1. Stanine score distribution
Stanine score          1    2    3    4    5    6    7    8    9
Percentage of sample   4%   7%   12%  17%  20%  17%  12%  7%   4%
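
The percentages in Table 1 can be reproduced from the normal curve. The sketch below assumes the common convention that each stanine band is half a standard deviation wide, with stanine 5 centered on the mean; the names are ours.

```python
from statistics import NormalDist

# Assumed band boundaries in SD units (stanine 5 spans -0.25 to +0.25)
CUTS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]

def stanine_percentages():
    """Return the rounded percentage of a normal sample in stanines 1 to 9."""
    cdf = NormalDist().cdf
    edges = [0.0] + [cdf(z) for z in CUTS] + [1.0]
    return [round(100 * (hi - lo)) for lo, hi in zip(edges, edges[1:])]

print(stanine_percentages())  # [4, 7, 12, 17, 20, 17, 12, 7, 4]
```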

Further reading:

Azzolino, A. (2005). MIDDLE GROUND: Stanine - Statistical Standard Nine Normal Distribution. Retrieved April 5, 2007 from

3 Q: To calculate a chi-square statistic with one degree of freedom for two groups, which of the following is/are NOT needed?

   A: A chi-square test with one degree of freedom would have only two categories. For example, we might compare how a control group and an experimental group did differently on a specific test by comparing how many students in each group passed. More technically, we would be examining the difference between the observed and expected frequencies for each group.

Alternatively, we might wish to know how widely distributed the scores were for two different groups. In that case the standard deviation would be a variable to take into account.

So the answer to this question partly depends on what you are trying to measure. However, (C) and (D) would not be variables to consider for a chi-square test with just one degree of freedom.
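
For readers who have not calculated a chi-square statistic before, here is a minimal sketch for a 2 x 2 table, which has one degree of freedom. A chi-square statistic is computed from observed and expected frequencies (counts). The pass/fail counts are hypothetical, the function name is ours, and no continuity correction is applied.

```python
def chi_square_2x2(table):
    """Return the chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and outcome were unrelated
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical counts: rows = control, experimental; columns = pass, fail
observed = [[30, 10],
            [20, 20]]
print(f"chi-square = {chi_square_2x2(observed):.3f}")
```

With one degree of freedom, a value above 3.841 is significant at the .05 level.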

Further reading:

Baranowski, M. (2002, June 26). Statistical Analysis of Field Project #1. Retrieved April 7, 2007 from

Connor-Linton, J. (2003, March 22). Chi Square Tutorial. Retrieved April 7, 2007

Ryan, J. (n.d.). The Chi Square Statistic. Retrieved April 7, 2007

4 Q: In general statistics, the symbol r², signifies ______.

   A: The correct answer is (B) - the coefficient of determination. This is equal to 1 minus the sum of squared errors (residuals) divided by the total sum of squares about the mean.
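
As a small worked example, the coefficient of determination can be computed as 1 - (residual sum of squares / total sum of squares about the mean). The observed and predicted values below are hypothetical and the function name is ours.

```python
def r_squared(observed, predicted):
    """Return 1 - SS_residual / SS_total for paired observed/predicted values."""
    m = sum(observed) / len(observed)
    ss_total = sum((y - m) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_residual / ss_total

# Hypothetical test scores and the scores predicted by some model
actual = [2, 4, 6, 8]
fitted = [2.5, 3.5, 6.5, 7.5]
print(f"r squared = {r_squared(actual, fitted):.2f}")
```

Here the total sum of squares is 20 and the residual sum of squares is 1, so 95% of the variance in the observed scores is accounted for by the predictions.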

Further reading:

Wikipedia. (2007). Coefficient of determination. Retrieved April 7, 2007 from

5 Q: To find out how the total score on a test correlates with the chance of getting a single item on the same test correct, a

   A: There are several ways to achieve this. Within a classical testing framework, you could opt for Choice (A). However, a Rasch analysis might yield a better picture of what is going on. If you are dealing with a low-stakes test and do not need much precision, then if you knew the item facility for a given item as well as the mean item facility for the entire test (reflected in the mean score), you should be able to get a rough estimate of that item's relative difficulty.
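
One classical statistic for relating a right/wrong item to the total score is the point-biserial correlation discussed in the Varma reading below. We are assuming, not asserting, that this is the approach the answer has in mind. The sketch uses hypothetical data, our own function name, and the population standard deviation; the total scores include the item itself, so no correction for overlap is made.

```python
from statistics import mean, pstdev

def point_biserial(item_responses, total_scores):
    """Correlate a 0/1 item with total scores using the point-biserial formula."""
    n = len(item_responses)
    pairs = list(zip(item_responses, total_scores))
    correct = [t for r, t in pairs if r == 1]
    incorrect = [t for r, t in pairs if r == 0]
    p = len(correct) / n  # item facility: proportion answering correctly
    q = 1 - p
    sd = pstdev(total_scores)
    return (mean(correct) - mean(incorrect)) / sd * (p * q) ** 0.5

# Hypothetical data: five test takers, one item (1 = correct, 0 = incorrect)
item = [1, 1, 1, 0, 0]
totals = [9, 8, 7, 5, 4]
print(f"point-biserial r = {point_biserial(item, totals):.3f}")
```

A high positive value suggests that test takers who answered the item correctly also tended to score well on the test as a whole.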

Further reading:

Varma, S. (2006). Preliminary Item Statistics Using Point-Biserial Correlation and P-values. Retrieved April 8, 2007 from


[ p. 30b ]