Suggested Answers for Assessment Literacy Self-Study Quiz #2
by Tim Newfields
Possible answers for the nine questions about testing/assessment which were in the
March 2007 issue of this newsletter appear below.
Part I: Open Questions
1Q: Mention at least one advantage and disadvantage of using scaled scores in a classroom context.
A: First of all, that depends on what is meant by "scaled scores". Some people associate
this term with any standardized score, which is generally a z-score or T-score. Others associate
this term with any type of weighted standardized score, generally one in which items with good discrimination
indices carry more weight. For simplicity, let's consider just the first case.
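To make the first case concrete, here is a minimal sketch of how raw scores might be converted to z-scores and T-scores. The function names and the sample scores are hypothetical, and the class is assumed to be the whole reference group (hence the population standard deviation).

```python
from statistics import mean, pstdev

def z_scores(raw_scores):
    """Convert raw scores to z-scores (mean 0, standard deviation 1)."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # assumption: the class is the entire group of interest
    return [(x - m) / sd for x in raw_scores]

def t_scores(raw_scores):
    """Convert raw scores to T-scores (mean 50, standard deviation 10)."""
    return [50 + 10 * z for z in z_scores(raw_scores)]

raw = [40, 50, 60, 70, 80]  # hypothetical class scores
print([round(z, 2) for z in z_scores(raw)])
print([round(t, 1) for t in t_scores(raw)])
```

A student whose raw score equals the class mean gets a z-score of 0 and a T-score of 50, whatever the original score scale was.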
Depending on the context, one advantage of scaled scores is that students can know where they stand
relative to others in the same group. A second advantage of scaled scores is that they make grading easier:
the performance of any group member with respect to his/her peers is obvious.
Calculating scaled scores does involve more work for teachers. Moreover, not all students will know how to interpret their
scores: teachers may need to spend more time explaining the score system to students. Another disadvantage might be
too much emphasis on scoring rather than learning: is all the energy spent on ranking students actually needed? What
does scaling do for the classroom atmosphere? Teachers need to reflect on such questions carefully.
Bodner, G. (n.d.). Statistical Analysis of Multiple Choice Exams. Retrieved April 5, 2007 from
2Q: According to classical testing theory, what are three requirements that alternative forms
of a test must demonstrate?
A: First, and perhaps too obviously, both forms should have the same format and length and purport to measure the same construct.
Second, both forms should have similar score means and variances.
Third, the items in both forms should have similar item-total correlations.
Although it is not a requirement, it is common practice for both forms to share a certain number
of identical "anchor items" to make it easier to ascertain any possible differences between the two samples.
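The second and third requirements can be checked numerically. The sketch below is only an illustration: the two tiny response matrices are invented, the helper names are hypothetical, and the item-total correlation is the uncorrected kind (each item is included in the total it is correlated with).

```python
from statistics import mean, pvariance

def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def form_summary(responses):
    """responses: one list of 0/1 item scores per examinee."""
    totals = [sum(person) for person in responses]
    n_items = len(responses[0])
    item_total = [
        pearson_r([person[i] for person in responses], totals)
        for i in range(n_items)
    ]
    return {"mean": mean(totals),
            "variance": pvariance(totals),
            "item_total_r": item_total}

# Hypothetical data: four examinees, three items per form
form_a = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
form_b = [[1, 1, 1], [0, 1, 1], [0, 0, 1], [0, 0, 0]]
summary_a = form_summary(form_a)
summary_b = form_summary(form_b)
print(summary_a)
print(summary_b)
```

If the means, variances, and item-total correlations of the two forms look alike, that is consistent with the forms being parallel, though real data sets would of course be far larger than this toy example.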
Mousavi, S.A. (2002). An Encyclopedic Dictionary of Language Testing. (3rd Ed.).
Taipei: Tung Hua Book Company. pp. 476-477.
3Q: Briefly describe what the coefficient of variation (CV) supposedly measures and
when it should be used.
A: The coefficient of variation is the ratio of the standard deviation to the mean.
It is one way of describing the relative dispersion of data. Since it has no units, it is usually
reported as a percentage. The smaller the CV is, the less scattered the data are relative to the mean.
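As a small illustration of the ratio described above, the following sketch computes the CV for two invented score sets. The function name and the data are hypothetical, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the CV as a percentage: 100 * standard deviation / mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * stdev(values) / m  # assumption: sample standard deviation

test_a = [60, 70, 80]  # hypothetical scores on a 100-point test
test_b = [6, 7, 8]     # the same pattern on a 10-point quiz
print(round(coefficient_of_variation(test_a), 1))
print(round(coefficient_of_variation(test_b), 1))
```

Both sets yield the same CV even though their standard deviations differ tenfold, which is why the statistic is handy for comparing spread across tests with different score scales.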
Lohninger, H. (2006). Fundamentals of Statistics: Coefficient of Variation. Retrieved April 5, 2007 from
Wikipedia. (2007). Coefficient of Variation. Retrieved April 5, 2007 from http://en.wikipedia.org/wiki/Coefficient_of_variation
4Q: How does Pearson's coefficient of skewness differ from Bowley's coefficient of skewness?
A: The three most common measures of skewness are (1) standard skewness, (2) Bowley's measure, and (3)
the Pearson Skewness Coefficient. All three measures describe the asymmetry of a distribution curve in
somewhat different ways.
Standard skewness, which is sometimes called the third standardized moment, is for univariate data the mean of the
cubed deviations from the mean, divided by the cube of the standard deviation.
Bowley skewness, which is also known as the quartile skewness coefficient,
is based on the quartiles of a distribution and is defined by this formula:
(Q3 + Q1 - 2 × Q2) / (Q3 - Q1), where Q1, Q2, and Q3 are the first, second (median), and third quartiles.
Since only the middle two quartiles of the distribution are considered, and the outer two quartiles
are ignored, this adds robustness to the measure.
The Pearson Skewness Coefficient is sometimes known as "relevant skewness" and is defined by this formula:
(mean - median) / standard deviation. This is certainly easier to calculate than standard skewness.
One disadvantage of this formula is that outliers can have a strong influence on the mean and standard deviation.
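The contrast between the three measures may be easier to see with numbers. The sketch below is an illustration under stated assumptions: the function names and data are invented, population standard deviations are used, quartiles are taken with the "inclusive" method of Python's statistics module, and the Pearson coefficient follows the (mean - median) / standard deviation form given above (some textbooks multiply it by 3).

```python
from statistics import mean, median, pstdev, quantiles

def standard_skewness(xs):
    """Third standardized moment: mean cubed deviation over SD cubed."""
    m, sd = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)

def bowley_skewness(xs):
    """Quartile skewness: (Q3 + Q1 - 2 * Q2) / (Q3 - Q1)."""
    q1, q2, q3 = quantiles(xs, n=4, method="inclusive")
    return (q3 + q1 - 2 * q2) / (q3 - q1)

def pearson_skewness(xs):
    """(mean - median) / standard deviation, as in the text above."""
    return (mean(xs) - median(xs)) / pstdev(xs)

symmetric = [1, 2, 3, 4, 5]     # hypothetical scores
one_outlier = [1, 2, 3, 4, 10]  # same set with one extreme high score

for data in (symmetric, one_outlier):
    print(round(standard_skewness(data), 3),
          round(bowley_skewness(data), 3),
          round(pearson_skewness(data), 3))
```

With the single outlier, the standard and Pearson measures both turn positive while Bowley's measure stays at zero, since the quartiles do not move. That is the robustness, and also the insensitivity, of a quartile-based measure.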
Steiner, A. (2005). Investment Performance Analysis - Skewness.
Retrieved April 6, 2007 from http://www.andreassteiner.net/performanceanalysis/?Risk_Measurement:Return_Distributions:Skewness
Weisstein, E. W. (n.d.). Bowley Skewness.
Retrieved April 6, 2007 from http://mathworld.wolfram.com/BowleySkewness.html
Part II: Multiple Choice Questions
1Q: In a perfect bell-shaped curve, what percentage of a sample should theoretically be over two standard deviations from the mean?
A: The correct answer is (C). About 0.13% of the data would likely be more than three standard deviations above the mean
(and the same proportion more than three below it) if the curve were completely Gaussian.
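Tail areas like this one can be checked directly from the normal cumulative distribution function. The following is a minimal sketch using only the standard library; the function names are hypothetical. It reports the percentage beyond both two and three standard deviations, for one tail and for both tails.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def percent_beyond(k, two_tailed=False):
    """Percentage of a normal distribution lying more than k SDs from the mean."""
    one_tail = 1 - normal_cdf(k)
    return 100 * (2 * one_tail if two_tailed else one_tail)

for k in (2, 3):
    print(k,
          round(percent_beyond(k), 3),                   # above the mean only
          round(percent_beyond(k, two_tailed=True), 3))  # either side of the mean
```

The one-tailed figure for three standard deviations is roughly 0.13%, which matches the answer above; the two-tailed figure is about twice that.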
Calkins, K. G. (2005). An Introduction to Statistics - Lesson 6: The Bell-shaped, Normal, Gaussian Distribution.
Retrieved April 5, 2007 from http://www.andrews.edu/~calkins/math/webtexts/stat06.htm
2Q: In a normal bell-shaped curve, what percentage of a sample should be within the 5th stanine?
A: The correct answer is (D).
A stanine (a contraction of "standard nine") is a normalized standard score with nine bands, with
9 representing the highest band, 1 the lowest, and 5 the midpoint. The score distributions are as follows:
Table 1. Stanine score distribution
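By the usual convention, stanines 1 to 9 nominally contain 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of a normal distribution. The sketch below checks those rounded figures against the normal curve. It assumes the standard half-standard-deviation band width with cut points at z = ±0.25, ±0.75, ±1.25, and ±1.75; the names are hypothetical and the nominal figures come from that general convention rather than from the table in the original article.

```python
from math import erf, sqrt, inf

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Assumed cut points: each inner band is half a standard deviation wide
CUTS = [-inf, -1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75, inf]
NOMINAL = [4, 7, 12, 17, 20, 17, 12, 7, 4]  # conventional rounded percentages

def stanine_percentages():
    """Theoretical percentage of a normal distribution in each stanine."""
    return [100 * (normal_cdf(hi) - normal_cdf(lo))
            for lo, hi in zip(CUTS, CUTS[1:])]

for band, (pct, nominal) in enumerate(zip(stanine_percentages(), NOMINAL), start=1):
    print(band, round(pct, 2), nominal)
```

The theoretical share for the 5th stanine comes out just under 20%, which is the figure normally quoted for it.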
Azzolino, A. (2005). MIDDLE GROUND: Stanine - Statistical Standard Nine Normal Distribution.
Retrieved April 5, 2007 from http://www.mathnstuff.com/math/spoken/here/2class/90/stanine.htm
3Q: To calculate a chi-square statistic with one degree of freedom for two
groups, which of the following is/are NOT needed?
A: A chi-square test with one degree of freedom would have only two categories.
For example, we might compare how a control group and experimental group did
differently on a specific test by comparing the mean scores for each group.
More technically, we would be examining the difference between the observed
and expected mean scores for each group.
Alternatively, we might wish to know how widely distributed the scores were for
two different groups. In that case the standard deviation would be a variable
to take into account.
So the answer to this question partly depends on what you are trying to measure.
However, (C) and (D) would not be variables to consider for a chi-square test
with just one degree of freedom.
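For readers who want to see the arithmetic, here is a minimal sketch of a chi-square statistic with one degree of freedom. It assumes the data are counts in two categories, since the statistic is computed from observed and expected frequencies; the numbers and names are invented for illustration.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: 50 students who passed, split by group
observed = [30, 20]  # control group, experimental group
expected = [25, 25]  # counts expected if group made no difference

stat = chi_square(observed, expected)
df = len(observed) - 1  # two categories give one degree of freedom
CRITICAL_05 = 3.841     # chi-square critical value for df = 1 at p = .05

print(stat, df, stat > CRITICAL_05)
```

Only the observed and expected frequencies enter the calculation; no standard deviation or correlation coefficient is required.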
Baranowski, M. (2002, June 26). Statistical Analysis of Field Project #1.
Retrieved April 7, 2007 from http://www.ling.upenn.edu/courses/Summer_2002/ling102/chisq.html
Connor-Linton, J. (2003, March 22). Chi Square Tutorial.
Retrieved April 7, 2007 from http://www.georgetown.edu/faculty/ballc/webtools/web_chi_tut.html
Ryan, J. (n.d.). The Chi Square Statistic.
Retrieved April 7, 2007 from http://math.hws.edu/javamath/ryan/ChiSquare.html
4Q: In general statistics, the symbol r² signifies ________.
A: The correct answer is (B) - the coefficient of determination.
This is equal to 1 minus the ratio of the sum of squared errors (residuals) to the total sum of squares about the mean.
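A short worked example may help here. The sketch below computes the coefficient of determination as 1 minus the residual sum of squares divided by the total sum of squares about the mean; the function name and the data are hypothetical.

```python
from statistics import mean

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = mean(observed)
    ss_total = sum((y - y_bar) ** 2 for y in observed)                 # about the mean
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # squared errors
    return 1 - ss_resid / ss_total

observed = [1, 2, 3, 4]           # hypothetical scores
predicted = [1.1, 1.9, 3.2, 3.8]  # hypothetical model predictions
print(round(r_squared(observed, predicted), 2))
```

Perfect predictions give a value of 1, while predicting the mean for every case gives 0.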
Wikipedia. (2007). Coefficient of determination. Retrieved April 7, 2007 from http://en.wikipedia.org/wiki/Coefficient_of_determination
5Q: To find out how the total score on a test correlates with the chance of getting a single item
on the same test correct, a
A: There are several ways to achieve this. Within a classical testing framework, you could opt for
Choice (A). However, a Rasch analysis might yield a better picture of what is going on. If you are dealing with a low-stakes
test and do not need much precision, then if you knew the item facility for a given item as well as the mean item facility
for the entire test (reflected in the mean score), you should be able to get a rough estimate of how that item's relative
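Within a classical framework, one common statistic for relating a right/wrong item to the total score is the point-biserial correlation, which is also the topic of the reference cited below. The sketch that follows assumes that this is the kind of statistic meant here; the answer choices are not reproduced in this text, and the function name and data are hypothetical.

```python
from statistics import mean, pstdev

def point_biserial(item_scores, total_scores):
    """Correlation between a 0/1 item and the total test score."""
    right = [t for i, t in zip(item_scores, total_scores) if i == 1]
    wrong = [t for i, t in zip(item_scores, total_scores) if i == 0]
    p = len(right) / len(item_scores)  # item facility
    return (mean(right) - mean(wrong)) / pstdev(total_scores) * (p * (1 - p)) ** 0.5

item = [1, 1, 0, 0]    # hypothetical responses to one item
totals = [3, 2, 1, 0]  # the same examinees' total scores
print(round(point_biserial(item, totals), 3))
```

A value near +1 suggests that examinees with high total scores tend to answer the item correctly, whereas a value near zero or below suggests the item is not discriminating in the expected direction.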
Varma, S. (2006). Preliminary Item Statistics Using Point-Biserial Correlation and P-values.
Retrieved April 8, 2007 from http://www.eddata.com/resources/publications/EDS_Point_Biserial.pdf