Shiken: JALT Testing & Evaluation SIG Newsletter
Vol. 5 No. 3 Oct. 2001 (p. 2 - 6) [ISSN 1881-5537]

Reading complexity judgments: Episode 1

Gholam Reza Haji Pour Nezhad
Tehran University, Iran

This three-part paper describes how a reading comprehension test was developed, validated, and utilized to investigate item complexity and complexity judgments. Ninety-nine university students were asked to evaluate 55 pairs of contrasting statements with a unique type of stem-response construction. Respondents were asked to decide whether the response in each paired statement was correct or incorrect in the light of the previous stem. Validity decisions involved both matters of fact and inference. Moreover, since the contrasting statements themselves involved a variety of sentence types, the influence of factors such as lexical density, abstractness, background knowledge, T-unit complexity, and other syntactic elements on the overall sentence complexity was explored.
The first part of this paper describes how complexity theory has been a controversial issue in language testing. It then introduces several factors thought to influence reading comprehension. The next part describes the method of this study and three research questions. The final part examines two additional research questions and concludes by considering the role complexity research should have in reading comprehension test development.


Although reading is often considered the easiest skill to test, ample evidence demonstrates that reading assessment has particular complexities (Silberstein, 1987; Weaver and Kintsch, 1991; Klapper, 1992). Reading tests attempt to assess a variety of skills. These range from associating graphic symbols with sounds and words, through understanding relationships between pieces of information in the elements of sentence structure, negation, and embedding, to inferential and evaluative skills.

Inferential comprehension has been one of the most neglected areas in reading research. As a result, most commonly used reading tests rely on factual questions and measure comprehension of explicitly stated material. However, text-processing data have identified a large range of inferential processes in natural language processing, ranging from low-level to sophisticated skills. Ample evidence, for instance, shows that basic reading requires readers to infer numerous small pieces of information, such as the antecedent for a pronoun or the implicit object of an action. On the other hand, sophisticated inferences, such as those that require specific background knowledge to comprehend history texts, often present comprehension problems. This brings us to the issue of item difficulty, an area of considerable interest in language testing. Test dimensionality has also been found to be strongly related to item difficulty. For instance, Oltman, Stricker, and Barrows (1988, pp. 1-27), using multidimensional scaling analyses, reported that dimensionality (the number of underlying factors in a test revealed by factor analysis) may actually vary depending on the region of the difficulty-ability continuum under investigation.
The majority of the studies comparing factual and inferential comprehension conclude that answering inferential questions is, on the average, more difficult than answering factual ones. Accordingly, questions can be conceived of in terms of a factual to inferential continuum, and a simple to difficult continuum, as in Figure 1:

Figure 1. Two continuums for describing test items.

However, to claim that a test item is more difficult simply because it is more inferential is of little explanatory value unless we can define difficulty in more analytic ways. In fact, many inferential questions are easier than factual questions.
To come up with a more discriminating model of test item complexity, this study paid special attention to the following factors.
  1. Lexical density

    The Longman Dictionary of Applied Linguistics (1992, p. 163) defines lexical density as "a measure of the ratio of different words to the total number of words in a text." Lexical density is more commonly defined, as McCarthy (1990) states, as the proportion of content (lexical) words to the total number of words. Psycholinguistic studies have long shown that less densely packed texts are more easily comprehended, particularly among non-proficient readers. Bradac et al. (1977) and De Vries (1998) have demonstrated that there is a correlation between low lexical density and comprehension test scores. Nevertheless, researchers have not shown much interest in taking the lexical density of a text as a measure of its lexical difficulty. One major reason for this reluctance is that the ratio of content words to function words does not convey how difficult the words of a given clause are. Consequently, although lexical density is a general indicator of comprehensibility, lexical difficulty is influenced by a multitude of other variables such as word frequency, word length, irregular word spelling, multiple word denotations, specialized word applications, and selectional and subcategorical restrictions (Thor, 1987).
    In the present study, the lexical density of the 55 pairs of contrasting statements was fixed at two levels. Lexically dense statements had a lexical density of around .64, and the others had a density of nearly .46. Informants also used a five-point Likert scale to rate the general lexical difficulty of each of the contrasting statements, in which 1 represented the lowest level of difficulty and 5 the highest.

  2. Abstractness

    A large amount of research demonstrates that abstract words are more difficult to understand than concrete words (Anderson, 1974; Wharton, 1980; Corkill, Glover and Bruning, 1988; Sadoski, Goetz, and Fritz, 1993). Two competing theories attempt to account for this. Dual coding theory (Paivio, 1986) maintains that concrete language is easier because it is processed by both the verbal and the nonverbal (imagery) systems, whereas abstract language is processed by the verbal system alone. However, context-availability theory (Sadoski, Goetz and Avila, 1995) emphasizes that concrete language has more prior knowledge connections than abstract language, and that when abstract language is sufficiently familiar or presented in context, it should be comprehended as easily as concrete language.
    In the present study, care was taken to find out whether subjects decided upon complexity ratings on the basis of imagery or context.

  3. T-unit complexity

    Since Hunt (1965) defined the T-unit as the "shortest grammatically allowable sentences into which (writing can be segmented) or minimally terminable unit," researchers have frequently used this concept. To Hunt, a T-unit is essentially composed of a main clause together with any subordinate clauses attached to it. The complexity of a T-unit is held to be determined by two variables: main clause length and the number of subordinate clauses. T-units which are longer and have more subordinate clauses are more complex (Vavra, 2000).
    In the present study, the length of all T-units was kept constant so that the first statement of each pair was 17 words and the statement which followed was 11 words. However, the number of subordinate clauses was manipulated so that T-unit complex sentences had two subordinate clauses and the others had none.

  4. Factuality/inferentiality

    The bulk of research in reading comprehension supports the assumption that inferential items are more complex than factual ones and that it is critical to familiarize learners with both item kinds (Herber and Gray, 1960; Pearson and Johnson, 1978).
    In the present study, about half of the judgments about items were designed to be factual and half were designed to be inferential. For example, two of the 55 paired statements respondents were asked to judge are as follows:

    1 S. The red roses of the garden never had the freshness of the pink ones beyond the fence.
      R. The location, not the color, was the source of the difference.
    2 S. The discussions among the members had a real effect on deciding on a solution to the problem.
      R. They found a solution mainly as a result of their discussions.

    Respondents were asked to judge the truth of the second statement (known as a response or restatement) on the assumption that the statement preceding it (known as the stem) is true. Example 1 involves inferential judgments and Example 2 is said to involve factual judgments.
    A main premise in this study is that since students are not familiar with inferential formats, they may not differentiate between item types appropriately.

  5. Background knowledge

    A widely discussed issue is the role of background knowledge in reading comprehension. Briefly, when reading a text for which we have some background knowledge, we are able to comprehend it more easily and more completely, and to make more accurate inferences from it, than when reading a text for which we have little background knowledge. In a review of the research on schema theory, Goetz and Armbruster (1980) report the following major findings which support the notion of schema: (1) connected discourse is much easier to learn and remember than collections of unrelated sentences; (2) text which is more congruent with what the reader already knows and expects is also better remembered; (3) the processing of text is selective, in that it is the most important element in a selection that will be stored and remembered; and (4) readers interact with text content in such a way that individual interests and perspectives influence text interpretation.
    Background knowledge seems to be one of the most significant determinants of reading comprehension. As Kolers (1968) points out, "What the reader understands from what he has read is the result of a construction he makes and not the result of a simple transmission of the graphic symbols to his mind."
    Background knowledge was a controlled variable in the present study, so that no item demanded more specialized knowledge than an ordinary university student would already possess.

  6. Other syntactic factors

    A wide variety of syntactic factors may influence text difficulty. Among these are sentence length, adverbial and prepositional phrases, conjunctive structures, equi-deletion, permutation, transposition, embedding, sentential complements, topicalization, ellipsis, tense/aspect, concord rules, pseudo-cleft, and so on (Thor, 1987).
    The present study examined the influence of topicalization and pseudo-cleft. Syntactically complex statements, which contained both topicalization (moving a cluster of elements into initial position) and pseudo-cleft (in which one part of a sentence is focused and the other is a free relative clause), were contrasted with syntactically non-complex statements containing neither topicalization nor pseudo-cleft.
    A closer look at the themes in this study is provided in Appendix 1.
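
The two definitions of lexical density quoted under the first factor above give different numbers for the same sentence, so it may help to see them side by side. The following is a minimal sketch only, not the procedure used in this study: the function-word list is a small hypothetical one chosen for illustration, the function names are invented here, and the paper does not state how content words were identified.

```python
import re

# Hypothetical, deliberately small function-word list (an assumption made
# for illustration; the study does not say which list, if any, was used).
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "of", "to", "in", "on", "at", "by", "for", "from",
    "with", "among", "beyond", "and", "or", "but", "not", "never", "as",
    "is", "are", "was", "were", "be", "been", "have", "has", "had",
    "do", "does", "did", "it", "they", "them", "their",
    "this", "that", "these", "those",
})


def tokenize(text):
    """Lower-case the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(text):
    """Different words divided by total words (the Longman-style definition)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def content_word_density(text, function_words=FUNCTION_WORDS):
    """Content (lexical) words divided by total words (the more common definition)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in function_words]
    return len(content) / len(tokens)


# Stem and response of Example 2 from the factuality/inferentiality factor.
stem = ("The discussions among the members had a real effect on "
        "deciding on a solution to the problem.")
response = "They found a solution mainly as a result of their discussions."

if __name__ == "__main__":
    for label, sentence in (("stem", stem), ("response", response)):
        print(label,
              len(tokenize(sentence)),
              round(type_token_ratio(sentence), 2),
              round(content_word_density(sentence), 2))
```

On the stem of Example 2, this sketch counts 17 tokens of which 13 are distinct, while only 7 fall outside the function-word list, so the two definitions diverge considerably. A fixed word list is crude (it treats the main verb "had" as a function word, for instance); a part-of-speech tagger would be a more defensible way to separate content from function words.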

Upcoming Episodes

The continuation of this study in the next issues of this newsletter will explore the following questions:
1. Are respondents able to distinguish between different kinds of complexity?

2. Do respondents assign significantly differentiated item complexity orders on the basis of the statement/restatement types?

3. How do factuality/inferentiality ratings by respondents pertain to item complexity ratings?

4. Do the factuality/inferentiality levels judged by the researcher also influence complexity order judgments?

5. Do respondents rank item complexity orders on the basis of statement/restatement combinations (specific item kinds)?


References

Alderson, J. C. (1993) Judgments in language testing. In D. Douglas & C. Chapelle (Eds.), A new decade of language testing. Alexandria, VA, USA: TESOL.

Alvarez, M. C. & Risko, V. J. (1989) Schema activation, construction, and application. Bloomington IN, USA: ERIC Clearinghouse on Reading & Communication Skills.

Anderson, R. C. (1974) Concretization in sentence learning. Journal of Educational Psychology, 66, 179-183.

Birkmire, D. P. (1985) Text processing: The influence of text structure, background knowledge and purpose. Reading Research Quarterly, 20 (1), 314-326.

Bradac, J. L., et al. (1977) The role of prior message context in evaluative judgments of high- and low-diversity messages. Language and Speech, 20 (4), 295-307.

Champeau De Lopez, C. L. Giancarla, M. B., and Arreaza-Coyle, M. E. Evaluating reading comprehension in EFL. English Teaching Forum Online. 35 (2).30-37.

Corkill, A. J., Glover, J. A., & Bruning, R. H. (1988) Advance organizers: Concrete versus abstract. Journal of Educational Research, 82, 76-81.

De Vries, H. (1998) The place of function words in language and in NLP. [Online]. [Expired Link].

Goetz, E. T., & Armbruster, B. B. (1980) Psychological correlates of text structure. In R. J. Spiro, B. C. Bruce, & W. F. Brewer (Eds.) Theoretical issues in reading comprehension: Perspectives from cognitive psychology, artificial intelligence, linguistics, and education. (pp. 201-220). Hillsdale, NJ: Erlbaum.

Herber, H. & Gray, W. S. (1960) Reported in Gray, W. S. (1960). The major aspects of reading. In H. M. Robinson (Ed.) Sequential development of reading abilities. Supplementary Educational Monographs 90. Chicago: University of Chicago Press.

Halliday, M. A. K. (1989). Spoken and written language. Oxford: Oxford University Press.

Houston, S. R. et al. (1987). Judgment analysis vs. Paired comparison as a statistical procedure for capturing policy models in readability prediction. Journal of Experimental Psychology. 56 (3) 24-7.

Hunt, K. (1965). Grammatical structures written at three grade levels. (NCTE Research Report 3). Champaign, IL, USA: NCTE, 1965a. ED 113 735.

Klapper, J. (1992). Reading in a foreign language: Theoretical issues. Language Learning, 1 (5), 53-56.

Kolers, P. (1968). Introduction to Huey's psychology and pedagogy of reading. Cited in Newton, D. P. (1992). The level of abstraction of textual materials: A new and an old measure compared. Journal of Research in Reading. 15 (2). 117-9.

Newton, D. P. (1992). The level of abstraction of textual materials: A new and an old measure compared. Journal of Research in Reading, 15 (2). 117-9.

McCarthy, J. (1990). Formalization of common sense: Papers by John McCarthy. Edited by V. Lifschitz. Norwood, NJ, USA: Ablex.

Oltman, P. K., Stricker, L. J. & Barrows, T. S. (1988) Native Language, English Proficiency, and the Structure of the Test of English as a Foreign Language. July. TOEFL Research Report Series. Princeton, NJ: Educational Testing Service.

Paivio, A. (1986). Mental representations: A dual coding approach. New York: Oxford University Press.

Pearson, P. D. & Johnson, D. D. (1978). Teaching reading comprehension. New York: Holt Rinehart and Winston.

Powell, J. L. (1989). How well do tests measure real reading? Bloomington, IN, USA: ERIC Clearinghouse on Reading & Communication Skills.

Sadoski, M., Paivio, A., & Goetz, E. T. (1991). A critique of schema theory in reading and a dual coding alternative. Reading Research Quarterly, 26 (4) 463-84.

Silberstein, S. (1987). Let's take another look at reading: Twenty-five years of reading instruction. English Teaching Forum, 25. 28-35.

Thor, M. (1987). Evaluating Linguistic Difficulty. TESOL News, 8 (3). 24-33.

Vavra, E. (2000). Dr. Ed Vavra's KISS Approach to Sentence Structure. [Online]. [Expired Link].

Weaver, C. A. & Kintsch, W. (1991). Expository text. In R. Barr, et al. (Eds.) Handbook of reading research. Vol. 2. New York: Longman.

Wharton, W. P. (1980). High imagery and the readability of college history texts. Journal of Mental Imagery, 4, 129-147.
