Shiken: JALT Testing & Evaluation SIG Newsletter
Vol. 1 No. 2 Sep. 1997 (p. 2 - 13) [ISSN 1881-5537]

An overview of the ACTFL proficiency interview:
A test of speaking ability?

Leo Yoffe

The author chose to review the ACTFL OPI because, as a measure of speaking proficiency in a foreign language, this procedure has become extremely influential. Drafters of the ACTFL Proficiency Guidelines, as well as proponents of the OPI, are well aware of its washback effect on curriculum and syllabus design. Liskin-Gasparro (1984) writes that the guidelines took "proficiency-based approach to the functional-notional syllabus" as the point of departure. The proficiency movement spearheaded by ACTFL has generated a great deal of controversy among both practitioners and scholars (for further discussion see Lantolf and Frawley, 1988; Savignon, 1985; Bachman and Savignon, 1986; Byrnes, 1989; Omaggio, 1986). Today the ACTFL Proficiency Guidelines have a strong effect on the content and the teaching methodology of many foreign language courses. The guidelines are used to evaluate the foreign language proficiency of secondary teachers in a number of states, and have been accepted as a standard measure for evaluating candidates' suitability for various government and administrative posts requiring foreign language (FL) speaking proficiency.

This paper will provide a brief outline of the history of the ACTFL OPI, describe the interview process drawing on personal experience and the accounts of other test-takers, and discuss the controversy surrounding the use of the OPI in an academic setting. The discussion will incorporate a critical review of the relevant literature, with particular attention to the reliability, validity and instructional washback of the oral interview. One section will be devoted to the certification process for the benefit of those interested in becoming OPI raters.

OPI: Claims and Procedures

The ACTFL Oral Proficiency Interview was developed to evaluate speaking proficiency in a foreign language. It is a criterion-referenced, direct, face-to-face interview with only one interviewer present. The interview consists of five stages: the warm-up, level checks, probes, role-play, and wind-down. The role of the 'warm-up' is to put the interviewee at ease, to familiarize him/her with the pronunciation and way of speaking of the interviewer, and to generate topics which can be explored later in the interview. The 'level checks' allow the interviewee to demonstrate his/her ability to manipulate tasks and contexts at a particular level.

If the interviewer is satisfied with the testee's sustained performance, an attempt will be made to discover the 'ceiling', i.e. to elicit a response at the next higher level. 'Probes' thus make the testee reveal a pattern of weaknesses. A 'role-play' serves as an additional check to help the interviewer confirm the testee's level. The 'wind-down' brings the interview down to a level comfortable for the testee so as to end the OPI on a positive note. The entire interview lasts about 15 minutes in the case of a novice, and can be as long as 35 minutes if a series of probes and level checks is necessary. The interview is taped, and a rating is assigned if the interviewer and a second rater agree on the level. In the case of disagreement, the tape is sent to a third rater. It costs $120 to take the ACTFL OPI. If the testee resides in an area where no certified OPI rater is available, the interview will be conducted by telephone. The same procedure applies.

The ACTFL OPI's History

The ACTFL OPI proficiency scales developed out of the FSI (Foreign Service Institute) levels of oral proficiency. These levels ranged from 1 (very basic proficiency) to 5 (native-like proficiency). In the early 1980s, the Common Yardstick project aimed to make the FSI testing procedure more accessible in academia and other foreign-language-related fields. To this end, it was considered necessary to establish finer gradations at the lower end of the FSI scale in order to provide test takers with realistically reachable goals. Thus the 0 - 1 range on the FSI scale was subdivided into four levels in the ACTFL OPI Guidelines: 0 - Novice Low; Novice Mid; Novice High. Conversely, the upper end of the FSI scale (levels 3 - 5) was subsumed under the 'superior' level in the ACTFL Guidelines. Labels replaced numerical values to make the test more appealing to the uninitiated user. The elicitation and scoring procedures were further elaborated and defined by Higgs and Clifford (1982).
Each proficiency level consists of five components: function, content, context, accuracy and text type. In the ACTFL OPI Guidelines, 'function' refers to what the learner can do with the language. 'Content' and 'context' refer to the range of topics (personal, professional, and abstract) the learner can handle with confidence and in what setting (formal or informal). 'Accuracy' describes the extent of phonological and syntactical precision. Finally, 'text type' refers to the discourse complexity of the testee, i.e. whether the subject speaks in discrete words, unconnected sentences or extended, planned paragraphs.

Certification Process

To become a certified OPI rater, a candidate must already be rated as 'superior' or 'native' in the language. The candidate must first attend a four-day workshop organized by ACTFL. The first day is devoted to a general explanation of the Oral Proficiency Interview, its intended purpose and its generic assessment criteria. All subsequent sessions are language specific, i.e. candidates are grouped according to the language of their expertise. During these training sessions mock interviews are conducted with volunteer testees. Each interview is approximately 15 minutes long and is followed by a discussion of the strong and weak points of the tester's behavior. The workshop participants are guided by the group leaders (typically experienced raters from ACTFL) in rating each sample according to the guidelines.
The second phase of the certification process has to be completed within six months. During this time prospective raters are required to conduct 25 interviews in two cycles and send the recordings to the ACTFL office. Of these, 5+3 randomly chosen samples will be evaluated. On two of the first five samples, the ratings of the interviewer and the trainer must agree precisely; on the other three, a sub-level disagreement is acceptable, e.g. mid-intermediate vs. high-intermediate, but not low-intermediate vs. high-intermediate. Tapes are returned with detailed comments.
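The agreement rule for the first five samples can be sketched in a few lines of Python. This is only an illustration and not ACTFL's own procedure. It assumes that each rating is a (major level, sub-level) pair, that an acceptable 'sub-level disagreement' means ratings one step apart within the same major level, and that levels without low/mid/high sub-levels are left out. The names `sublevel_gap` and `first_cycle_passes` are hypothetical.

```python
# Assumed ordering of sub-levels within one major level (e.g. intermediate).
SUBLEVELS = ("low", "mid", "high")


def sublevel_gap(rating_a, rating_b):
    """Return the number of sub-level steps between two ratings.

    Each rating is a (major_level, sub_level) pair. Returns None when the
    major levels differ, which this sketch treats as unacceptable.
    """
    major_a, sub_a = rating_a
    major_b, sub_b = rating_b
    if major_a != major_b:
        return None
    return abs(SUBLEVELS.index(sub_a) - SUBLEVELS.index(sub_b))


def first_cycle_passes(rating_pairs, exact_required=2):
    """Check the evaluated samples of the first cycle.

    rating_pairs holds (interviewer_rating, trainer_rating) tuples.
    At least `exact_required` pairs must agree exactly, and every other
    pair may differ by no more than one sub-level (assumption stated above).
    """
    gaps = [sublevel_gap(own, trainer) for own, trainer in rating_pairs]
    if any(gap is None or gap > 1 for gap in gaps):
        return False
    return sum(gap == 0 for gap in gaps) >= exact_required
```

Under these assumptions, a set of five samples with two exact matches and three mid-intermediate vs. high-intermediate style disagreements passes, whereas a single low-intermediate vs. high-intermediate pair causes the set to fail.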
Certification is valid for two years, ostensibly to ensure the continued proficiency of the rater. After that, a rater needs to re-qualify. The certification process, including the tester training manual, cue cards and role-play cards, costs between US $575 and US $1,500 if taken in the United States, and considerably more if taken in Japan.

OPI: Critique and Literature Review

Despite the wide use of the OPI, questions have been raised about the construct validity of this procedure. Does the interview actually measure what it is supposed to? The tester training manual defines the OPI as:
. . . a standardized procedure for the global assessment of functional speaking ability or oral proficiency. (Ch. 1-1)
Nowhere in the text, however, is a definition of 'oral proficiency' provided. If we do not know exactly what the OPI tests, then any claim of its usefulness as an accurate evaluative mechanism is highly suspect. Van Lier (1989) goes so far as to suggest the following somewhat facetious, yet defensible, definition:
oral proficiency consists of those aspects of communicative competence that are displayed and rated in oral proficiency interviews. (p. 493)

While an exaggeration, this definition points to the problem of identifying the abilities which are subsumed under the nebulous heading of 'oral proficiency'.

As most talk occurs in interactive settings, it is reasonable to assume that a conversation is the most representative demonstration of one's functional speaking ability. A conversation is characterized by a variety of factors, such as mutual topic nomination and equal responsibility for the negotiation of meaning (van Lier, 1988). Seen from this perspective, the OPI can hardly claim to be a "natural conversation in the target language."

Procedure of an ACTFL OPI

First of all, in the course of an interview, it is the interviewer who introduces a topic and then chooses to pursue or abandon it. The interviewee's input is limited to answering questions. Thus, topic nomination rests solely with the interviewer. I can personally attest to the asymmetric roles of the interviewer and the interviewee in the OPI format. During an oral proficiency interview in Russian which lasted over 35 minutes, the tester nominated all topics and carefully monitored any shift in the content of the conversation. I was given sufficient time to respond to each question but no opportunity to elaborate or introduce a new topic.
During the interview the tester and the testee are clearly not on an equal footing. This asymmetry is not specific to the OPI but is inherent in the notion of an 'interview' as an exchange in which one person solicits information in order to arrive at a decision, whereas the interlocutor produces what he/she perceives as most valued. The interviewee is, in most cases, acutely aware of the ramifications of the OPI rating and is, consequently, under a great deal of stress. On those occasions when an interview is taken for self-diagnostic purposes, the stress factor is considerably lessened.
Another problem lies in the form-function dichotomy. The interview purports to assess functional speaking ability, yet the tester training manual strongly encourages raters to pay careful attention to the form of the language produced rather than to the message conveyed. A Russian OPI rater reported that she tries to elicit samples of 'past tense', 'subjunctive' and progressively more complex structures to ascertain the appropriate level of proficiency (Lovick, personal communication, 1994).

Role-play situations also present a problem. While they are considered an additional level check, the very nature of this activity demands some acting ability. Cultural as well as personal resistance to assuming a role may interfere with the intended purpose of this speech elicitation device. It is, therefore, important to ensure that the type of behavior called for in a role-play is not incompatible with the sociocultural and personal parameters of the interviewee.
The government (FSI) scale and the ACTFL guidelines presume that a 'zero' level and a 'perfect' level of proficiency exist. As the OPI is a criterion-referenced measure, what norm is the point of reference? The lowest point on the ILR scale, an immediate predecessor of the ACTFL OPI, is 0, characterized by the absence of speaking proficiency. It is, however, very hard to determine whether the testee's level corresponds to this 'absolute zero'. Even if no oral production is observable, the testee may be able to function using the sound system of the language, or by relying on cognates. Conversely, language is a constantly evolving system, so it is impossible to achieve total proficiency in it.
The norm against which the testee's performance is rated is the functional, contextual and accuracy level of proficiency of an educated native speaker (ENS). The ACTFL guidelines do not provide an operationalized definition of an ENS. Using the ENS as a reference point opens the door to a host of related questions. Whom do we consider to be an educated native speaker? Educated native speakers of a language do not form a homogeneous group. It is possible that a lawyer speaks differently from an educated housewife. Ingram and Wylie note that ". . . it is perfectly possible to have 'native' housewives who are '3s', semi-educated 'natives' who are '3+s'". How do we interpret colloquial speech among native speakers? It may deviate from the standards prescribed by institutional grammarians. Not unimportant, either, is the issue of pronunciation (accuracy). The plethora of language varieties makes the evaluation of language performance a very sensitive and problematic issue.
In the light of these considerations, Bachman and Savignon (1986) make the following recommendations for revising the guidelines.

OPI: Definitional Approach vs. Principled Approach

In 1988 James Lantolf and William Frawley argued against a definitional approach to oral proficiency and in favor of a principled approach, based on sound theoretical considerations. They quoted their 1985 article in which they called the OPI 'a criterion-reductive, analytically derived, norm-referenced test of how well an individual can deal with an imposition.' In the present article they labeled the OPI inauthentic in that experimenters cannot assume that subjects have defined a situation in the way the experimenters intended (Wertsch, Minick, and Arns, 1984, p. 160).
They noted that several states are moving toward the establishment of oral proficiency standards for teacher certification and for bilingual certification. The Universities of Minnesota and Pennsylvania are moving toward the establishment of 'intermediate-mid' as a minimum standard for students exiting their two-year language programs.
The authors call the OPI theoretically and empirically unsound and believe its institutionalization may be harmful. In the future, people who cannot meet specified requirements may be prevented from the study of languages, even if such study is only for personal enjoyment. Other people who fail to measure up to requirements may interpret such results to mean that they have no aptitude for languages.
They quote Schulz (1986) as saying that the OPI Guidelines should be prevented from penetrating any further into the foreign language curriculum until a sound theory of proficiency has been established and adequately evaluated. The fact that it is possible to devise tests on which individuals score arbitrary points does not mean that the quality being measured by the test is really metric. The illusion is provided by the scale (Lewontin, Rose, and Kamin, 1984, p. 91). ACTFL provides no justification for the claim that oral proficiency can be so scaled. It must also justify the number of levels in the scale. Why 5 or 12 or 3? Why not 6 or 95?
Volmer (1981, p. 152) defines proficiency quite nicely: 'Proficiency is what proficiency tests measure.' A theory of proficiency must be developed, independent of psychometrics. Once a theory of proficiency is proven by empirical research, then psychometrics can be re-introduced. But even this will be difficult because proficiency is not a construct that can be formalized in terms of a taxonomy of items, no matter how long or detailed.

The OPI does not measure communicative ability

In 1988 Raffaldini wrote an article discussing the ACTFL OPI in relation to current models of communicative skills, arguing that the OPI fails to evaluate important aspects of learners' communicative abilities. To its credit, the OPI does allow a thorough assessment of grammatical competence, but it provides only a partial assessment of discourse competence. The learner in an OPI is dependent upon the tester. Exchanges are initiated by the tester, and the learner never gets a chance to initiate an exchange or to react to the tester's responses. Thus the OPI only assesses how the learner answers tester-initiated questions.
The testing situation created by the OPI is extremely artificial. Information and opinion exchanges are purposeless, inauthentic, and not directed toward a particular outcome. The range of discourse and socio-cultural contexts in the OPI is extremely limited, so it does not adequately evaluate communicative ability. Rather, the OPI only assesses the subject's ability to exchange factual information and opinions in formal situations, using polite language with a stranger.
Raffaldini goes on to report on two situational tests which were developed and administered along with the OPI to American learners who had returned to the U.S. after a year of living in France. The first one was a written multiple-choice test, administered to both native and non-native speakers of French. The second was an oral test. These situational tests assessed more areas of language proficiency in a wider range of language-use situations than did the OPI. They assessed socio-linguistic as well as grammatical competence, and most aspects of discourse competence.


In 1988 Bachman examined whether the ACTFL OPI (as embodied in the ACTFL Proficiency Guidelines) provides valid indicators of the ability to communicate through speaking. He concluded that validation was impossible because traits and test methods are confounded in the design of the interview and in the interpretation of the ratings. The confounding of function, content, and structure is a fundamental flaw which makes validation impossible.
