The role of the TOEIC® in a major Japanese company
by Mark Chapman (Hokkaido University)
Keywords: TOEIC®, criterion-referenced, corporate language assessment policies, communicative language assessment
This paper sets out to investigate how a major Japanese company conducts English language testing for its employees. After establishing what kinds of English skills the employees require, it outlines how the company selects employees for jobs that require English-language ability. This paper questions how effectively the company utilizes the TOEIC® (Test of English for International Communication) in this process. A major point of discussion is whether TOEIC® scores alone can be a valid basis for selecting employees best qualified to represent the company in international business environments.
Background
Two main issues will be addressed here: firstly, the needs of company employees with regard to English-language skills, and secondly, how the TOEIC® is actually used in the company.
The corporation (referred to as HLC from now on) is one of the largest companies in Japan and after more than ninety years of business is involved in a very diverse range of industrial areas. HLC has around seventy thousand employees throughout Japan and more than double that worldwide. The considerable scale of the company means that there is a broad range of linguistic skills required of company employees.
Perhaps the skill that is most broadly required throughout the company is reading. HLC has many research laboratories and carries out research in a wide variety of fields. The company's researchers need to be able to read international journals and research papers. Almost all white-collar employees handle copious amounts of written communication, mostly through e-mail. There is also a need for some staff to read English-language manuals and contracts.
The need for writing skills is very similar. Researchers are less likely to have to write research papers than they are to read them, but there is clearly some demand for this ability. An area of concern is the ability of non-native English speakers to register international patents. If a significant area of research is not published in English, it is unlikely to gain wide acceptance. Again, there is a limited but real need for some staff to be able to write English-language manuals and contracts.
Many HLC employees involved in doing business overseas need to be able to comprehend spoken English. Employees regularly attend international conferences, trade shows and the like, where there may not be a need to speak but listening will be essential. Many staff who do not leave Japan also need to understand spoken English in the form of telephone calls and video-conferences.
Speaking is a prime requirement for employees who are sent abroad on business. From the most rudimentary grasp of survival English to the most fluent international negotiation, some form of spoken English skill will be necessary.
If the company is to be an effective performer in the global business world, then HLC needs to know which of its employees are capable of performing these linguistic tasks and which require further training. If mistakes in employee selection are to be minimized then the language testing that is a key part of this selection process needs to be reliable and as accurate as possible.
The role of the TOEIC® in HLC
The mainstay of HLC's English-language testing program is the TOEIC®. Here we will focus on how the TOEIC® is used throughout HLC. The TOEIC® plays two main roles in HLC. Firstly, it is used to determine which employees are qualified to go overseas on business. Secondly, the TOEIC® is used as a factor in determining promotion decisions.
The benchmark score throughout the company that enables an employee to go overseas on business is 650. If there are employees who have TOEIC® scores below 650 but have a good degree of communicative competence, then HLC may be unduly restricting the number of staff who are capable of representing the company internationally. If there are employees who have a TOEIC® score above 650 but are not competent communicators, then there is the risk that HLC will be sending employees on business abroad who are unlikely to be linguistically capable of representing the company. Detailing the consequences of such mismatches is beyond the scope of this paper, but they are not likely to have a positive effect on the company's reputation or image.
The role of the TOEIC® in determining promotion is not quite so clear cut. It was reported that TOEIC® 650 was to become the minimum score required for promotion to managerial positions throughout HLC, but this does not seem to have been officially implemented as yet. Different divisions within the company seem to have different policies regarding the use of the TOEIC® in determining promotions. The degree of importance attached to the TOEIC® score and the score required vary, but what is clear is that it is a factor in making a decision that is one of the most important in any employee's career. The company has officially implemented a TOEIC® score of 800 for promotion to senior managerial positions.
What is the TOEIC®?
This section will begin with a description of the contents of the TOEIC® and its place in the language-testing world. This will be followed with claims regarding the uses of the TOEIC® and what it actually measures.
The TOEIC® was developed in the US by Educational Testing Service (ETS) in 1979 to "measure English language skills used in international corporations around the globe." (Wilson, 1993, p. 1). According to Gilfert (1995, p. 76), TOEIC® came about as a result of a request from the Japanese Ministry of Trade and Industry to ETS. TOEIC® is a multiple-choice test of English that consists of two main sections: Listening Comprehension and Reading as indicated below in Table 1.
Table 1. Structure of the current TOEIC® test.
LISTENING COMPREHENSION
Part I: One picture, four spoken sentences (20 items)
Part II: Spoken utterances, three spoken responses (30 items)
Part III: Short conversation, four printed answers (30 items)
Part IV: Short talks, four printed questions and answers (20 items)
READING COMPREHENSION
Part V: Incomplete sentences (40 items)
Part VI: Error recognition - underlines (20 items)
Part VII: Reading comprehension - passages (40 items)
LISTENING: 100 items / READING: 100 items
The time division between the two sections is not equal. There are forty-five minutes for the listening section and seventy-five minutes for the reading section.
TOEIC®'s place in the language-testing world
Tests fall into two main categories, criterion-referenced tests and norm-referenced tests. Bachman (1990, p. 7) defines a norm-referenced test as one in which, "an individual's test score is reported and interpreted with reference to the performance of other individuals on the test." Hughes (1989, p. 17) illustrates this definition with an example. If we want to know how a student performed on a test we can explain his or her performance in terms such as, "the student obtained a score that placed her or him in the top ten percent of candidates who have taken the test, or in the bottom five percent, or . . . she or he did better than sixty percent of those who took it." A test that produces this kind of information is norm-referenced. Hughes goes on to state that a norm-referenced test "relates one candidate's performance to that of other candidates. We are not told directly what the student is capable of doing in the language."
The second kind of test is a criterion-referenced test. In a criterion-referenced test, Bachman (1990, p. 8) states that, "test scores are reported and interpreted with reference to a specific content domain or criterion level of performance." In direct contrast to Hughes' explanation of norm-referenced tests given above, a criterion-referenced test tells us specifically what a student is capable of doing in the language. A criterion-referenced test does not tell us, however, how a student did in relation to another. According to Hughes (1989, p. 18), "the purpose of criterion-referenced tests is to classify people according to whether or not they are able to perform some task or set of tasks satisfactorily."
TOEIC® is an example of a norm-referenced test. The question of whether a norm-referenced test is a suitable match for the requirements of HLC employees will be raised in a later section.
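The contrast between the two kinds of test can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the scores, the task names and the two function names are hypothetical and do not come from HLC, ETS or the authors cited above.

```python
# Illustrative only: hypothetical scores and task results, not HLC or ETS data.

def percentile_rank(score, all_scores):
    """Norm-referenced reading: percentage of candidates who scored below `score`."""
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)


def meets_criterion(task_results, required_tasks):
    """Criterion-referenced reading: can the candidate perform every required task?"""
    return all(task_results.get(task, False) for task in required_tasks)


# A norm-referenced result places one candidate relative to the others.
group_scores = [420, 480, 510, 560, 600, 640, 655, 700, 760, 820]
print(percentile_rank(700, group_scores))  # 70.0

# A criterion-referenced result says what the candidate can or cannot do.
candidate_tasks = {"handle a telephone enquiry": True, "present a product": False}
required = ["handle a telephone enquiry", "present a product"]
print(meets_criterion(candidate_tasks, required))  # False
```

The first result says nothing about what the candidate can do in English; the second says nothing about how the candidate compares with colleagues.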
What are the uses of the TOEIC® and what does it measure?
Claims made about the TOEIC® will be considered from three separate sources. Firstly, we will examine claims made about the TOEIC® in ETS promotional material. Secondly, claims made about the TOEIC® through research carried out or funded by ETS will be mentioned. Finally, claims made by independent investigators will be addressed.
TOEIC® Promotional Literature
The TOEIC® web site claims that TOEIC® is "the world's most recognized English language test; more than 1.7 million people took the test last year." The opening page of the web site also goes on to say that TOEIC® is used by "corporations and government agencies to assess the English ability of their employees." The same page states: "Individuals take the TOEIC® test to track their progress in English Language improvement and demonstrate to employers their ability to use English at work." These claims immediately suggest that the TOEIC® is both a progress test and a proficiency test. The use of the word "ability" is interesting here as there is no indication of what kind of ability is being measured. The TOEIC® web site also claims that TOEIC® is an effective placement test for language schools. A further quote from the web site outlines another claim: "The TOEIC® test helps to demonstrate the effectiveness of your instructional English program, which can be used as a marketing tool to attract new students or as an accountability measure to reassure funding organizations that their training dollars are well spent." Again, it is interesting that there is no consideration of what kind of instructional English program is in place. The web site appears to present TOEIC® as a testing panacea that will be all things to all language programs.
Research carried out by ETS
In 1989 ETS produced a report investigating to what extent the conversational ability of individuals could be inferred from their TOEIC® score. Wilson's study compared the TOEIC® scores of candidates with their Language Proficiency Interview (LPI) scores. The LPI was developed by the US government and is an extensively used test to measure oral ability. It is scored on a scale of 0 to 5 with 5 being equal to an educated native speaker and 0 indicating no ability. The study was a sizeable one with 285 Japanese candidates, 56 French candidates, 42 from Mexico and 10 from Saudi Arabia. The main findings of the study are as follows (Wilson, 1989, p. 51):
- TOEIC® Listening/LPI correlations were higher than TOEIC® Reading/LPI correlations. The former correlated in the mid-0.70s and the latter at 0.70.
- TOEIC® total/LPI correlations were approximately the same as those for the TOEIC® Listening/LPI correlations, but slightly lower in some instances.
This report by Wilson (1989) seems to indicate that a separate speaking test and the TOEIC® will provide different information about examinees. If a corporation, then, wants to test the ability of its employees to speak English, employing the TOEIC® test in isolation is unlikely to be the most accurate method available.
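One common way to read correlations of this size is to square them, which gives the proportion of variance that two sets of scores share. The sketch below applies that standard calculation to the figures quoted above. The function name is an assumption of this illustration, 0.75 is used only as a stand-in for "mid-0.70s", and the calculation itself is not part of Wilson's report.

```python
def shared_variance(r):
    """Coefficient of determination: proportion of variance two measures share."""
    return r * r


# 0.70 is the reported Reading/LPI figure; 0.75 approximates the Listening/LPI figure.
correlations = [
    ("TOEIC Reading / LPI", 0.70),
    ("TOEIC Listening / LPI (approx.)", 0.75),
]

for label, r in correlations:
    print(f"{label}: r = {r:.2f}, shared variance = {shared_variance(r):.0%}")
```

On this reading, roughly half of the variation in interview ratings is not accounted for by TOEIC® scores, which is consistent with the view that the two tests provide different information about examinees.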
Independent Research into TOEIC®
As with ETS-funded research, there is a surprisingly small amount of published research (in English) into the TOEIC®. One article of interest is Childs' (1995) investigation into how the TOEIC® is used in Japanese companies. The main thrust of Childs' investigation is whether or not the TOEIC® is a suitable test for measuring students' progress in English. He investigated a group of 113 new employees of a Japanese company. The new employees underwent an intensive one-week English course followed by a half-day of English study once a month for the next four months. The new employees were given a TOEIC® test before the start of the intensive course, at the end of the intensive course, after the second half-day class and finally after the last half-day English class. There were hence a total of four TOEIC® tests administered. After analyzing the results of all the administrations of the test, Childs concluded the following:
- The TOEIC® is reasonably effective at measuring overall group gains in proficiency.
- The TOEIC® is not as effective at measuring the progress of individual learners in the short term. Childs makes the strong statement that "the use of TOEIC® for gauging individual learning is, in general, inefficient or wrong." The reason for this conclusion was that the standard error of measurement (SEM) of the total scores was within the range of the expected individual gains. The SEMs in Childs' report are somewhat higher than those given in the TOEIC®'s initial validity study (Woodford, 1982): 43 points for the total-score SEM in Childs' investigation, against Woodford's figure of 34.93.
- It is not possible to explain the reasons for students' progress using the TOEIC®. This is again due to the standard error of the scores. Childs dismisses TOEIC® as a diagnostic test.
- The TOEIC® can be an effective tool for comparing the performance of different language schools or programs. Childs qualifies this conclusion with the comment that care is needed as TOEIC® is not especially effective for measuring individual gains. Presumably, if a company sends a large number of employees to different schools or programs, then the group gains can be compared with reasonable confidence.
Childs' (1995, p. 75) closing comments make for very interesting reading:
"Company education directors and language schools should be warned that short-term TOEIC results cannot be substituted for more specific measures of learning achievement. Test users await a series of criterion-referenced tests complementary to the norm-referenced TOEIC."
And later:
"Education directors who incorporate TOEIC into their testing programs should do so thoughtfully. They should understand that the long-term solution to many of their needs will be not TOEIC but a series of tests that are in tune with the specific goals and methods of their English education programs."
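Childs' argument about measurement error can be sketched numerically. The example below uses his reported total-score SEM of 43 points together with the standard classical-test-theory formula for the standard error of the difference between two scores that have the same SEM (SEM × √2). The example scores, the function names and the 1.96 multiplier for an approximate 95% band are assumptions made for illustration, not figures or procedures from the study.

```python
import math


def standard_error_of_difference(sem):
    """Standard error of the difference between two scores with the same SEM."""
    return sem * math.sqrt(2)


def gain_exceeds_error(before, after, sem, z=1.96):
    """True if the observed gain is larger than the ~95% error band for a difference."""
    return abs(after - before) > z * standard_error_of_difference(sem)


SEM_CHILDS = 43  # total-score SEM reported by Childs (1995)

print(round(standard_error_of_difference(SEM_CHILDS), 1))  # 60.8

# Hypothetical individual results: a 60-point gain and a 150-point gain.
print(gain_exceeds_error(500, 560, SEM_CHILDS))  # False
print(gain_exceeds_error(500, 650, SEM_CHILDS))  # True
```

Under these assumptions an individual would need to gain well over 100 points before the change could be confidently separated from measurement error, which is why short-term individual gains are hard to interpret while averaged group gains are more stable.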
A more recent study (Hirai, 2002, pp. 2-8) also casts doubt on ETS' findings. Hirai investigated the degree of correlation between TOEIC® scores, company interview scores, and BULATS (a writing test) scores. He found that interview scores for intermediate-level students (TOEIC® 400-650) correlated at only 0.49 with TOEIC® scores. The BULATS writing test and TOEIC® scores correlated at 0.66. Both these figures are considerably lower than those claimed by ETS.
An independent study into the degree of correlation between TOEIC® scores and oral interview scores was also carried out within HLC. The study was undertaken over a one-year period and involved the test scores of 169 company employees who were at an intermediate level. All the subjects had TOEIC® scores in the range of 400-650 points. The study examined the rank order correlations between the two sets of test scores and the figures were generally below those generated by ETS. The correlation figures generated in the HLC study were in the 0.5 to 0.6 range, noticeably lower than the 0.7 correlation reported in the ETS study of 1989. The lower figures produced in the HLC study may be partly attributable to the relatively low proficiency of the subjects and to the fact that the oral interview used at HLC is not the same as the LPI in the ETS study. However, independent research again fails to support claims made about the TOEIC® by ETS.
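For readers unfamiliar with rank order correlation, the following is a minimal sketch of how such a figure can be computed. It assumes a Spearman coefficient (the Pearson correlation of the ranks, with tied values sharing their average rank); the source does not state which rank-order coefficient the HLC study used. The function names and the seven pairs of scores are hypothetical and are not data from the study.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical intermediate-level candidates (TOEIC 400-650) and interview bands.
toeic = [410, 455, 500, 520, 575, 610, 645]
interview = [2, 1, 3, 3, 2, 4, 5]
print(round(spearman(toeic, interview), 2))
```

Because only the ordering of candidates matters, a coefficient of this kind indicates how far the two tests agree on who ranks above whom, not whether either test identifies what a candidate can do.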
Independent research and investigations lead us to question the claims about the uses of the TOEIC® and what it can effectively measure. Childs' study has cast doubt on the accuracy of the figures published by ETS regarding the internal reliability of the TOEIC®. The use of TOEIC® as a reliable progress and diagnostic test is also in doubt. The study performed within HLC has also raised questions about the capacity of the TOEIC® to accurately measure communicative competence. Given these doubts, further independent research into other claims made by ETS about the TOEIC® seems both necessary and overdue.
Changes in the corporate English testing system
This paper has outlined the possibility that HLC may be over-reliant on the TOEIC® test. The TOEIC® is a reasonably valid measure of listening and reading, but is less valid as an indicator of speaking skills. The next logical step is for the company to consider what action it might take to improve the situation.
Bearing in mind the needs of HLC employees that were described in Section 2, the starting point for proposing a new test must be to select one that will measure the skills that employees need. This is a question of construct validity, or whether the test is actually a fair measure of what it is supposed to be measuring. From this starting point there are two more points that need to be considered. Firstly, should the test be a direct or an indirect one? Secondly, should the test be norm referenced as is the TOEIC® or criterion referenced?
Section 2 of this paper indicated that employees are likely to have a need for all four linguistic skills. Bearing this in mind, the company ideally needs a test or series of tests that reliably measure these four skills. Construct validity is at the core of theoretical testing literature and is the key issue here. According to Hughes (1989, p. 26) "a test, part of a test or a testing technique is said to have construct validity if it can be demonstrated that it measures just the ability which it is supposed to measure. The word construct refers to any underlying ability (or trait) which is hypothesised in a theory of language ability."
Studies reported in this paper bring into question the construct validity of the TOEIC® test as a measure of communicative competence. If we are to use TOEIC® scores as the sole predictor of the ability to communicate in English, then we are likely to be misusing the TOEIC® test. If, alternatively, we use TOEIC® scores as an indicator of the ability to read and listen to English, we are far more likely to be able to establish construct validity.
Given that we have a reasonably clear idea of what skills HLC employees will need to function in an international business environment (see Section 2), it should be possible to have a clear idea of the theoretical construct that we want to measure. Weir (1988, p. 24) supports this belief with his statement that:
"It would seem self-evident that the more fully we are able to describe the theoretical construct we are attempting to measure, at the a priori stage, the more meaningful might be the statistical procedures contributing to construct validation that can subsequently be applied to the results of the test."
The construct defined in Section 2 of this paper describes the ability to speak and write English as major requirements for employees and hence, if construct validity is to be established, HLC must test these two skills.
Direct versus indirect testing
This is another extensively documented distinction addressed in testing literature, including Bachman (1986 and 1990), Hughes (1989) and Weir (1988). Hughes (1989, p. 15), who is a strong advocate of direct testing, defines direct testing as that which "requires the candidate to perform precisely the skill which we wish to measure." Conversely, Bachman (1986, p. 72) labels an indirect test as one "in which test performance is perceived as somehow different from 'actual' or 'normal' performance." Hughes (1989, p. 15) states that, "indirect testing attempts to measure the abilities which underlie (original italics) the skills in which we are interested." ETS claims that the TOEIC® is an indirect measure of communicative competence. It tests skills that are required for effective communication.
There are two main reasons for proposing that HLC adopt two relatively direct tests to supplement (not replace) the TOEIC®. Firstly, this paper has suggested that the TOEIC® in isolation is not a valid test of communicative competence. If HLC cannot accurately select which employees are best able to communicate in English, there is the possibility that the company is doing damage to itself by being represented internationally by staff not well suited to the role. Secondly, new Japanese employees are not usually proficient at communicating in English. If the company does not have a test of spoken and/or written English in place, there is likely to be little motivation for the staff to improve their productive English skills. Developing these skills before they are actually required would seem to be desirable and making direct testing a part of employees' assessment along with the TOEIC® test may well provide some beneficial washback.
Norm-referenced or criterion-referenced testing
The differences between these two forms of testing were detailed above. Given HLC's need for language tests to determine which employees are qualified to carry out clearly identifiable tasks, the case for criterion-referenced testing seems almost irrefutable. Under a norm-referenced testing system, HLC will discover how its employees' skills relate to one another, which is clearly desirable information to have. However, norm-referenced tests are far less effective at indicating precisely what the candidates are able to do. We may know that an employee scores in the top twenty percent of employees who have taken the TOEIC® test, but what does the score mean he or she is able to do for the company? This question is not answered by norm-referenced tests, but is directly addressed by criterion-referenced tests. Hughes (1989, p. 18) neatly summarises the advantages offered by criterion-referenced testing:
"Criterion-referenced tests have two positive virtues: they set standards meaningful in terms of what people can do, which do not change with different groups of candidates; and they motivate students to attain those standards."
Hughes (1989, p. 45) also explains that motivation is derived from criterion-referenced testing not only because what is being studied is of direct relevance to the skills required in the outside world. Motivation is also fostered because if candidates do "perform the tasks at the criterial level, then they will be successful on the test, regardless of how other students perform." This attribute of criterion-referenced testing is essential for HLC's needs. If only a norm-referenced test is used there may well be the assumption that those candidates at the bottom end of the scale are not capable, whereas the skills that HLC requires for certain tasks may be held even by those at the lower end of the norm-referenced bell curve. HLC should not be excluding any employees that are in the possession of skills valuable to the company.
This paper has outlined the need for some degree of change in the way HLC tests the English-language skills of its employees. However, the existing system has clear advantages in its simplicity, clarity and relatively low cost of administration. The TOEIC® also has the advantage of being an established test with the considerable authority of ETS behind it and very widespread use throughout Japanese corporations. Managers within HLC who lack any knowledge of English education and testing know that 850 points is a high score and 350 is a low one. Having a company-wide score of 650 that supposedly provides a limit indicating whether an employee is able to go overseas on business or not provides a clear and simple reference point for managers. The same managers are likely to have neither the time nor the inclination to discover whether staff really are linguistically capable of carrying out their responsibilities. The TOEIC® mentality is entrenched in the company and changing it will not be simple.
A further problem is whether the managers who have the final say in all promotion decisions will be able to adequately understand foreign language test results. Even if a perfectly reliable and valid test can be developed that exactly meets the company's needs, it will be of no benefit if the results are not easily understandable and applicable for the company's management. An in-house test is likely to have the advantage of criteria that are designed at the a priori stage to reflect the company needs and test population and hence the results should be more applicable than those of a commercially available test. The problem comes in the question of authority. Will a test without a 'name' behind it be accepted as a reliable indicator of ability?
If HLC were to choose commercially available speaking and writing tests, then finding suitable tests that met the company's needs and cost would seem to be the main obstacles. Commercially available tests are less likely to offer criteria that are directly relevant to HLC's needs. The company will also be paying for each and every administration of the test and if the test is administered company wide on a regular basis, this may well be a significant expense.
Both an in-house test and a commercially available test have one practical problem in common: the time needed for administration. If both tests are to be direct as recommended, then each candidate will need to be interviewed for the speaking test and each candidate will be required to produce a writing sample. If the tests are to be reliable, their initial results may need to be reviewed and verified by a second examiner. The interview test will therefore take a considerable amount of time, and results are likely to reach candidates later because of the checks needed to ensure reliable scoring.
These practical problems are not insurmountable, but they do offer disincentives for HLC to change the comfortable existing system. One possible solution would be to introduce the two new tests into just one division of the company initially. The two tests (writing and speaking) could be developed in-house and administered to a relatively small number of employees, not the whole company. Over a two- to three-year period the results of the two different testing systems could be compared. One system would involve only TOEIC®; the other would incorporate both TOEIC® and two direct, criterion-referenced tests of speaking and writing. Conclusions could be drawn in terms of the future progress of the subjects. Development of linguistic skills useful to the company could be the main criterion, obviously balanced against the cost of the new system. The administration of such a system is feasible; the question is whether the effort required is excessive in relation to the benefits generated for the company.
Conclusion
This paper has suggested that HLC is excessively reliant on the TOEIC® test in its approach to language testing. This over-reliance raises troubling issues for the company, not least of which may be that the current system allows employees to be sent overseas on business who lack the ability to communicate effectively in English. It is difficult to quantify the damage this may do, but it is clearly not a desirable state of affairs for a multinational corporation. It has been suggested that the TOEIC® needs to be supplemented by two direct criterion-referenced tests of speaking and writing. There are practical problems standing in the way of implementing an expanded testing system, but if companies wish to make linguistic ability a key component of their decision-making process the TOEIC® alone is not sufficient.
References
Bachman, L. (1990). Fundamental considerations in language testing. Oxford, UK: Oxford University Press.
Brown, J.D. & S. Yamashita (Eds.) (1995). Language testing in Japan. Tokyo, Japan: The Japan Association for Language Teaching.
Childs, M. (1995). Good and bad uses of TOEIC by Japanese companies. In J.D. Brown & S. Yamashita (Eds.), 66-75.
Gilfert, S. (1995). A comparison of TOEFL and TOEIC. In J.D. Brown & S. Yamashita (Eds.), 76-85.
Hirai, M. (2002). Correlations between active skill and passive skill test scores. Shiken: JALT Testing & Evaluation SIG Newsletter. 6 (3), 2-8. Retrieved from the World Wide Web at jalt.org/test/hir_1.htm on September 1, 2003.
Hughes, A. (1989). Testing for language teachers. Cambridge: Cambridge University Press.
Weir, C. (1998). Communicative language testing. Hemel Hempstead: Prentice Hall.
Wilson, K. (1989). Enhancing the interpretation of a norm-referenced second-language test through criterion referencing: A research assessment of experience in the TOEIC® testing context. TOEIC® research report number 1. Princeton, NJ: Educational Testing Service.
Wilson, K. (1993). Relating TOEIC® scores to oral proficiency interview ratings. TOEIC® Research summaries number 1. Princeton, NJ: Educational Testing Service.
Woodford, P. (1982). An introduction to TOEIC®: The initial validity study. TOEIC® Research Summaries. Princeton, NJ: Educational Testing Service.