Divergence and Convergence, Educating with Integrity: Proceedings of the 7th Annual JALT Pan-SIG Conference.
May 10-11, 2008. Kyoto, Japan: Doshisha University Shinmachi Campus. (pp. 36-46)
Correlation between STEP BULATS writing and TOEIC® scores
by Michihiro Hirai (Kanagawa University)
The author analyzed the results of STEP BULATS Writing Tests administered to a group of 559 Japanese (predominantly businesspeople) from September 2004 to December 2007 and found the correlation coefficient between their scores and the TOEIC scores to be .69 for the entire score range. The correlation coefficient for the upper end of the range (TOEIC ≧ 800) was .46, and it is noteworthy that 50.4% of the 349 test-takers in this range failed to exhibit the business English writing skills expected of competent international businesspeople as measured by the STEP BULATS. The author attributes this relatively low performance in the STEP BULATS Writing Test primarily to lack of exposure to business practice and vocabulary.
Keywords: test score correlation, TOEIC, BULATS, business writing test, business writing skills
In recent years, the growing need to improve the English skills of corporate employees amid the accelerating globalization of the economy has boosted public interest in English tests, especially those which claim to measure “business English” skills. Generally, in assessing the real value of any given test, one should ask whether the test addresses the real needs of the organization that employs it. In practice, however, two of the most important requirements (Hirai, 2002b) for English tests used by an organization, namely alignment with its business objectives and the validity of the test itself (construct validity and content validity), are often overlooked by management, especially by the personnel department that administers them. A case in point is the widespread use of certain tests that measure receptive (reading and listening) skills only, most notably the TOEIC® test. While it is open to question whether English tests focusing on receptive skills can adequately measure the test-takers’ productive (speaking and writing) skills, and whether the currently most popular tests cover most of the language skills required in practical situations (Hirai, 2002a; Chapman, 2005), they are still used by the vast majority of companies in certain parts of the world, particularly in Japan.
This gap in perception seems to result partly from the dearth of reports highlighting the inappropriateness of using receptive skill tests as the sole measure of employees’ language performance in actual business situations, where language production and familiarity with business vocabulary play more significant roles. In the first place, the amount of statistical data showing the degree of correlation between receptive skill test scores and productive skill test scores has been rather limited.
Among the most frequently cited data in this field is a series of reports from Educational Testing Service (ETS), the organization which developed the Test of English for International Communication (TOEIC), and/or The Chauncey Group International, which administers that test. Woodford’s study (1982, pp. 14-15) cites a correlation coefficient of .83 between the direct Language Proficiency Interview (LPI) score and the TOEIC listening part score and a correlation coefficient of .83 between a “direct writing measure” and the TOEIC reading score, suggesting a “high degree of correlation” between productive skill scores and receptive skill scores. The TOEIC Technical Manual (1998, pp. 1-2) reports a correlation coefficient of .74 between a direct speaking measure and the TOEIC score (both for the total test and listening part).
Meanwhile, to provide more independent data on this topic, Hirai (2002a) conducted a statistical study on the correlation between productive skills and TOEIC scores. Based on data collected at an internal corporate language institute, he reported a correlation coefficient of .78 between the institute’s interview test and TOEIC scores, and a correlation coefficient of .66 between the BULATS Writing Test and TOEIC scores. Whereas the sample for the former was reasonably large (N = 475), that for the latter was relatively small (N = 102). The BULATS Writing Test, which is part of the Business Language Testing Service (BULATS) Test Suite developed by the University of Cambridge English for Speakers of Other Languages (University of Cambridge ESOL Examinations), is designed to test business English writing skills.
This report adds to the body of test correlation research by investigating how well business writing skills as measured by the STEP BULATS Test Suite run parallel with TOEIC Test performance. It should be noted that the BULATS Test Suite is now administered in Japan by The Society for Testing English Proficiency (STEP). A brief description of the STEP BULATS Test Suite appears in the section titled STEP BULATS Test Suite.
In this study the author also attempts to shed light on plausible reasons for the relatively weak and significantly varying performance in business writing, especially among advanced learners of English in Japan.
The author analyzed the score data of 644 individuals (original sample) collected by STEP who had taken both the TOEIC and the STEP BULATS Writing Test between September 2004 and December 2007.
In compiling meaningful score data, care was taken to eliminate items which can be considered practically duplicate, namely the scores of the same individuals (i.e., repeaters) recorded more than once within the timeframe in question, in which case the best STEP BULATS Writing level (score) was taken and the rest were discarded. Also, out of consideration for reliability, the data of individuals who took the STEP BULATS Writing Test and the TOEIC Test more than 24 months apart were eliminated. As a result, the score data of 559 test-takers out of the entire sample were finally selected for this study.
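The selection rules described above can be sketched in code. This is a minimal illustration only, not the author's actual procedure: the record fields (`person_id`, `toeic_date`, `writing_date`, `level`) and the function names are hypothetical, the order of the two steps (gap filter first, then best level per person) is an assumption, and "more than 24 months apart" is interpreted here as more than two calendar years.

```python
from datetime import date

def more_than_24_months_apart(a: date, b: date) -> bool:
    """True if the two test dates are more than two calendar years apart."""
    early, late = sorted((a, b))
    # Tuple comparison avoids building an invalid date such as Feb 29 + 2 years.
    return (late.year, late.month, late.day) > (early.year + 2, early.month, early.day)

def select_records(records):
    """Drop pairs taken too far apart, then keep each person's best Writing level.

    Each record is a dict with the hypothetical keys
    'person_id', 'toeic_date', 'writing_date', and a numeric 'level'.
    """
    best = {}
    for rec in records:
        if more_than_24_months_apart(rec["toeic_date"], rec["writing_date"]):
            continue  # excluded for reliability reasons
        current = best.get(rec["person_id"])
        if current is None or rec["level"] > current["level"]:
            best[rec["person_id"]] = rec  # repeater: keep the best level only
    return list(best.values())
```

Applied to the original sample of 644 records, a procedure of this kind would yield the reduced sample used in the analysis, provided the same interpretation of the rules is used.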
The results of the STEP BULATS (as well as the BULATS) Writing Test are reported in discrete levels ranging from 0 to 5 (more specifically, from levels 0, 0+, 1–, 1=, 1+ to levels 5–, 5=, and 5+) instead of scores. For the sake of statistical handling, these levels are converted to numbers by assigning -0.3 to "–" and +0.3 to "+": for example, Level 1+ is represented as 1.3 and Level 5– as 4.7.
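The conversion rule can be written as a short function. The sketch below is illustrative; the function name `level_to_number` is hypothetical, and treating "=" and a bare digit (such as Level 0) as carrying no offset is an assumption that follows from the two examples given.

```python
def level_to_number(level: str) -> float:
    """Convert a Writing level such as '1+', '5–', '3=' or '0' to a number."""
    # Accept the en dash and the minus sign as well as the ASCII hyphen.
    text = level.strip().replace("–", "-").replace("−", "-")
    offsets = {"+": 0.3, "-": -0.3, "=": 0.0}
    if text[-1] in offsets:
        return round(int(text[:-1]) + offsets[text[-1]], 1)
    return float(int(text))  # bare level with no suffix, e.g. '0'

print(level_to_number("1+"))  # 1.3
print(level_to_number("5–"))  # 4.7
```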
Note that the STEP BULATS and the BULATS Test Suites are identical in content and format and thus the scores of the two tests can be treated on a comparable basis. Note further that the TOEIC has been modified somewhat “by aligning questions with everyday language scenarios that happen in today’s workplace” (May 2006 for the open test; April 2007 for the Institutional Program). The scores of the old and new TOEIC, however, can also be treated equally, since the continuity of the test scores has been maintained, according to ETS’s claim.
STEP BULATS Test Suite
To clarify this study, it would be worthwhile to provide some basic information on the BULATS. The BULATS Test Suite is designed to test the foreign language proficiency of individuals in English, German, French, or Spanish. The STEP BULATS Test Suite offered in Japan is essentially identical in content to the English version of the BULATS. Each Test Suite consists of four tests: Standard, Computer-based, Speaking, and Writing. The first two measure receptive skills, whereas the other two measure productive skills. Each of these tests covers areas such as descriptions of jobs, companies and products, travel, management and marketing, customer service, planning, reports, phone messages, business correspondence, and presentations (BULATS, 2008).
Furthermore, the actual tasks given in each Test Suite are designed to represent a variety of practical tasks encountered in workplaces such as understanding a business-related article, taking a phone message, and writing a letter or a report. As shown in the extract of a sample test in Appendix A, the texts are reasonably long (e.g., 530 words) and complex, interspersed with quite a few advanced vocabulary items and business phrases such as “economies of scale” and “clashes of corporate cultures.” Thus, the Test Suite adequately covers the general business domain, yet without going into excessively technical or professional terms.
The Writing Test consists of two tasks, each of which requires the test-taker to compose from scratch a short e-mail message, letter, or memo. As shown in the sample task from the 45-minute Writing Test in Appendix B, the test-taker is expected to exhibit his/her ability to write a well-organized, persuasive report or letter, typical of tasks that may often come up in actual business situations.
In sum, the STEP BULATS Test Suite is specifically designed to measure the test-taker's English skills in work situations and environments, a significant departure from general English tests.
The BULATS Writing levels were found to correlate only moderately with TOEIC Total scores, as shown in Figure 1. The correlation coefficient was found to be .69 (between .64 and .73 at a confidence level of 95%), which was very close to the .66 that was reported in the author’s previous study. In any case, it was considerably lower than the correlation coefficient of .83 reported by Woodford (1982, p. 15) and The Chauncey Group International (1998, pp. 1-2).
When the entire sample was divided into two groups according to TOEIC scores, the correlation coefficient tended to decrease for each group. For example, when the entire sample was divided at a TOEIC cut-off point of 800, the correlation coefficient decreased somewhat to .62 (between .53 and .70 at a confidence level of 95%) for the group with TOEIC scores below 800 and it decreased significantly to .46 (between .37 and .54 at a confidence level of 95%) for the group with TOEIC scores at or above 800, as shown in Figures 2 and 3. Furthermore, when the cut-off point was raised to a TOEIC score of 900, the correlation coefficient dropped to as low as .27 for the top group (with TOEIC scores more than or equal to 900) as shown in Figure 4.
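The paper does not state how the 95% confidence intervals were obtained. A common choice is the Fisher z-transformation, and the sketch below assumes that method; with the reported coefficients and sample sizes it gives intervals that agree with the reported ones to two decimal places. The function name `fisher_ci` is hypothetical, and the size of the lower group (210) is derived here as 559 minus the 349 test-takers reported for the upper group.

```python
import math

def fisher_ci(r: float, n: int, z_crit: float = 1.959964):
    """Approximate 95% confidence interval for a correlation coefficient.

    Uses the Fisher z-transformation: z = atanh(r), SE = 1 / sqrt(n - 3).
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

groups = [
    ("entire sample", 0.69, 559),
    ("TOEIC below 800", 0.62, 210),   # 559 - 349, derived (assumption)
    ("TOEIC 800 and above", 0.46, 349),
]
for label, r, n in groups:
    low, high = fisher_ci(r, n)
    print(f"{label}: r = {r:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The interval widens as the group becomes smaller, which is one reason the subgroup coefficients should be compared with some caution.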
Figures 5 and 6 show the distributions of STEP BULATS Writing levels at TOEIC scores of 800 and 900, respectively. As can be seen, the STEP BULATS Writing levels varied significantly even at these TOEIC scores, which are considered very high in terms of general English.
Analysis and Discussion
Analysis of Correlation Coefficients
Based on these data, a few observations are in order. To begin with, the overall correlation coefficient of .69 should not be considered high enough to justify the use of TOEIC Test scores as a meaningful indicator of business writing skills. Furthermore, among individuals with high TOEIC scores (800 or above, and even more so 900 or above), the correlation tended to become progressively weaker. Even for such groups, the STEP BULATS Writing levels spanned broad ranges: from 1+ to 5– for the group with TOEIC scores of 800 and above, and from 2= to 5– for the group with TOEIC scores of 900 and above. At TOEIC scores of 800 and 900, the test-takers likewise exhibited widely ranging STEP BULATS Writing levels (Figures 5 and 6). These wide variances partially explain the low correlation coefficients and suggest, albeit within narrow score ranges, that the TOEIC score is practically meaningless as an indicator of business writing skills at advanced levels.
Compared with the preliminary correlation coefficient (.86) between the STEP BULATS Standard Test scores and the TOEIC Test scores, the correlation coefficient (.69) between the STEP BULATS Writing Test levels and the TOEIC Test scores was also notably low. As for possible reasons, two factors can be considered. First, writing in general is a composite skill involving a variety of subskills and kinds of language knowledge, such as logical thinking, rhetoric, organization, and active vocabulary. All of these demand more experience and ad-hoc training than receptive skills; hence, greater variance in achievement should be expected. Second, business writing requires familiarity with general business practice and business vocabulary in addition to the subskills and language knowledge required for general writing. This, again, can be acquired mainly through experience and/or ad-hoc training, and should be viewed as another cause of greater variance in performance.
These two factors, which those who teach business English intuitively regard as important, are also well addressed by the STEP BULATS Test Suite, particularly the Writing Test, and they are vital to getting a high score. It is clear from the sample task in Appendix B that the test-taker needs first to give thought to what kinds of functions are involved in business as well as other factors relevant to the education of staff, then to organize his/her ideas in a persuasive fashion, and finally to put them in writing in a coherent manner following an appropriate rhetorical pattern. In short, the STEP BULATS Writing Test demands both general writing skills and familiarity with business practice and vocabulary in good balance, which can be obtained only through experience and/or appropriate training.
The significant difference in correlation coefficient from Woodford’s 1982 report can be explained by referring to the observation presented in the author’s previous study (Hirai, 2002a). In sum, the tasks in the BULATS Writing Test are much more elaborate and more representative of real-life business situations, as illustrated in Appendix B; thus, they reveal more conspicuously the inherently significant variance in performance. Since the content and format of the BULATS and the STEP BULATS tests are identical, this explanation applies to the present study, too.
Finally, the fact that the overall correlation coefficient obtained in the present study (.69) was very close to that reported in the author’s previous study (.66) indirectly endorses the reliability of the STEP BULATS Writing Test.
Possible Effect of Sample Division
While the above observations appear to account in general for the relatively low correlation coefficients between STEP BULATS Writing levels and TOEIC Test scores, particularly among advanced learners of English, the question remains whether the artificial division of the entire sample according to TOEIC Test score may in itself tend to decrease the correlation coefficient. The fundamental concern is that, in analyzing the correlation between two variables that characterize the members of a given sample, dividing the sample according to the value of one of these variables may distort the distribution of that variable to such a degree that it undermines the validity of the formula used to produce the correlation coefficient. Note that, in general, dividing a sample according to a parameter other than the two variables under study will not in itself affect the validity.
To illustrate the point, Figures 7 through 9 superimpose the distributions of the two variables, namely TOEIC score and STEP BULATS Writing level, on Figures 1 through 3, respectively. While the STEP BULATS Writing levels show a pyramid-shaped distribution in all three cases, the TOEIC scores show a very skewed distribution in Figures 8 and 9, with the peak occurring close to the division point (TOEIC = 800).
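In general, the correlation coefficient ρ between two variables x and y is calculated by the following formula:
\[ \rho = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \mu_x}{\sigma_x} \right) \left( \frac{y_i - \mu_y}{\sigma_y} \right) \qquad (1) \]
where μ is the mean and σ is the standard deviation of each variable, and N is the number of (x, y) pairs (size of the sample).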
If one divides the given sample according to one of the two variables, say x, into two groups at a value much higher than its mean, then the resulting upper group will have a very skewed distribution of the x variable, as shown in Figure 9, with a mean higher than but close to the division point, since the distribution of the x variable is far from symmetrical. Thus, many of the terms (x_i − μ_x)/σ_x on the left side of the mean (i.e., for x_i ≤ μ_x) tend to be smaller than in more balanced (e.g., normal) distributions because there are no members below the division point, while the distribution of the y variable may not be all that different. Therefore, the correlation coefficient calculated by Equation 1 tends to be smaller than that for the entire group, even if the upper group maintains the same degree of correlation.
The above observation may, at least partially and intuitively, account for the tendency of the correlation coefficient to decrease when the original sample is divided into two parts according to the value of one of the variables (in this case, x), especially for the group corresponding to the narrower range of the value (in this case, x ≧ 800). The author has been unsuccessful so far in finding a statistical theory which might explain this phenomenon and which might provide a formula for calculating the expected amount of decrease in correlation coefficient.
This finding suggests that the mere act of dividing a sample according to the value of one of its variables may affect the correlation coefficient to a certain degree and that, with a sample thus divided, one needs to exercise caution in interpreting relatively low values of correlation coefficient. Therefore, one cannot conclusively state at the present time, solely from the data thus far obtained and analyzed, that the relatively low correlation coefficients between the STEP BULATS Writing Test levels and the TOEIC scores among higher-level test-takers are attributable to any particular situation or condition.
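The effect of dividing a sample on one of its variables can be checked with a small simulation. The sketch below is purely illustrative and uses synthetic data, not the STEP data: it assumes two standard normal variables with a true correlation of .69, divides the sample at an arbitrary point (x ≥ 0), and computes the coefficient for the whole sample and for the upper group using the standardized-product form of the correlation formula. The function names and all parameter values are hypothetical. In the psychometric literature this kind of decrease is commonly discussed under the name "restriction of range."

```python
import math
import random

def pearson(xs, ys):
    """Mean product of standardized deviations (population SD)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / n

def simulate(rho=0.69, n=20000, cut=0.0, seed=0):
    """Return (r for the whole sample, r for the group with x >= cut)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        # y shares the fraction rho of x; the remainder is independent noise.
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    upper = [(x, y) for x, y in zip(xs, ys) if x >= cut]
    ux, uy = zip(*upper)
    return pearson(xs, ys), pearson(ux, uy)

r_full, r_upper = simulate()
print(f"whole sample: {r_full:.2f}, upper group only: {r_upper:.2f}")
```

In such a simulation every pair is generated by the same rule, yet the upper group shows a visibly lower coefficient than the whole sample, which supports the caution expressed above about interpreting low coefficients in divided samples.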
Level of Performance in the STEP BULATS Writing Test
Figure 10 is based on Figure 1, with a regression line and other lines added that indicate the levels desired for international businesspeople. The regression line represents the center line around which the population of the sample is balanced, and thus can be considered, for a given value of one variable, to represent the average expected value of the other variable. It should be noted that the regression line had a slope that was significantly gentler than the diagonal line connecting the lowest possible values to the highest possible values of the two variables (the STEP BULATS Writing Test level and the TOEIC score), which can be regarded as a balanced performance line. It is clear from Figure 10 that the average writing levels were significantly lower than those expected from the balanced performance line, most notably among test-takers with high TOEIC scores.
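The comparison between the regression line and the balanced performance line can be expressed numerically. The sketch below is a hedged illustration rather than the author's computation: it fits an ordinary least-squares line of Writing level on TOEIC score and defines the diagonal from the lowest to the highest possible values. The end points are assumptions not stated in the text (a TOEIC scale of 10 to 990, and Writing levels from 0 to 5.3, i.e., Level 5+ under the ±0.3 coding), and the function names are hypothetical.

```python
TOEIC_MIN, TOEIC_MAX = 10, 990    # assumption: standard TOEIC score scale
LEVEL_MIN, LEVEL_MAX = 0.0, 5.3   # assumption: Level 0 to Level 5+

def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def balanced_slope():
    """Slope of the diagonal joining the lowest and highest possible values."""
    return (LEVEL_MAX - LEVEL_MIN) / (TOEIC_MAX - TOEIC_MIN)

def balanced_level(toeic):
    """Writing level on the balanced performance line for a given TOEIC score."""
    return LEVEL_MIN + (toeic - TOEIC_MIN) * balanced_slope()
```

Fitting `fit_line` to the (TOEIC score, Writing level) pairs and comparing its slope with `balanced_slope()` would quantify how much gentler the regression line is than the diagonal.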
Koike et al. (2008) point out that the CEFR level (Council of Europe, 2001) most often cited in a recent poll of 7,354 Japanese people as the minimum level desired for competent international businesspeople is B2, which is equivalent to Level 3 in the STEP BULATS Test (BULATS, 2007). This level typically enables the test-taker to “write more complex messages and non-routine factual letters, if work is checked” (STEP, 2004). From Figure 10, the average TOEIC score corresponding to Level 3 was found to be about 890. Similarly, Figures 3 and 9 suggest that the majority (74.5%) of test-takers with TOEIC scores of 800 and above were rated Level 3 or below (50.4% were below Level 3).
As discussed earlier in this paper, this low level of performance in the STEP BULATS Writing Test, especially among advanced learners, can also be attributed to the two factors, namely the complexity and sophistication of writing in general and a lack of exposure to business practice and vocabulary.
STEP BULATS Writing levels were found to correlate moderately with TOEIC scores, with a correlation coefficient of .69 for the entire sample (N = 559), well in line with the results (.66) of the author’s previous study conducted in 2002. When the sample was divided into two groups according to TOEIC score, the correlation coefficient decreased significantly to .46 for the upper-level group with TOEIC scores of 800 and above. It was also revealed that the STEP BULATS Writing levels observed were appreciably lower than those expected of competent international businesspeople, most notably in the upper-level group.
The author considers the overall correlation coefficient of .69 not high enough to justify the use of the TOEIC Test score as a meaningful indicator of business writing skills. The author further attributes the test-takers’ relatively low performance in the STEP BULATS Writing Test, especially toward the higher end of the spectrum, to their general lack of exposure to business practice and vocabulary, which plays a significant role in that test.
As for the relatively low correlation coefficients between their STEP BULATS Writing levels and TOEIC scores observed in the upper-level groups, the author finds it premature to attribute this also to lack of exposure to business practice and vocabulary, since the mere act of dividing the sample according to the value of one of its variables is also found to be a potential cause of decrease in the correlation coefficient.
The author suggests that, in assessing business writing skills, a test designed specifically for the business community with practical business situations in mind, such as the STEP BULATS Writing Test, be employed instead of general-purpose tests focusing on receptive skills, such as the TOEIC.
The author wishes to thank the staff of the STEP BULATS team at STEP, particularly Mr. Koshizuka, for their valuable assistance and generosity in sharing the test data.
Business Language Testing Service (BULATS), University of Cambridge ESOL Examinations. (2008). The BULATS Test. Retrieved July 3, 2008 from http://www.bulats.org/tests/index.php
Business Language Testing Service (BULATS), University of Cambridge ESOL Examinations. (2007). BULATS Candidate Handbook. Retrieved July 3, 2008 from http://www.bulats.org/handbook/index.php
Chapman, M. (2005). A case study of the need for change in the language testing policies of a Japanese corporation. JLTA Journal, 8, 51-67.
Chauncey Group International, The Ltd. (1998). TOEIC technical manual. Princeton, NJ: The Chauncey Group International, Ltd.
Council of Europe. (2001). Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge: Cambridge University Press.
Hirai, M. (2002a). Correlations between active skill and passive skill test scores. Shiken: JALT Testing & Evaluation SIG Newsletter, 6 (3), 2-8. Retrieved June 12, 2008 from http://www.jalt.org/test/hir_1.htm
Hirai, M. (2002b). Bijinesupaason no tame-no eigo chou-kouritsu benkyouhou [Maximally Efficient Approaches to Studying English for Businesspeople]. Tokyo: Nihon Jitsugyo Shuppan.
Koike, I., et al. (2008). Kigyou ga motomeru eigo-ryoku chousa houkokusho: Daini gengo shuutoku kenkyuu wo kiban to suru shou, chuu, kou, dai no renkei wo hakaru eigo kyouiku no sendou-teki kiso kenkyuu (Kenkyuu kadai bangou 16202010: Heisei 16 nendo - Heisei 19 nendo kagaku kenkyuu-hi hojokin, Kiso kenkyuu (A)). Faculty of Languages and Cultures and Graduate School of Applied Linguistics, Meikai University.
Society for Testing English Proficiency, The (STEP). (2004). Sekai kijun no bijinesu eigo nouryoku tesuto STEP BULATS [Global-standard business English test STEP BULATS]. Tokyo: STEP.
University of Cambridge ESOL Examinations. (n.d.) BULATS Sample Paper EN60. Retrieved June 12, 2008, from http://www.bulats.org/sample_papers/writing_en.pdf
Woodford, P. E. (1982). An introduction to TOEIC: The initial validity study. Princeton, NJ: Educational Testing Service.