JALT Testing & Evaluation SIG Newsletter
Vol. 13 No. 2. May 2009. (p. 9 - 14) [ISSN 1881-5537]

Insights in Language Testing:

An Interview with Jessica Wu

by Yi-Ching Pan (National Pingtung Institute of Commerce, Taiwan; The University of Melbourne, Australia)

Jessica Wu has been the head of testing development at the Language Training and Testing Center (LTTC) in Taipei since 2006. In this capacity she has been in charge of test validation, research, and development for the General English Proficiency Test (GEPT) and the other foreign language testing programs conducted by the LTTC. In 2005, she earned her Ph.D. from the University of Surrey in Roehampton (UK). Her research interests include oral assessment and the impact of large-scale language testing. This interview was conducted via email in March 2009.

What first sparked your interest in language testing?

Well, it's a long story, considering I began my career at the Language Training and Testing Center (LTTC) in Taipei more than 20 years ago. I started as an instructor of English at the LTTC immediately after completing my MA in education at Ohio State University. As an instructor, I considered teaching to be my main duty, and regarded testing as only a 'by-product' of teaching. I must admit that I didn't consider testing a serious subject until being transferred to the testing development department of the LTTC in 1987. I still remember my first task there was to compile and edit a test paper according to an existing test specification.
My first encounter with a language tester was with Anthony Wu, the LTTC director at the time. He studied language testing under Robert Lado and David Harris at Georgetown University and was knowledgeable about testing. So I learned a lot from him, including how to construct a test and interpret the test scores. Mr. Wu recommended that I read Testing English as a Second Language by David P. Harris, which I found very useful. From that point onwards, I became more and more interested in language testing. However, I was weak at statistics, so I often sought advice from colleagues in the statistics department and tried to pick up as much as I could from texts on psychological measurement.
In 1997, I had an opportunity to study language testing for six months at the University of Reading. I was very lucky to have the opportunity to meet and study with Cyril Weir and Don Porter there. Professor Weir was kind enough to share his time with me to discuss the contents of a book he was working on. I was also invited to attend lectures by Prof. Porter and others. At the end of my British sojourn, Prof. Weir asked if I was interested in undertaking Ph.D. research in language testing. As I was a full-time employee at the LTTC and a mother of two teenaged children, you can imagine it was a tough decision for me to make. Completing a Ph.D. program meant four to five years of time and effort, and I was not sure if I could cope with the demands. After long discussions, my husband suggested I accept this chance to advance academically. With his great support (financially and spiritually), I obtained my Ph.D. in 2005. My thesis was about semi-direct speaking test task difficulty.

At the LTTC, I am currently heading the testing department, where I focus on test validation, research and development for the General English Proficiency Test (GEPT) and the other foreign language testing programs conducted by the center.

How has the language testing field changed since you first got involved? Are there any trends that concern you?

In recent years, large-scale standardized EFL tests have been adopting a task-based performance assessment approach to test development. An increasing number of large-scale tests now include direct speaking and writing as compulsory components to measure test-takers' English ability. This trend has introduced a number of improvements to language testing in Taiwan. These include incorporating more communicative skills in the test content to ascertain examinees' ability to use English for communicative purposes, increasing the proportion of constructed response items, and developing new item formats to promote positive classroom washback effects.
Another trend is that many tests are now delivered via computer, and some even employ computer-adaptive tests or web-based language technology based on IRT psychometric models. Some advocates of computerized language testing suggest that it could replace all existing language tests. However, I agree with Alan Davies's observation that computerized language testing is a new genre that can coexist with older modalities but not replace them.

Recently, I've been thinking a lot about the societal implications of language testing. The need to take account of social context has been discussed by Alan Davies, Bernard Spolsky, Elana Shohamy, and Tim McNamara. I agree that it is important to consider the social context and ethicality of test use, and these are fundamental questions for language testers in the 21st century to reflect on. Such questions focus on test use rather than form.
Language testers involved in the development of high-stakes tests should recognize the fact that test results are powerful, and remain skeptical about the validity of our tests. Therefore, we should collect evidence to support the reliability and validity of any tests, and further, to justify the use of our tests. Alan Davies put it quite well when he said that testing is not the same as teaching, which means that as language testers we cannot help or encourage learners directly the way classroom teachers do, but we can collect the right evidence to help and encourage learners.

If you had the power to change any one thing about language testing in your country, what would it be?

Taiwan is an examination-oriented society like China, Hong Kong, Japan, Korea, and India, where examinations have long been used as tools to facilitate better teaching and learning. Language tests can play a powerful role in influencing teaching and learning, as the GEPT clearly shows. However, every coin has two sides. The more power a test has (the higher the stakes), the more likely the test is to be overused or misused. We've observed a number of emerging negative consequences of the GEPT, and as its developers, we are wondering: How responsible is the test developer for the uses and misuses of tests? What should the role of the test developer be once misuses are identified?

In the past, people tended to believe that it was not the testers' responsibility to worry about the test takers after a test had been handed to the users. However, I think that testers and stakeholders should share the responsibility to guard against test misuses. It is definitely necessary to have better communication among testers and stakeholders (teachers, researchers, test-takers, score users). For test developers, it is also necessary to disclose information about test-takers that is relevant to educators and the decisions they have to make. Testers and stakeholders should work collaboratively to maximize the beneficial consequences of the test and to minimize the unintended consequences of the test.
Another thing that I'd like to see change about language testing in Taiwan is the development of a code of ethics. I hope that a code of standards for the profession of language testing, similar to the ILTA Code of Ethics, can be developed in my country. A code of ethics is important because it would demonstrate to the members of our profession what the standards are. It would serve not only as a reminder of what members of the profession should expect of themselves and of one another, but also as a way to demonstrate these standards to others.

The GEPT, which nearly 14% of the Taiwanese people have taken, has been widely used for various purposes such as university admissions, academic placement, graduation criteria, hiring, and promotions. As a person who helped develop that test, what do you feel it was designed to accomplish?

You're right that the GEPT has become a household name in Taiwan in both educational and professional circles. To date, a total of 3.3 million Taiwanese EFL learners have registered for the test since its launch in 2000. As someone who has been involved in its development, I'm proud to witness its success. The GEPT started as an in-house research project at the LTTC in 1997. Aspiring to develop a public language test that could induce beneficial washback for EFL classes in Taiwan, the LTTC invited a number of well-established EFL educators from different parts of the country to form the GEPT Advisory Board and the GEPT Research Committee. Two years later, Taiwan's Ministry of Education recognized that these efforts were in accord with its promotion of lifelong learning and therefore decided to sponsor the GEPT project. Without the support of the GEPT Advisory Board, the GEPT Research Committee, and the government, the GEPT could never have come to fruition in such a short period.

The GEPT is a five-level criterion-referenced EFL testing system that was developed in response to comments by educators and by employers from various industries about the general lack of ability to communicate in English due to 'old-fashioned' approaches to English education in Taiwan, which have over-emphasized the importance of grammatical accuracy. In other words, it is hoped that the GEPT can assess not only learners' knowledge of English but also their ability to use English in real-life situations. Therefore, each level of the GEPT consists of listening, reading, writing, and speaking tasks. That was considered a rather revolutionary move in comparison with Taiwan's paper-and-pencil high school and university entrance exams, which do not assess listening and speaking skills. Before the GEPT was available, Taiwan's EFL educators thought it would be impossible to administer listening and speaking tests on a large scale. However, the GEPT has proved those concerns to be unfounded. Now, not only has language assessment become a topic of wide discussion in Taiwan, but the GEPT has also brought about positive washback effects. The most significant effect is that the productive skills of writing and speaking are receiving more attention from teachers and learners, as reported in an impact study (Wu & Chin, 2006) and by many students and teachers of English in high schools and universities (Wu, 2008). It's worth noting that the GEPT has successfully promoted a shift in English teaching and learning to a more communicative orientation. Such an influence can be attributed to successful interactions between the GEPT and teachers. More broadly, we can see a valuable reciprocal relationship between teaching and testing, which is exactly what the GEPT project has aimed to accomplish.

What research projects are you working on now - and which do you hope to become involved in?

I'm currently working on the GEPT Revision Project and may continue to work on that project for some years. The primary concern of any language test revision process should be to ensure that the test reflects as closely as possible real-life language use contexts and results in favorable learning outcomes. Although the GEPT has had some positive effects on the teaching and learning of English, there's always room to improve its quality. Based on the productive dialogue between the GEPT and teaching professionals, directions for revising that test have been identified. Let me cite two examples. First, based on score data and the opinions of local teachers, it was proposed that 'mini-talk' tasks be added to the Elementary Level Listening Test. Second, it was proposed that longer reading passages with a greater variety of genre types be employed in the High-Intermediate Reading Test.
Guided by the present LTTC executive director, Prof. Kao Tien-en, and academic advisor, Prof. Lin Yaofu, our institute has established a comprehensive research agenda focusing on areas such as validation, reliability, bias reduction, access and accommodations, administration and security, and social consequences (as suggested by Kunnan, 2000, 2004, 2005, 2008). These research aims can help defend the claims about all the LTTC tests with sufficient evidence and convincing argumentation (Bachman, 2005; Bachman & Palmer, forthcoming). I look forward to taking part in some of these research projects.


Bachman, L. F. (2005). Building and supporting a case for test use. Language Assessment Quarterly, 2 (1), 1-34.

Bachman, L. F. & Palmer, A. S. (forthcoming). Language Assessment in the Real World: Developing Language Assessments and Justifying their Use. Oxford: Oxford University Press.

Davies, A. (1997). Demands of being professional in language testing. Language Testing, 14 (2), 328-339.

Harris, D. P. (1969). Testing English as a second language. New York: McGraw-Hill.

International Language Testing Association. (2000). ILTA Code of Ethics. Retrieved March 15, 2009, from http://iltaonline.com/index.php?option=com_content&task=view&id=57&Itemid=47

Kunnan, A. J. (2000). Fairness and justice for all. In A. J. Kunnan (Ed.), Fairness and validation in language assessment: Selected papers from the 19th Language Testing Research Colloquium, Orlando, Florida (pp. 1-14). Cambridge, UK: Cambridge University Press.

Kunnan, A. J. (2004). Test fairness. In M. Milanovic & C. Weir (Eds.), European Year of Languages Conference Papers, Barcelona (pp. 27-48). Cambridge, UK: Cambridge.

Kunnan, A. J. (2005). 40 years in applied linguistics: an interview with Alan Davies. Language Assessment Quarterly, 2 (1), 35-50.

Kunnan, A. J. (2008). Towards a model of test evaluation: Using the test fairness and wider context frameworks. In L. Taylor & C. Weir (Eds.), Multilingualism and Assessment: Achieving transparency, assuring quality, sustaining diversity. Papers from the ALTE Conference, Berlin, Germany (pp. 229-251). Cambridge, UK: Cambridge University Press.

Lado, R. (1957). Linguistics across cultures: Applied linguistics for language teachers. Ann Arbor: University of Michigan Press.

Lado, R. (1961). Language testing: the construction and use of foreign language tests: a teacher's book. London: Longman.

Lado, R. (1964). Language teaching, a scientific approach. New York: McGraw-Hill.

McNamara, T. & Roever, C. (2006). Language testing: The social dimension. Oxford: Blackwell.

Shaw, S. D. & Weir, C. J. (2007). Examining writing: research and practice in assessing second language writing. New York: Cambridge University Press.

Shohamy, E. (2001). The power of tests: a critical perspective on the uses of language tests. New York: Longman.

Shohamy, E. (2006). Language policy: hidden agendas and new approaches. London; New York: Routledge.

Spolsky, B. (1995). Measured words: the development of objective language testing. Oxford: Oxford University Press.

Spolsky, B. (2004). Language policy. New York: Cambridge University Press.

Weir, C. J. (1988). Communicative language testing with special reference to English as a foreign language. Exeter: University of Exeter.

Weir, C. J. (1990). Communicative language testing. New York: Prentice Hall.

Weir, C. J. (1993). Understanding and developing language tests. New York: Prentice Hall.

Wu, J. (2008). Views of Taiwanese students and teachers on English language testing. Research Notes, 34, 6-9.

Wu, R. & Chin, J. (2006). An impact study of the Intermediate level GEPT. Proceedings of the Ninth International Conference on English Language Testing in Asia (pp. 41-65). Taipei, Taiwan: College Entrance Examination Center.

HTML: http://jalt.org/test/wu_pan.htm   /   PDF: http://jalt.org/test/PDF/Wu-Pan.pdf
