How are PCA and EFA used in language research?
James Dean Brown
University of Hawai'i at Manoa
Factor analysis techniques, including principal components analysis and Varimax rotation, were used to investigate the degree to which variables were orthogonal (independent of each other). … A large number of linguistic variables were also examined for relationship to EFL Difficulty. Four of these variables were selected on the basis of factor analysis as being orthogonal: syllables per sentence, average frequency elsewhere in the passage of the words that had been deleted, the percent of long words of seven letters or more, and the percent of function words. When combined, they proved to be the best predictors of observed EFL Difficulty.

The number of variables in a study can be reduced in several ways: (a) by going factor by factor, using the variable that loads highest on the first factor to represent all the other variables that load heavily on that factor, then doing the same for the second factor, the third factor, and so on; or (b) by saving the component or factor scores (which are produced during the PCA or EFA analyses) and using them as variables to represent the components or factors in the study. Clearly then, one use of factor analysis is to reduce the number of variables in a study and thereby increase the power of the study.
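The two reduction strategies, (a) and (b), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses NumPy and an unrotated PCA of the correlation matrix (in practice a rotation such as Varimax would usually come first), the data are simulated rather than taken from any study discussed here, and the function names `pca_loadings_and_scores` and `pick_marker_variables` are hypothetical.

```python
import numpy as np

def pca_loadings_and_scores(data, n_components):
    """Unrotated PCA of the correlation matrix; returns (loadings, scores)."""
    # Standardize each variable (column) to mean 0 and SD 1.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # keep the largest ones
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)  # correlations of variables with components
    scores = z @ eigvecs                   # one column of scores per component
    return loadings, scores

def pick_marker_variables(loadings):
    """Strategy (a): per component, the unused variable with the largest |loading|."""
    chosen = []
    for j in range(loadings.shape[1]):
        ranked = np.argsort(-np.abs(loadings[:, j]))
        chosen.append(int(next(i for i in ranked if i not in chosen)))
    return chosen

# Simulated data (an assumption for illustration): 200 cases, six variables,
# four driven by one latent trait and two driven by another.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
weights = np.array([[1, 1, 1, 1, 0, 0],
                    [0, 0, 0, 0, 1, 1]], dtype=float)
data = latent @ weights + 0.5 * rng.normal(size=(200, 6))

loadings, scores = pca_loadings_and_scores(data, n_components=2)
markers = pick_marker_variables(loadings)

reduced_a = data[:, markers]  # strategy (a): one representative variable per component
reduced_b = scores            # strategy (b): saved component scores used as variables
```

Either `reduced_a` or `reduced_b` replaces the six original variables with two. Strategy (a) keeps variables that are directly interpretable, while strategy (b) yields component scores that are uncorrelated with each other by construction.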
|* p < .01|
Notice that the patterns of correlations in Table 1 are not very interesting. Sure, 12 of the correlation coefficients in Table 1 are significant at p < .01. But even with all these significant correlation coefficients, each coefficient only tells us about the degree of relationship between whatever two variables are involved. No amount of staring at Table 1 leads to any interesting pattern of overall relationships (except, perhaps, that the MDCT didn't correlate well with any other measure). In addition, there is no way of knowing from Table 1 how much differences in the sample sizes, distributions of scores, and test reliabilities of the variables may have affected the relative values of these correlation coefficients, or the degree to which the number of correlation coefficients has distorted the meaning of the p values.
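The last point, that the sheer number of coefficients can distort the meaning of the p values, can be made concrete with a little arithmetic. The sketch below is an illustration only: the number of variables is a hypothetical value rather than one read from Table 1, and the familywise calculation treats the tests as independent, which correlations drawn from the same matrix are not, so the result is a rough guide at best.

```python
from math import comb

def n_pairwise_correlations(n_variables):
    """Number of distinct correlations among n_variables measures."""
    return comb(n_variables, 2)

def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha that keeps the familywise rate at or below alpha."""
    return alpha / n_tests

n_vars = 6  # hypothetical number of measures, not taken from Table 1
n_tests = n_pairwise_correlations(n_vars)
fwer = familywise_error_rate(0.01, n_tests)
adjusted = bonferroni_alpha(0.01, n_tests)
print(n_tests, round(fwer, 3), round(adjusted, 5))
```

With more variables the number of coefficients grows quickly, so a nominal p < .01 for each coefficient corresponds to a noticeably larger chance of at least one spurious "significant" result across the whole matrix.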
| VARIABLE | FACTOR 1 | FACTOR 2 | h² |
|---|---|---|---|
| Proportion of Variance | .48 | .27 | .75 |
Previous research had consistently shown that these twelve traits fall into two general categories labeled neuroticism and extraversion (the first six traits representing extraversion and the last six representing neuroticism). The results shown in Table 3 are for Brazilian university students taking the Y/GPI. With the exception of Thinking extraversion, the loadings in bold-faced italics fall in exactly the pattern of relationships that theory would predict.
| Variables | Rotated Factor 1 | Rotated Factor 2 | h² |
|---|---|---|---|
| Lack of agreeableness | 0.139 | 0.527 | 0.297 |
| Lack of cooperativeness | 0.468 | 0.013 | 0.219 |
| Lack of objectivity | 0.607 | 0.018 | 0.368 |
| Proportion of Variance | 0.255 | 0.183 | 0.437 |
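The h2 column is the communality of each variable: with orthogonal factors it is the sum of the variable's squared loadings across the factors. A minimal check in Python, assuming NumPy and using only the three variable rows shown above, reproduces the reported values to within the rounding of the loadings:

```python
import numpy as np

# Loadings and reported communalities copied from the three rows shown above.
names = ["Lack of agreeableness", "Lack of cooperativeness", "Lack of objectivity"]
loadings = np.array([
    [0.139, 0.527],
    [0.468, 0.013],
    [0.607, 0.018],
])
reported_h2 = np.array([0.297, 0.219, 0.368])

# Communality = sum of squared loadings across the (orthogonal) factors.
communalities = (loadings ** 2).sum(axis=1)

for name, computed, reported in zip(names, communalities, reported_h2):
    print(f"{name}: computed {computed:.3f}, reported {reported:.3f}")
```

Any small discrepancy in the third decimal place is what one would expect when the loadings themselves have been rounded to three places. The same logic applies to the bottom row: the total proportion of variance is the sum of the proportions for the two factors.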
Brown, J. D. (2009c). Statistics Corner. Questions and answers about language testing statistics: Choosing the right type of rotation in PCA and EFA. Shiken: JALT Testing & Evaluation SIG Newsletter, 13 (3), 20-25. Also retrieved from the World Wide Web at http://jalt.org/test/bro_31.htm