When determining the number of components to retain in a principal components analysis or factor analysis, a common rule is the Kaiser eigenvalue-greater-than-one rule, which is motivated by the fact that for uncorrelated random variables every eigenvalue of the population correlation matrix equals one. However, researchers must make decisions based on a sample correlation matrix, and the expected value of the largest eigenvalue of a correlation matrix formed from sample data will always be greater than one, even when the variables are uncorrelated in the population.
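A small simulation can illustrate the point about sample correlation matrices. This is only a sketch under stated assumptions: the function name, the sample size of 100, the 10 variables, and the use of standard normal data are illustrative choices, not taken from the papers listed on this page.

```python
import numpy as np

def mean_largest_eigenvalue(n_obs, n_vars, n_reps=200, seed=0):
    """Average largest eigenvalue of sample correlation matrices
    computed from uncorrelated standard normal data (illustrative helper)."""
    rng = np.random.default_rng(seed)
    largest = np.empty(n_reps)
    for r in range(n_reps):
        # Columns are independent, so the population correlation matrix
        # is the identity and every population eigenvalue is one.
        data = rng.standard_normal((n_obs, n_vars))
        corr = np.corrcoef(data, rowvar=False)
        # eigvalsh returns eigenvalues in ascending order; take the last.
        largest[r] = np.linalg.eigvalsh(corr)[-1]
    return largest.mean()

# Hypothetical setting: 100 observations on 10 uncorrelated variables.
mean_largest = mean_largest_eigenvalue(n_obs=100, n_vars=10)
print(f"Mean largest eigenvalue: {mean_largest:.3f}")
```

Because the eigenvalues of a correlation matrix sum to the number of variables, the largest one can never fall below one, and sampling error pushes it above one in practice. Applied to such data, the Kaiser rule would therefore retain components that reflect only noise.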
Another approach, called Horn's Parallel Analysis, is to compare the sample data eigenvalues to those from randomly generated datasets with the same sample size and number of variables. To create these comparison eigenvalues, Monte Carlo simulation can be used to generate many random datasets and to compute the mean of each ordered eigenvalue across them. A plot of the sample eigenvalues against the mean eigenvalues from random data, or against the 95th percentile eigenvalues, as shown in Figure 1, can be used to determine the number of components to retain: a component is retained as long as its sample eigenvalue exceeds the corresponding comparison eigenvalue. In this figure, 5 components would be retained.
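The Monte Carlo procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind Figure 1: the function name `parallel_analysis`, the number of replications, the use of standard normal random data, and the synthetic two-factor dataset are all assumptions made for the example.

```python
import numpy as np

def parallel_analysis(data, n_reps=200, percentile=95, seed=0):
    """Sketch of Horn's parallel analysis on an (n_obs x n_vars) array.

    Returns the sample eigenvalues, the mean and percentile eigenvalues
    from random data, and the number of components to retain when the
    percentile eigenvalues are used as the threshold.
    """
    rng = np.random.default_rng(seed)
    n_obs, n_vars = data.shape

    # Eigenvalues of the sample correlation matrix, largest first.
    sample_eigs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

    # Eigenvalues from random datasets of the same shape.
    random_eigs = np.empty((n_reps, n_vars))
    for r in range(n_reps):
        noise = rng.standard_normal((n_obs, n_vars))
        corr = np.corrcoef(noise, rowvar=False)
        random_eigs[r] = np.linalg.eigvalsh(corr)[::-1]

    mean_eigs = random_eigs.mean(axis=0)
    pct_eigs = np.percentile(random_eigs, percentile, axis=0)

    # Retain leading components until the first one that fails to
    # exceed its comparison eigenvalue.
    exceeds = sample_eigs > pct_eigs
    n_retain = n_vars if exceeds.all() else int(np.argmin(exceeds))
    return sample_eigs, mean_eigs, pct_eigs, n_retain

# Hypothetical data: 300 observations, 8 variables, two underlying factors.
rng = np.random.default_rng(42)
factors = rng.standard_normal((300, 2))
loadings = np.zeros((2, 8))
loadings[0, :4] = 0.8   # variables 0-3 load on factor 1
loadings[1, 4:] = 0.8   # variables 4-7 load on factor 2
demo_data = factors @ loadings + 0.6 * rng.standard_normal((300, 8))

sample_eigs, mean_eigs, pct_eigs, n_retain = parallel_analysis(demo_data)
print("Components to retain:", n_retain)
```

Comparing against `mean_eigs` instead of `pct_eigs` gives the mean-eigenvalue version of the criterion; the 95th percentile threshold is the stricter of the two.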
While not widely used, this technique has garnered a lot of support. Computer programs have been created to ease the generation of the comparison eigenvalues from random data, and regression equations have been developed to approximate these comparison eigenvalues easily.
Figure 1. Parallel Analysis Implementation
Excel file to compute the eigenvalues using the three regression equations (Mean, Mean large samples, 95th percentile).
Email me for copies.
A rationale and test for the number of factors in factor analysis (Horn, Psychometrika, 30(2), p. 179-185, 1965)
Comparison of Five Rules for Determining the Number of Components to Retain (Zwick & Velicer, Psychological Bulletin, 99(3), p. 432-442, 1986)
An Improvement on Horn's Parallel Analysis Methodology for Selecting the Correct Number of Factors to Retain (Glorfeld, Educational and Psychological Measurement, 55(3), p. 377-393, 1995)
Efficient theory development and factor retention criteria: Abandon the ‘eigenvalue greater than one’ criterion (Patil, Journal of Business Research, 61(2), p. 162-170, 2008)
Email me for copies.
Methods to generate eigenvalues from "random" datasets
Regression Equation for Mean Eigenvalues
A Regression Equation for Determining the Dimensionality of Data, Kellie B. Keeling, Multivariate Behavioral Research, Vol. 35, No. 4, 2000, pp. 457-468.
Regression Equation for 95th Percentile Eigenvalues
A Regression Equation Predicting 95th Percentile Eigenvalues for the Parallel Analysis Criterion in Principal Components Analysis, Kellie B. Keeling and Robert J. Pavur, International Journal of Operations & Quantitative Management, Vol. 11, No. 2, 2005, pp. 1-12.
Asymptotic Methods: Regression Equation, Normal Order Statistics, and Normal Percentiles
A Comparison of Methods for Approximating the Mean Eigenvalues of a Random Matrix, Kellie B. Keeling and Robert J. Pavur, Communications in Statistics: Simulation and Computation, Vol. 33, No. 4, 2004, pp. 1-16.
Comparisons of different methods
Comparison of Methods to Generate Eigenvalues for Parallel Analysis, Kellie B. Keeling and Robert J. Pavur, in JSM Proceedings, Social Statistics Section, American Statistical Association, Denver, CO, 2008.