What is a good sample size for Structural Equation Modeling (SEM)?

Structural equation modeling is a powerful and flexible extension of the general linear model. Like any statistical method, it features a number of assumptions. These assumptions should be met or at least approximated to ensure trustworthy results. Determining sample size requirements for structural equation modeling (SEM) is a challenge often faced by researchers and data analysts.

According to James Stevens’ Applied Multivariate Statistics for the Social Sciences, a good general rule for sample size is 15 cases per predictor in a standard ordinary least squares multiple regression analysis. Since SEM is closely related to multiple regression in some respects, 15 cases per measured variable in SEM is not unreasonable.

Bentler and Chou (1987) note that researchers may go as low as five cases per parameter estimate in SEM analysis, but only if the data are perfectly well-behaved (i.e., normally distributed, no missing data or outlying cases, etc.). Notice that Bentler and Chou mention five cases per parameter estimate rather than per measured variable.

Measured variables typically have at least one path coefficient associated with another variable in the analysis, plus a residual term or variance estimate. At five cases per parameter, a measured variable with two or three associated parameters calls for roughly 10 to 15 cases, so the Bentler and Chou and Stevens recommendations dovetail at approximately 15 cases per measured variable, minimum.
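The arithmetic behind the two rules can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the parameter count assumes a simple-structure confirmatory factor model (each indicator loads on one factor, factor variances fixed to 1, factors free to correlate), which is an assumption made here rather than something the sources above prescribe.

```python
def count_simple_cfa_parameters(n_indicators: int, n_factors: int) -> int:
    """Count free parameters under an ASSUMED simple-structure CFA layout.

    One loading and one residual variance per indicator, plus one
    covariance for every pair of factors (factor variances fixed to 1).
    """
    loadings = n_indicators
    residual_variances = n_indicators
    factor_covariances = n_factors * (n_factors - 1) // 2
    return loadings + residual_variances + factor_covariances


def n_by_cases_per_variable(n_measured: int, cases_per_variable: int = 15) -> int:
    """Stevens-style rule: a fixed number of cases per measured variable."""
    return n_measured * cases_per_variable


def n_by_cases_per_parameter(n_parameters: int, cases_per_parameter: int = 5) -> int:
    """Bentler and Chou (1987) lower bound: cases per estimated parameter."""
    return n_parameters * cases_per_parameter


if __name__ == "__main__":
    indicators, factors = 12, 3  # hypothetical model size
    params = count_simple_cfa_parameters(indicators, factors)
    print("free parameters:", params)
    print("15 per measured variable:", n_by_cases_per_variable(indicators))
    print("5 per parameter:", n_by_cases_per_parameter(params))
```

For a hypothetical model with 12 indicators on 3 factors, this layout has 27 free parameters. The per-parameter rule then gives 135 cases (about 11 per indicator) and the per-variable rule gives 180, so the two rules land in the same general range.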

Loehlin (1992) reports the results of Monte Carlo simulation studies using confirmatory factor analysis models. After reviewing the literature, he concludes that for this class of model with two to four factors, the investigator should plan on collecting at least 100 cases, with 200 being better if possible.

Consequences of using smaller samples include more convergence failures (the software cannot reach a satisfactory solution), improper solutions (including negative error variance estimates for measured variables), and less accurate parameter estimates and, in particular, standard errors, because SEM programs compute standard errors under the assumption of large sample sizes.

When data are not normally distributed or are otherwise flawed in some way (which is almost always the case), larger samples are required. A widely cited rule of thumb sets the lower bound of an adequate sample size at 10 cases per indicator variable (Nunnally, 1967).
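One way to use these rules of thumb together is to compute each lower bound and plan for the largest. The sketch below does that; the function name and the idea of taking the maximum are assumptions made for illustration, and the 100-case floor applies only to the class of models Loehlin discusses (confirmatory factor models with two to four factors). None of these bounds accounts for non-normal or incomplete data, which call for more cases.

```python
def minimum_sample_size(n_indicators: int, n_free_params: int, cfa_floor: int = 100):
    """Return (largest lower bound, name of the rule that sets it, all bounds).

    Combining the rules by taking the maximum is an illustrative choice,
    not a recommendation made by any of the cited authors.
    """
    bounds = {
        "Nunnally: 10 per indicator": 10 * n_indicators,
        "Stevens-style: 15 per measured variable": 15 * n_indicators,
        "Bentler and Chou: 5 per parameter": 5 * n_free_params,
        "Loehlin: CFA floor": cfa_floor,
    }
    binding_rule = max(bounds, key=bounds.get)
    return bounds[binding_rule], binding_rule, bounds


if __name__ == "__main__":
    # Hypothetical two-factor model: 6 indicators, 13 free parameters.
    n, rule, all_bounds = minimum_sample_size(n_indicators=6, n_free_params=13)
    for name, value in all_bounds.items():
        print(f"{name}: {value}")
    print(f"plan for at least {n} cases ({rule})")
```

For small models the fixed floor tends to dominate, while for larger models the per-variable or per-parameter rules take over. Passing `cfa_floor=200` reflects Loehlin's preferred target rather than his minimum.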

It is difficult to make absolute recommendations as to what sample sizes are required when data are skewed, kurtotic, incomplete, or otherwise less than perfect. The general recommendation is thus to obtain more data whenever possible.
