Issues and Opinion on Structural Equation Modeling

Structural Equation Modeling in IS Research - Understanding the LISREL and PLS perspective

Wynne W. Chin

University of Houston

As in many other social science areas, the IS field has seen a substantial increase in the number of submissions and publications using structural equation modeling (SEM) techniques. This is likely due to the proliferation of software packages that perform covariance-based (e.g., LISREL, EQS, AMOS, SEPATH, RAMONA, MX, and CALIS) and component-based (e.g., PLS-PC, PLS-Graph) analysis. The SEM approach is integrative in the sense that it combines the perspectives of two research traditions:

an econometric perspective focusing on prediction, and

a psychometric emphasis that models concepts as latent (unobserved) variables that are indirectly inferred from multiple observed measures (alternatively termed indicators or manifest variables).

The resulting combination allows researchers to perform path analytic modeling with latent variables (LVs). Specifically, SEM provides the researcher with the flexibility to: (a) model relationships among multiple predictor and criterion variables, (b) construct unobservable LVs, (c) model errors in measurement for observed variables, and (d) statistically test a priori substantive/theoretical and measurement assumptions against empirical data (i.e., confirmatory analysis).

SEM involves generalizations and extensions of earlier first-generation procedures. By applying certain constraints or assumptions to an SEM analysis, a researcher can end up performing the equivalent of techniques such as canonical correlation, multiple regression, multiple discriminant analysis, analysis of variance or covariance, or principal components analysis.

Naturally, along with the benefits comes complexity. This virtual discussion will cover various issues that researchers often raise. The primary focus will be on both the covariance-based approach, often referred to generically as LISREL analysis, and the Partial Least Squares (PLS) approach. Hopefully, we will cover not only matters generic to social scientists, but also those specific to the IS field. To guide the questions, we might look at the topic from various frames.

One standard approach is to examine the stages in the traditional SEM lifecycle. They are:

Model Specification,

Identification,

Estimation,

Testing Fit, and

Model Modification or Respecification.

Another approach is to examine common mistakes that are made. We can discuss such issues as:

Critical missing information. Information that should be included is often left out of research articles, thereby preventing other researchers from reproducing the analysis and building a cumulative tradition.

Mismatch of questionnaire items and subsequent analysis. Survey questions analyzed are often formative in nature or a composite of formative and reflective measures. In such cases, a LISREL analysis and the use of internal consistency measures such as Cronbach’s alpha would be incorrect.

Sole reliance on overall goodness of fit measures. Using only covariance-based goodness of fit measures as the primary arbiter of confirmation while ignoring other important measures of model adequacy.

Analyzing second order factors without a purpose. Demonstrating second order factor models without providing an underlying rationale for their subsequent usage.

Lack of empirical over-identification. Many empirical studies do not perform a strong test of the model/latent variables.

Ignoring the statistical power of models.

Ignoring equivalent models.

Falling into an exploratory mode via initial exploratory factor analysis or using information from the statistical package to modify initial models for better fit.

Premature or inappropriate choice of analytic approach when either substantive or theoretical knowledge is relatively new.

Finally, we can approach it from an applied perspective. Example questions might include:

When should I consider using Partial Least Squares as opposed to LISREL?

More importantly, how does Partial Least Squares differ from LISREL?

How does LISREL compare to path analysis using multiple regression?

Does it make sense to do an exploratory factor analysis prior to using SEM?

How about a confirmatory factor analysis first?

What are the advantages to using SEM for multi-sample or cross-cultural analysis?

To start off, we might consider the following model (Figure 1) as a basis for discussion. The model is a simple two factor model where F1 is hypothesized to affect F2. The data, in the form of a correlation matrix, are provided in Table 1 for the four measures/indicators (x1, x2, y1, and y2) of their respective factors.

Figure 1. Two factor model with two indicators/measures for each factor.

       x1     x2     y1     y2
x1    1.00
x2    .087   1.00
y1    .140   .080   1.00
y2    .152   .143   .272   1.00

Table 1. Sample Data Set (n=1000).

All correlations, given the sample size, are significant but quite low, ranging from 0.080 to 0.272. If we use the theoretical model as depicted in Figure 1, what would be the path estimate p linking F1 and F2? The covariance-based estimate using software such as LISREL would result in a standardized estimate of p at 0.83, whereas the PLS estimate was 0.22. The standardized loadings of a, b, c, and d using the covariance procedure were 0.33, 0.26, 0.46, and 0.59. The PLS estimates resulted in loadings of 0.81, 0.66, 0.73, and 0.85 with corresponding weights of 0.75, 0.60, 0.54, and 0.71. In the case of the covariance estimates, the path estimate of 0.83 is much larger than the observed correlations between the x and y variables, the highest of which is 0.152. In the case of PLS, the estimate of 0.22 is much closer to the observed correlations. Which estimate should we place confidence in?
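To make the contrast concrete, the reported numbers can be checked against Table 1 with a few lines of arithmetic. The following Python sketch is only a consistency check on the published estimates, not a re-estimation of either model. It rests on three assumptions that are not stated in the text: that a, b, c, and d are the loadings of x1, x2, y1, and y2 respectively; that under the covariance-based model the implied correlation between two indicators is the product of their loadings (multiplied by p when they belong to different factors); and that the PLS path between two latent variables equals the correlation between their weighted composites. All function and variable names are illustrative.

```python
import math

N = 1000  # sample size from Table 1

# Observed correlations from Table 1.
R = {
    ("x1", "x2"): 0.087,
    ("x1", "y1"): 0.140,
    ("x2", "y1"): 0.080,
    ("x1", "y2"): 0.152,
    ("x2", "y2"): 0.143,
    ("y1", "y2"): 0.272,
}


def r(u, v):
    """Look up an observed correlation regardless of argument order."""
    if u == v:
        return 1.0
    return R[(u, v)] if (u, v) in R else R[(v, u)]


def t_stat(corr, n=N):
    """t statistic for testing a correlation against zero."""
    return corr * math.sqrt((n - 2) / (1 - corr ** 2))


# Covariance-based standardized estimates reported in the text
# (assumed mapping: a -> x1, b -> x2, c -> y1, d -> y2).
LOADINGS = {"x1": 0.33, "x2": 0.26, "y1": 0.46, "y2": 0.59}
FACTOR = {"x1": "F1", "x2": "F1", "y1": "F2", "y2": "F2"}
P = 0.83


def implied(u, v):
    """Model-implied correlation: loading product, times p across factors."""
    between = P if FACTOR[u] != FACTOR[v] else 1.0
    return LOADINGS[u] * LOADINGS[v] * between


# PLS weights reported in the text (same assumed indicator order).
WEIGHTS = {"F1": {"x1": 0.75, "x2": 0.60}, "F2": {"y1": 0.54, "y2": 0.71}}


def composite_cov(wa, wb):
    """Covariance between two weighted sums of standardized indicators."""
    return sum(wu * wv * r(u, v) for u, wu in wa.items() for v, wv in wb.items())


def composite_corr(wa, wb):
    """Correlation between two weighted composites."""
    return composite_cov(wa, wb) / math.sqrt(
        composite_cov(wa, wa) * composite_cov(wb, wb)
    )


pls_path = composite_corr(WEIGHTS["F1"], WEIGHTS["F2"])

print("pair      observed  implied   t")
for (u, v), obs in R.items():
    print(f"{u}-{v}     {obs:.3f}     {implied(u, v):.3f}   {t_stat(obs):.2f}")
print(f"correlation between PLS composites: {pls_path:.3f}")
```

Under these assumptions, the covariance-based estimates are chosen so that products of loadings and p approximate the observed correlations, while the PLS figure is simply the correlation between two weighted sums of the indicators. The two values of p therefore describe different quantities: a relationship between error-free factors in one case and between composites that still contain measurement error in the other.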