Simultaneous equations model
Simultaneous equations models are a type of statistical model in which the dependent variables are functions of other dependent variables, rather than just independent variables. This means some of the explanatory variables are jointly determined with the dependent variable, which in economics usually is the consequence of some underlying equilibrium mechanism. For instance, in the simple model of supply and demand, price and quantity are jointly determined.
Simultaneity poses challenges for the estimation of the statistical parameters of interest, because the Gauss–Markov assumption of strict exogeneity of the regressors is violated. And while it would be natural to estimate all simultaneous equations at once, this often leads to a computationally costly non-linear optimization problem even for the simplest system of linear equations. This situation prompted the development, spearheaded by the Cowles Commission in the 1940s and 1950s, of various techniques that estimate each equation in the model seriatim, most notably limited information maximum likelihood and two-stage least squares.
Structural and reduced form
Suppose there are $m$ regression equations of the form
$$y_{it} = y_{-i,t}'\gamma_i + x_{it}'\beta_i + u_{it}, \quad i = 1, \ldots, m,$$
where $i$ is the equation number, and $t = 1, \ldots, T$ is the observation index. In these equations $x_{it}$ is the $k_i \times 1$ vector of exogenous variables, $y_{it}$ is the dependent variable, $y_{-i,t}$ is the $n_i \times 1$ vector of all other endogenous variables which enter the $i$th equation on the right-hand side, and $u_{it}$ are the error terms. The “$-i$” notation indicates that the vector $y_{-i,t}$ may contain any of the $y$’s except for $y_{it}$. The regression coefficients $\beta_i$ and $\gamma_i$ are of dimensions $k_i \times 1$ and $n_i \times 1$ respectively. Vertically stacking the $T$ observations corresponding to the $i$th equation, we can write each equation in vector form as
$$y_i = Y_{-i}\gamma_i + X_i\beta_i + u_i, \quad i = 1, \ldots, m,$$
where $y_i$ and $u_i$ are $T \times 1$ vectors, $X_i$ is a $T \times k_i$ matrix of exogenous regressors, and $Y_{-i}$ is a $T \times n_i$ matrix of endogenous regressors on the right-hand side of the $i$th equation. Finally, we can move all endogenous variables to the left-hand side and write the $m$ equations jointly in vector form as
$$Y\Gamma = XB + U.$$
This representation is known as the structural form. In this equation $Y = [y_1\ y_2\ \ldots\ y_m]$ is the $T \times m$ matrix of dependent variables. Each of the matrices $Y_{-i}$ is in fact an $n_i$-columned submatrix of this $Y$. The $m \times m$ matrix $\Gamma$, which describes the relation between the dependent variables, has a complicated structure. It has ones on the diagonal, and all other elements of each column $i$ are either the components of the vector $-\gamma_i$ or zeros, depending on which columns of $Y$ were included in the matrix $Y_{-i}$. The $T \times k$ matrix $X$ contains all exogenous regressors from all equations, but without repetitions. Thus, each $X_i$ is a $k_i$-columned submatrix of $X$. Matrix $B$ has size $k \times m$, and each of its columns consists of the components of vectors $\beta_i$ and zeros, depending on which of the regressors from $X$ were included or excluded from $X_i$. Finally, $U = [u_1\ u_2\ \ldots\ u_m]$ is a $T \times m$ matrix of the error terms.
Postmultiplying the structural equation by $\Gamma^{-1}$, the system can be written in the reduced form as
$$Y = XB\Gamma^{-1} + U\Gamma^{-1} = X\Pi + V.$$
This is already a simple general linear model, and it can be estimated, for example, by ordinary least squares. Unfortunately, the task of decomposing the estimated matrix $\hat\Pi$ into the individual factors $B$ and $\Gamma^{-1}$ is quite complicated, and therefore the reduced form is more suitable for prediction than for inference.
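The relationship between the structural and reduced forms can be illustrated with a small simulation. The sketch below is only an illustration: the two-equation system, the numerical values in `Gamma` and `B`, and the sample size are all hypothetical choices, not taken from the text. It generates data from the structural form and checks that equation-by-equation OLS on the reduced form recovers $\Pi = B\Gamma^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000  # number of observations (arbitrary choice)

# Hypothetical structural form Y @ Gamma = X @ B + U with m = 2, k = 3.
# Gamma has ones on the diagonal; off-diagonal entries are -gamma_i.
Gamma = np.array([[1.0, -0.5],
                  [0.4,  1.0]])
# Zeros in B mark exogenous regressors excluded from an equation.
B = np.array([[1.0,  2.0],
              [0.5,  0.0],
              [0.0, -1.0]])

X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])  # T x k
U = rng.normal(size=(T, 2))                                  # T x m

# Reduced form: Y = X @ Pi + V, where Pi = B @ inv(Gamma).
Gamma_inv = np.linalg.inv(Gamma)
Pi_true = B @ Gamma_inv
Y = (X @ B + U) @ Gamma_inv

# OLS on the reduced form (all columns of Y at once).
Pi_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

print("true Pi:\n", Pi_true)
print("OLS estimate of Pi:\n", Pi_hat)
```

The estimate of $\Pi$ is close to the true matrix, but nothing in this regression separates $B$ from $\Gamma^{-1}$; that step requires the identifying restrictions discussed in the following sections.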
Assumptions
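Firstly, the rank of the matrix $X$ of exogenous regressors must be equal to $k$, both in finite samples and in the limit as $T \to \infty$ (the latter requirement means that in the limit $\tfrac{1}{T}X'X$ should converge to a nondegenerate $k \times k$ matrix). Matrix $\Gamma$ is also assumed to be non-degenerate.
Secondly, error terms are assumed to be serially independent and identically distributed. That is, if the $t$th row of matrix $U$ is denoted by $u_{(t)}$, then the sequence of vectors $\{u_{(t)}\}$ should be iid, with zero mean and some covariance matrix $\Sigma$. In particular, this implies that $\operatorname{E}[U] = 0$ and $\operatorname{E}[U'U] = T\Sigma$.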
Lastly, the identification conditions require that the number of unknowns in this system of equations should not exceed the number of equations. More specifically, the order condition requires that for each equation $k_i + n_i \le k$, which can be phrased as “the number of excluded exogenous variables is greater than or equal to the number of included endogenous variables”. The rank condition of identifiability is that $\operatorname{rank}(\Pi_{i0}) = n_i$, where $\Pi_{i0}$ is a $(k - k_i) \times n_i$ matrix which is obtained from $\Pi$ by crossing out those columns which correspond to the excluded endogenous variables, and those rows which correspond to the included exogenous variables.
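The two identification conditions can be checked mechanically once $\Pi$ and the inclusion pattern of an equation are known. The following is a minimal sketch; the function names, the index-list arguments, and the example matrices are hypothetical and introduced only for illustration.

```python
import numpy as np

def order_condition(k, k_i, n_i):
    """Order condition: k_i + n_i <= k, i.e. at least as many excluded
    exogenous variables as included right-hand-side endogenous variables."""
    return k_i + n_i <= k

def rank_condition(Pi, excluded_exog_rows, included_endog_cols):
    """Rank condition: rank(Pi_i0) == n_i, where Pi_i0 keeps the rows of the
    excluded exogenous variables and the columns of the included
    right-hand-side endogenous variables."""
    n_i = len(included_endog_cols)
    if n_i == 0:
        return True   # nothing to instrument
    if len(excluded_exog_rows) == 0:
        return False  # no excluded exogenous variable available
    Pi_i0 = Pi[np.ix_(excluded_exog_rows, included_endog_cols)]
    return bool(np.linalg.matrix_rank(Pi_i0) == n_i)

# Hypothetical reduced-form matrix with k = 3 exogenous and m = 2 endogenous
# variables. The equation of interest includes exogenous rows 0 and 1,
# excludes row 2, and has endogenous column 1 on its right-hand side.
Pi = np.array([[0.8,  1.5],
               [0.4,  0.2],
               [0.3, -0.8]])
print(order_condition(k=3, k_i=2, n_i=1))   # order condition holds
print(rank_condition(Pi, [2], [1]))         # excluded regressor moves y_2

# If the excluded regressor has no effect on the included endogenous
# variable, the order condition still holds but the rank condition fails.
Pi_bad = Pi.copy()
Pi_bad[2, 1] = 0.0
print(rank_condition(Pi_bad, [2], [1]))
```

The second example shows why the order condition is only necessary: counting excluded variables is not enough if they do not actually shift the endogenous regressors.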
Estimation
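Two-stage least squares (2SLS)
The simplest and the most common estimation method for the simultaneous equations model is the so-called two-stage least squares method, developed independently by Theil (1953) and Basmann (1957). It is an equation-by-equation technique, where the endogenous regressors on the right-hand side of each equation are instrumented with the regressors $X$ from all other equations. The method is called “two-stage” because it conducts estimation in two steps:
- Step 1: Regress $Y_{-i}$ on $X$ and obtain the predicted values $\hat{Y}_{-i}$;
- Step 2: Estimate $\gamma_i$, $\beta_i$ by the ordinary least squares regression of $y_i$ on $\hat{Y}_{-i}$ and $X_i$.
If the $i$th equation in the model is written as
$$y_i = \begin{pmatrix} Y_{-i} & X_i \end{pmatrix} \begin{pmatrix} \gamma_i \\ \beta_i \end{pmatrix} + u_i \equiv Z_i\delta_i + u_i,$$
where $Z_i$ is a $T \times (n_i + k_i)$ matrix of both endogenous and exogenous regressors in the $i$th equation, and $\delta_i$ is an $(n_i + k_i)$-dimensional vector of regression coefficients, then the 2SLS estimator of $\delta_i$ will be given by
$$\hat\delta_i = (\hat{Z}_i'\hat{Z}_i)^{-1}\hat{Z}_i' y_i = (Z_i' P Z_i)^{-1} Z_i' P y_i,$$
where $P = X(X'X)^{-1}X'$ is the projection matrix onto the linear space spanned by the exogenous regressors $X$.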
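The two-stage procedure can be sketched in a few lines of NumPy. This is an illustrative sketch rather than a reference implementation: the function name `two_stage_least_squares`, the simulated two-equation system, and all parameter values are assumptions made for the example. The projection $PZ_i$ is computed through a least-squares fit rather than by forming the $T \times T$ matrix $P$ explicitly.

```python
import numpy as np

def two_stage_least_squares(y, Z, X):
    """2SLS estimate (Z'PZ)^{-1} Z'P y, with P the projection onto span(X)."""
    # Stage 1: fitted values Z_hat = P Z (exogenous columns of Z that lie
    # in span(X) are reproduced unchanged).
    Z_hat = X @ np.linalg.lstsq(X, Z, rcond=None)[0]
    # Stage 2: OLS of y on the fitted regressors.
    return np.linalg.lstsq(Z_hat, y, rcond=None)[0]

rng = np.random.default_rng(0)
T = 20000

# Hypothetical structural system Y @ Gamma = X @ B + U (illustrative values).
Gamma = np.array([[1.0, -0.5],
                  [0.4,  1.0]])
B = np.array([[1.0,  2.0],
              [0.5,  0.0],
              [0.0, -1.0]])
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
U = rng.normal(size=(T, 2))
Y = (X @ B + U) @ np.linalg.inv(Gamma)
y1, y2 = Y[:, 0], Y[:, 1]

# First equation: y1 = -0.4*y2 + 1.0 + 0.5*x1 + u1 (x2 is excluded).
Z1 = np.column_stack([y2, X[:, 0], X[:, 1]])
delta_true = np.array([-0.4, 1.0, 0.5])

delta_2sls = two_stage_least_squares(y1, Z1, X)
delta_ols = np.linalg.lstsq(Z1, y1, rcond=None)[0]

print("true :", delta_true)
print("2SLS :", delta_2sls)
print("OLS  :", delta_ols)
```

In this simulated design the OLS coefficient on the endogenous regressor is visibly off target, because $y_2$ is correlated with $u_1$, while the 2SLS estimate is close to the true value.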
Indirect least squares
Indirect least squares is an approach in econometrics where the coefficients in a simultaneous equations model are estimated from the reduced form model using ordinary least squares. For this, the structural system of equations is first transformed into the reduced form. Once the coefficients are estimated, the model is put back into the structural form.
Limited information maximum likelihood (LIML)
The “limited information” maximum likelihood method was suggested by M. A. Girshick in 1947, and formalized by T. W. Anderson and H. Rubin in 1949. It is used when one is interested in estimating a single structural equation at a time, say for equation $i$:
$$y_i = Y_{-i}\gamma_i + X_i\beta_i + u_i \equiv Z_i\delta_i + u_i.$$
The structural equations for the remaining endogenous variables $Y_{-i}$ are not specified, and they are given in their reduced form:
$$Y_{-i} = X\Pi + U_{-i}.$$
Notation in this context is different than for the simple IV case. One has:
- $Y_{-i}$: The endogenous variable(s).
- $X_i$: The exogenous variable(s) included in the equation.
- $X$: The instrument(s).
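The explicit formula for the LIML estimator is
$$\hat\delta_i = \big(Z_i'(I - \lambda M)Z_i\big)^{-1} Z_i'(I - \lambda M)\,y_i,$$
where $M = I - X(X'X)^{-1}X'$, and $\lambda$ is the smallest characteristic root of the matrix
$$\Big(\begin{bmatrix} y_i & Y_{-i} \end{bmatrix}' M_i \begin{bmatrix} y_i & Y_{-i} \end{bmatrix}\Big)\Big(\begin{bmatrix} y_i & Y_{-i} \end{bmatrix}' M \begin{bmatrix} y_i & Y_{-i} \end{bmatrix}\Big)^{-1},$$
where, in a similar way, $M_i = I - X_i(X_i'X_i)^{-1}X_i'$.
In other words, $\lambda$ is the smallest solution of the generalized eigenvalue problem
$$\Big|\begin{bmatrix} y_i & Y_{-i} \end{bmatrix}' M_i \begin{bmatrix} y_i & Y_{-i} \end{bmatrix} - \lambda \begin{bmatrix} y_i & Y_{-i} \end{bmatrix}' M \begin{bmatrix} y_i & Y_{-i} \end{bmatrix}\Big| = 0.$$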
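K-class estimators
The LIML is a special case of the K-class estimators:
$$\hat\delta = \big(Z'(I - \kappa M)Z\big)^{-1} Z'(I - \kappa M)\,y,$$
with $\delta = (\gamma_i', \beta_i')'$ and $Z = [Y_{-i}\ X_i]$. Several estimators belong to this class:
- $\kappa = 0$: OLS
- $\kappa = 1$: 2SLS. Note indeed that in this case $I - \kappa M = I - M = P$, the usual projection matrix of the 2SLS
- $\kappa = \lambda$: LIML
- $\kappa = \lambda - \alpha/(n - K)$: Fuller (1977) estimator. Here $K$ represents the number of instruments, $n$ the sample size, and $\alpha$ a positive constant to specify. A value of $\alpha = 1$ will yield an estimator that is approximately unbiased.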
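The K-class family, including LIML as the case where κ equals the smallest characteristic root λ, can be sketched as follows. This is a hedged illustration: the helper names (`residualize`, `liml_lambda`, `k_class`), the simulated overidentified equation, and its parameter values are assumptions made for the example. The annihilator matrices $M$ and $M_i$ are applied through least-squares residuals instead of being formed explicitly.

```python
import numpy as np

def residualize(A, X):
    """Return M A with M = I - X (X'X)^{-1} X' (residuals of A on X)."""
    return A - X @ np.linalg.lstsq(X, A, rcond=None)[0]

def liml_lambda(y, Y_endog, X_i, X):
    """Smallest characteristic root of (Y0' M_i Y0)(Y0' M Y0)^{-1}."""
    Y0 = np.column_stack([y, Y_endog])
    W_i = Y0.T @ residualize(Y0, X_i)
    W = Y0.T @ residualize(Y0, X)
    # W^{-1} W_i has the same eigenvalues as W_i W^{-1}; they are real here.
    return np.linalg.eigvals(np.linalg.solve(W, W_i)).real.min()

def k_class(y, Z, X, kappa):
    """K-class estimate (Z'(I - kappa M)Z)^{-1} Z'(I - kappa M) y."""
    A = Z.T @ Z - kappa * (Z.T @ residualize(Z, X))
    b = Z.T @ y - kappa * (Z.T @ residualize(y, X))
    return np.linalg.solve(A, b)

# Hypothetical overidentified equation: two excluded instruments (x2, x3).
rng = np.random.default_rng(1)
T = 5000
x = rng.normal(size=(T, 3))
X = np.column_stack([np.ones(T), x])   # all instruments, K = 4
X_i = X[:, :2]                         # included exogenous: constant, x1
v = rng.normal(size=T)
u = 0.6 * v + rng.normal(size=T)       # error correlated with y2
y2 = 0.5 + 0.5 * x[:, 0] + x[:, 1] + x[:, 2] + v
y1 = 0.7 * y2 + 1.0 + 0.5 * x[:, 0] + u
Z = np.column_stack([y2, X_i])
delta_true = np.array([0.7, 1.0, 0.5])

lam = liml_lambda(y1, y2, X_i, X)
d_ols = k_class(y1, Z, X, 0.0)
d_2sls = k_class(y1, Z, X, 1.0)
d_liml = k_class(y1, Z, X, lam)
d_fuller = k_class(y1, Z, X, lam - 1.0 / (T - X.shape[1]))  # alpha = 1

print("lambda:", lam)
for name, d in [("OLS", d_ols), ("2SLS", d_2sls),
                ("LIML", d_liml), ("Fuller", d_fuller)]:
    print(name, d)
```

Because $X_i$ is a subset of the columns of $X$, the root λ is never below one; when the equation is exactly identified it equals one and LIML coincides with 2SLS.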
Three-stage least squares (3SLS)
Using cross-equation restrictions to achieve identification
In simultaneous equations models, the most common method to achieve identification is by imposing within-equation parameter restrictions. Yet, identification is also possible using cross-equation restrictions.
To illustrate how cross-equation restrictions can be used for identification, consider the following example from Wooldridge:
y1 = γ12 y2 + δ11 z1 + δ12 z2 + δ13 z3 + u1
y2 = γ21 y1 + δ21 z1 + δ22 z2 + u2
where the z's are uncorrelated with the u's and the y's are endogenous variables. Without further restrictions, the first equation is not identified because there is no excluded exogenous variable. The second equation is just identified if δ13≠0, which is assumed to be true for the rest of the discussion.
Now we impose the cross-equation restriction δ12=δ22. Since the second equation is identified, we can treat δ12 as known for the purpose of identification. Then, the first equation becomes:
y1 - δ12 z2 = γ12 y2 + δ11 z1 + δ13 z3 + u1
Then, we can use (z1, z2, z3) as instruments to estimate the coefficients in the above equation, since there is one endogenous variable (y2) on the right-hand side and one excluded exogenous variable (z2). Therefore, cross-equation restrictions in place of within-equation restrictions can achieve identification.
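The identification argument above can be mimicked in a simulation. The sketch below is one possible way to turn it into a two-step estimator, not a procedure prescribed by the source: it first estimates the second equation by 2SLS, then plugs the estimate of δ22 into the transformed first equation. All parameter values, the sample size, and the helper `tsls` are hypothetical.

```python
import numpy as np

def tsls(y, Z, X):
    """2SLS: OLS of y on the projection of Z onto the instruments X."""
    Z_hat = X @ np.linalg.lstsq(X, Z, rcond=None)[0]
    return np.linalg.lstsq(Z_hat, y, rcond=None)[0]

rng = np.random.default_rng(2)
T = 20000

# Hypothetical true parameters, with the restriction d12 == d22 imposed.
g12, g21 = -0.5, 0.3
d11, d12, d13 = 1.0, 0.8, 1.0
d21, d22 = -0.5, 0.8

z1, z2, z3 = rng.normal(size=(3, T))
u1, u2 = rng.normal(size=(2, T))

# Solve the two structural equations for (y1, y2).
a1 = d11 * z1 + d12 * z2 + d13 * z3 + u1
a2 = d21 * z1 + d22 * z2 + u2
D = 1.0 - g12 * g21
y1 = (a1 + g12 * a2) / D
y2 = (a2 + g21 * a1) / D

instruments = np.column_stack([z1, z2, z3])

# Step 1: the second equation is identified (z3 is excluded, d13 != 0).
eq2 = tsls(y2, np.column_stack([y1, z1, z2]), instruments)
d22_hat = eq2[2]

# Step 2: use d12 = d22 to move z2 to the left-hand side of equation 1;
# z2 is then an excluded exogenous variable that instruments for y2.
y1_adj = y1 - d22_hat * z2
eq1 = tsls(y1_adj, np.column_stack([y2, z1, z3]), instruments)

print("equation 2 (g21, d21, d22):", eq2)
print("equation 1 (g12, d11, d13):", eq1)
```

In this design both sets of estimates land close to the true values, which is consistent with the claim that the cross-equation restriction identifies the first equation. Standard errors for the second step would need to account for the estimated δ22, which this sketch does not attempt.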