10. Gram-Schmidt Orthogonalization and Regression
Michael Friendly
2024-10-05
Source:vignettes/aA-gramreg.Rmd
aA-gramreg.Rmd
This vignette illustrates the process of transforming a set of variables to a new set of uncorrelated (orthogonal) variables. It carries out the Gram-Schmidt process directly by successively projecting each successive variable on the previous ones and subtracting (taking residuals). This is equivalent by replacing each successive variable by its residuals from a least squares regression on the previous variables.
When this method is used on the predictors in a regression problem,
the resulting orthogonal variables have exactly the same
anova()
summary (based on βType Iβ, sequential sums of
squares) as do original variables.
Setup
We use the class
data set, but convert the character
factor sex
to a dummy (0/1) variable male
.
library(matlib)
data(class)
class$male <- as.numeric(class$sex=="M")
For later use in regression, we create a variable IQ
as
a response variable
class <- transform(class,
IQ = round(20 + height + 3*age -.1*weight -3*male + 10*rnorm(nrow(class))))
head(class)
## sex age height weight male IQ
## Alfred M 14 69.0 112.5 1 103
## Alice F 13 56.5 84.0 0 110
## Barbara F 13 65.3 98.0 0 90
## Carol F 14 62.8 102.5 0 114
## Henry M 14 63.5 102.5 1 118
## James M 12 57.3 83.0 1 113
Reorder the predictors we want, forming a numeric matrix,
X
.
## height weight age male
## Alfred 69.0 112.5 14 1
## Alice 56.5 84.0 13 0
## Barbara 65.3 98.0 13 0
## Carol 62.8 102.5 14 0
## Henry 63.5 102.5 14 1
## James 57.3 83.0 12 1
Orthogonalization by projections
The Gram-Schmidt process treats the variables in a given order,
according to the columns in X
. We start with a new matrix
Z
consisting of X[,1]
. Then, find a new
variable Z[,2]
orthogonal to Z[,1]
by
subtracting the projection of X[,2]
on
Z[,1]
.
Z <- cbind(X[,1], 0, 0, 0)
Z[,2] <- X[,2] - Proj(X[,2], Z[,1])
crossprod(Z[,1], Z[,2]) # verify orthogonality
## [,1]
## [1,] 7.276e-12
Continue in the same way, subtracting the projections of
X[,3]
on the previous columns, and so forth
Z[,3] <- X[,3] - Proj(X[,3], Z[,1]) - Proj(X[,3], Z[,2])
Z[,4] <- X[,4] - Proj(X[,4], Z[,1]) - Proj(X[,4], Z[,2]) - Proj(X[,4], Z[,3])
Note that if any column of X
is a linear combination of
the previous columns, the corresponding column of Z
will be
all zeros.
These computations are similar to the following set of linear regressions:
z2 <- residuals(lm(X[,2] ~ X[,1]), type="response")
z3 <- residuals(lm(X[,3] ~ X[,1:2]), type="response")
z4 <- residuals(lm(X[,4] ~ X[,1:3]), type="response")
The columns of Z
are now orthogonal, but not of unit
length,
## [,1] [,2] [,3] [,4]
## [1,] 57888 0 0 0
## [2,] 0 3249 0 0
## [3,] 0 0 7 0
## [4,] 0 0 0 2
We make standardize column to unit length, giving Z
as
an orthonormal matrix, such that
.
Z <- Z %*% diag(1 / len(Z)) # make each column unit length
zapsmall(crossprod(Z)) # check orthonormal
## [,1] [,2] [,3] [,4]
## [1,] 1 0 0 0
## [2,] 0 1 0 0
## [3,] 0 0 1 0
## [4,] 0 0 0 1
Relationship to QR factorization
The QR method uses essentially the same process, factoring the matrix
as
,
where
is the orthonormal matrix corresponding to Z
and
is an upper triangular matrix. However, the signs of the columns of
are arbitrary, and QR()
returns QR(X)$Q
with
signs reversed, compared to Z
.
# same result as QR(X)$Q, but with signs reversed
head(Z, 5)
## height weight age male
## Alfred 0.2868 0.07545 -0.3687 0.12456
## Alice 0.2348 -0.08067 0.3569 -0.02177
## Barbara 0.2714 -0.07715 -0.3862 -0.45170
## Carol 0.2610 0.07058 0.1559 -0.20548
## Henry 0.2639 0.05132 0.1047 0.40538
## [,1] [,2] [,3] [,4]
## [1,] 0.2868 0.07545 -0.3687 0.12456
## [2,] 0.2348 -0.08067 0.3569 -0.02177
## [3,] 0.2714 -0.07715 -0.3862 -0.45170
## [4,] 0.2610 0.07058 0.1559 -0.20548
## [5,] 0.2639 0.05132 0.1047 0.40538
## [1] TRUE
Regression with X and Z
We carry out two regressions of IQ
on the variables in
X
and in Z
. These are equivalent, in the sense
that
- The and MSE are the same in both models
- Residuals are the same
- The Type I tests given by
anova()
are the same.
class2 <- data.frame(Z, IQ=class$IQ)
Regression of IQ on the original variables in X
## Analysis of Variance Table
##
## Response: IQ
## Df Sum Sq Mean Sq F value Pr(>F)
## height 1 127 127 1.09 0.321
## weight 1 194 194 1.67 0.225
## age 1 417 417 3.59 0.087 .
## male 1 200 200 1.72 0.218
## Residuals 10 1162 116
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Regression of IQ on the orthogonalized variables in
Z
## Analysis of Variance Table
##
## Response: IQ
## Df Sum Sq Mean Sq F value Pr(>F)
## height 1 127 127 1.09 0.321
## weight 1 194 194 1.67 0.225
## age 1 417 417 3.59 0.087 .
## male 1 200 200 1.72 0.218
## Residuals 10 1162 116
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This illustrates that anova()
tests for linear models
are sequential tests. They test hypotheses about the extra
contribution of each variable over and above all previous ones, in a
given order. These usually do not make substantive sense, except in
testing ordered (βhierarchicalβ) models.