This vignette illustrates the process of transforming a set of variables into a new set of uncorrelated (orthogonal) variables. It carries out the Gram-Schmidt process directly, by successively projecting each variable on the previous ones and subtracting (that is, taking residuals). This is equivalent to replacing each variable by its residuals from a least-squares regression on the previous variables.

When this method is applied to the predictors in a regression problem, the resulting orthogonal variables have exactly the same anova() summary (based on "Type I", sequential sums of squares) as the original variables.

Setup

We use the class data set, but convert sex to a dummy (0/1) variable, male.

library(matlib)
data(class)
class$male <- as.numeric(class$sex=="M")

For later use in regression, we create an IQ variable to serve as the response.

class <- transform(class, 
                   IQ = round(20 + height + 3*age -.1*weight -3*male + 10*rnorm(nrow(class))))
head(class)
##         sex age height weight male  IQ
## Alfred    M  14   69.0  112.5    1 103
## Alice     F  13   56.5   84.0    0 110
## Barbara   F  13   65.3   98.0    0  90
## Carol     F  14   62.8  102.5    0 114
## Henry     M  14   63.5  102.5    1 118
## James     M  12   57.3   83.0    1 113

Select and reorder the predictors we want, forming a numeric matrix, X.

X <- as.matrix(class[,c(3,4,2,5)])
head(X)
##         height weight age male
## Alfred    69.0  112.5  14    1
## Alice     56.5   84.0  13    0
## Barbara   65.3   98.0  13    0
## Carol     62.8  102.5  14    0
## Henry     63.5  102.5  14    1
## James     57.3   83.0  12    1

Orthogonalization by projections

The Gram-Schmidt process treats the variables in a given order, according to the columns in X. We start with a new matrix Z consisting of X[,1]. Then, find a new variable Z[,2] orthogonal to Z[,1] by subtracting the projection of X[,2] on Z[,1].

Z <- cbind(X[,1], 0, 0, 0)
Z[,2] <- X[,2] - Proj(X[,2], Z[,1])
crossprod(Z[,1], Z[,2])     # verify orthogonality
##           [,1]
## [1,] 7.276e-12
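
For a single vector u, the projection computed by Proj(y, u) is mathematically (u'y / u'u) u. As a minimal base-R sketch of this one step (proj_onto() is a hypothetical helper used only for illustration, not part of matlib), we can reproduce Z[,2]:

# Hypothetical helper: projection of y on a single vector u, i.e. (u'y / u'u) u
proj_onto <- function(y, u) as.vector(crossprod(u, y) / crossprod(u)) * u
z2_check <- X[,2] - proj_onto(X[,2], Z[,1])
max(abs(z2_check - Z[,2]))   # essentially zero: matches the step above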

Continue in the same way, subtracting the projections of X[,3] on the previous columns, and so forth.

Z[,3] <- X[,3] - Proj(X[,3], Z[,1]) - Proj(X[,3], Z[,2]) 
Z[,4] <- X[,4] - Proj(X[,4], Z[,1]) - Proj(X[,4], Z[,2]) - Proj(X[,4], Z[,3])

Note that if any column of X is a linear combination of the previous columns, the corresponding column of Z will be all zeros.
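
To illustrate this point, here is a small sketch (not part of the original data) that appends a deliberately redundant column to X and orthogonalizes it against the columns of Z computed above:

# Sketch: a column that is an exact linear combination of earlier columns
X2 <- cbind(X, redundant = X[,1] + 2 * X[,2])
z5 <- X2[,5] - Proj(X2[,5], Z[,1]) - Proj(X2[,5], Z[,2]) -
      Proj(X2[,5], Z[,3]) - Proj(X2[,5], Z[,4])
max(abs(z5))   # essentially zero: the redundant column adds no new information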

These computations are similar to the following set of linear regressions (not identical, because lm() also includes an intercept, so its residuals are additionally orthogonal to the constant column):

z2 <- residuals(lm(X[,2] ~ X[,1]), type="response")
z3 <- residuals(lm(X[,3] ~ X[,1:2]), type="response")
z4 <- residuals(lm(X[,4] ~ X[,1:3]), type="response")

The columns of Z are now orthogonal, but not of unit length:

zapsmall(crossprod(Z))     # check orthogonality
##       [,1] [,2] [,3] [,4]
## [1,] 57888    0    0    0
## [2,]     0 3249    0    0
## [3,]     0    0    7    0
## [4,]     0    0    0    2

We standardize each column to unit length, giving Z as an orthonormal matrix, such that Z'Z = I.

Z <- Z %*% diag(1 / len(Z))    # make each column unit length
zapsmall(crossprod(Z))         # check orthonormal
##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    0    1    0    0
## [3,]    0    0    1    0
## [4,]    0    0    0    1
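
As a quick base-R cross-check (equivalent to len(Z) from matlib), each column now has unit length:

sqrt(colSums(Z^2))   # Euclidean length of each column of Z should be 1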

Relationship to QR factorization

The QR method uses essentially the same process, factoring the matrix X as X = QR, where Q is the orthonormal matrix corresponding to Z and R is an upper-triangular matrix. However, the signs of the columns of Q are arbitrary, and here QR() returns QR(X)$Q with its signs reversed compared to Z.

# same result as QR(X)$Q, but with signs reversed
head(Z, 5)
##         height   weight     age     male
## Alfred  0.2868  0.07545 -0.3687  0.12456
## Alice   0.2348 -0.08067  0.3569 -0.02177
## Barbara 0.2714 -0.07715 -0.3862 -0.45170
## Carol   0.2610  0.07058  0.1559 -0.20548
## Henry   0.2639  0.05132  0.1047  0.40538
head(-QR(X)$Q, 5)
##        [,1]     [,2]    [,3]     [,4]
## [1,] 0.2868  0.07545 -0.3687  0.12456
## [2,] 0.2348 -0.08067  0.3569 -0.02177
## [3,] 0.2714 -0.07715 -0.3862 -0.45170
## [4,] 0.2610  0.07058  0.1559 -0.20548
## [5,] 0.2639  0.05132  0.1047  0.40538
all.equal( unname(Z), -QR(X)$Q )
## [1] TRUE
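
Because Z is orthonormal and spans the same column space as X, the triangular factor can be recovered as R = Z'X, and Z R reproduces X. A quick sketch using only base R and the objects above (because of the sign difference just noted, this R matches Z rather than QR(X)$R):

R <- crossprod(Z, X)      # R = Z'X is upper triangular, up to rounding
zapsmall(R)
all.equal(Z %*% R, X, check.attributes = FALSE)   # X = Z R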

Regression with X and Z

We carry out two regressions of IQ on the variables in X and in Z. These are equivalent, in the sense that:

  • The R^2 and MSE are the same in both models.
  • The residuals are the same.
  • The Type I tests given by anova() are the same.

class2 <- data.frame(Z, IQ=class$IQ)

Regression of IQ on the original variables in X

mod1 <- lm(IQ ~ height + weight + age + male, data=class)
anova(mod1)
## Analysis of Variance Table
## 
## Response: IQ
##           Df Sum Sq Mean Sq F value Pr(>F)  
## height     1    127     127    1.09  0.321  
## weight     1    194     194    1.67  0.225  
## age        1    417     417    3.59  0.087 .
## male       1    200     200    1.72  0.218  
## Residuals 10   1162     116                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Regression of IQ on the orthogonalized variables in Z

mod2 <- lm(IQ ~ height + weight + age + male, data=class2)
anova(mod2)
## Analysis of Variance Table
## 
## Response: IQ
##           Df Sum Sq Mean Sq F value Pr(>F)  
## height     1    127     127    1.09  0.321  
## weight     1    194     194    1.67  0.225  
## age        1    417     417    3.59  0.087 .
## male       1    200     200    1.72  0.218  
## Residuals 10   1162     116                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
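
The first two equivalences listed above can be verified directly from the two fitted models:

# Verify the remaining equivalences between mod1 and mod2
all.equal(residuals(mod1), residuals(mod2))           # identical residuals
c(summary(mod1)$r.squared, summary(mod2)$r.squared)   # identical R^2
c(summary(mod1)$sigma,     summary(mod2)$sigma)       # identical residual SE (sqrt of MSE)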

This illustrates that anova() tests for linear models are sequential tests. They test hypotheses about the extra contribution of each variable over and above all previous ones, in a given order. These usually do not make substantive sense, except in testing ordered ("hierarchical") models.