The function ridge
fits linear models by ridge regression, returning
an object of class ridge
designed to be used with the plotting
methods in this package.
It is also designed to facilitate an alternative representation of the effects of shrinkage in the space of uncorrelated (PCA/SVD) components of the predictors.
The standard formulation of ridge regression is that it regularizes the estimates of coefficients by adding small positive constants \(\lambda\) to the diagonal elements of \(\mathbf{X}^\top\mathbf{X}\) in the least squares solution to achieve a more favorable tradeoff between bias and variance (inverse of precision) of the coefficients.
$$\widehat{\mathbf{\beta}}^{\text{RR}}_k = (\mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^\top \mathbf{y} $$
Ridge regression shrinkage can be parameterized in several ways.
If a vector of
lambda
values is supplied, these are used directly in the ridge regression computations.Otherwise, if a vector
df
can be supplied the equivalent values for effective degrees of freedom corresponding to shrinkage, going down from the number of predictors in the model.
In either case, both lambda
and
df
are returned in the ridge
object, but the rownames
of the
coefficients are given in terms of lambda
.
coef
extracts the estimated coefficients for each value of the shrinkage factor
vcov
extracts the estimated \(p \times p\) covariance matrices of the coefficients for each value of the shrinkage factor.
best
extracts the optimal shrinkage values according to several criteria:
HKB: Hoerl et al. (1975); LW: Lawless & Wang (1976); GCV: Golub et al. (1975)
Usage
ridge(y, ...)
# S3 method for class 'formula'
ridge(formula, data, lambda = 0, df, svd = TRUE, contrasts = NULL, ...)
# Default S3 method
ridge(y, X, lambda = 0, df, svd = TRUE, ...)
# S3 method for class 'ridge'
coef(object, ...)
# S3 method for class 'ridge'
print(x, digits = max(5, getOption("digits") - 5), ...)
# S3 method for class 'ridge'
vcov(object, ...)
best(object, ...)
# S3 method for class 'ridge'
best(object, ...)
Arguments
- y
A numeric vector containing the response variable. NAs not allowed.
- ...
Other arguments, passed down to methods
- formula
For the
formula
method, a two-sided formula.- data
For the
formula
method, data frame within which to evaluate the formula.- lambda
A scalar or vector of ridge constants. A value of 0 corresponds to ordinary least squares.
- df
A scalar or vector of effective degrees of freedom corresponding to
lambda
- svd
If
TRUE
the SVD of the centered and scaledX
matrix is returned in theridge
object.- contrasts
a list of contrasts to be used for some or all of factor terms in the formula. See the
contrasts.arg
ofmodel.matrix.default
.- X
A matrix of predictor variables. NA's not allowed. Should not include a column of 1's for the intercept.
- x, object
An object of class
ridge
- digits
For the
print
method, the number of digits to print.
Value
A list with the following components:
- lambda
The vector of ridge constants
- df
The vector of effective degrees of freedom corresponding to
lambda
- coef
The matrix of estimated ridge regression coefficients
- scales
scalings used on the X matrix
- kHKB
HKB estimate of the ridge constant
- kLW
L-W estimate of the ridge constant
- GCV
vector of GCV values
- kGCV
value of
lambda
with the minimum GCV- criteria
Collects the criteria
kHKB
,kLW
, andkGCV
in a named vector
If svd==TRUE
(the default), the following are also included:
- svd.D
Singular values of the
svd
of the scaled X matrix- svd.U
Left singular vectors of the
svd
of the scaled X matrix. Rows correspond to observations and columns to dimensions.- svd.V
Right singular vectors of the
svd
of the scaled X matrix. Rows correspond to variables and columns to dimensions.
Details
If an intercept is present in the model, its coefficient is not penalized. (If you want to penalize an intercept, put in your own constant term and remove the intercept.)
The predictors are centered, but not (yet) scaled in this implementation.
A number of the methods in the package assume that lambda
is a vector of shrinkage constants
increasing from lambda[1] = 0
, or equivalently, a vector of df
decreasing from \(p\).
References
Hoerl, A. E., Kennard, R. W., and Baldwin, K. F. (1975), "Ridge Regression: Some Simulations," Communications in Statistics, 4, 105-123.
Lawless, J.F., and Wang, P. (1976), "A Simulation Study of Ridge and Other Regression Estimators," Communications in Statistics, 5, 307-323.
Golub G.H., Heath M., Wahba G. (1979) Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21:215–223. https://doi.org/10.2307/1268518
See also
lm.ridge
for other implementations of ridge regression
traceplot
, plot.ridge
,
pairs.ridge
, plot3d.ridge
, for 1D, 2D, 3D plotting methods
pca.ridge
, biplot.ridge
,
biplot.pcaridge
for views in PCA/SVD space
precision.ridge
for measures of shrinkage and precision
Examples
#\donttest{
# Longley data, using number Employed as response
longley.y <- longley[, "Employed"]
longley.X <- data.matrix(longley[, c(2:6,1)])
lambda <- c(0, 0.005, 0.01, 0.02, 0.04, 0.08)
lridge <- ridge(longley.y, longley.X, lambda=lambda)
# same, using formula interface
lridge <- ridge(Employed ~ GNP + Unemployed + Armed.Forces + Population + Year + GNP.deflator,
data=longley, lambda=lambda)
coef(lridge)
#> GNP Unemployed Armed.Forces Population Year GNP.deflator
#> 0.000 -3.4471925 -1.827886 -0.6962102 -0.34419721 8.431972 0.15737965
#> 0.005 -1.0424783 -1.491395 -0.6234680 -0.93558040 6.566532 -0.04175039
#> 0.010 -0.1797967 -1.361047 -0.5881396 -1.00316772 5.656287 -0.02612152
#> 0.020 0.4994945 -1.245137 -0.5476331 -0.86755299 4.626116 0.09766305
#> 0.040 0.9059471 -1.155229 -0.5039108 -0.52347060 3.576502 0.32123994
#> 0.080 1.0907048 -1.086421 -0.4582525 -0.08596324 2.641649 0.57025165
# standard trace plot
traceplot(lridge)
# plot vs. equivalent df
traceplot(lridge, X="df")
pairs(lridge, radius=0.5)
#}
# \donttest{
data(prostate)
py <- prostate[, "lpsa"]
pX <- data.matrix(prostate[, 1:8])
pridge <- ridge(py, pX, df=8:1)
pridge
#> Ridge Coefficients:
#> lcavol lweight age lbph svi
#> 0.00000 0.6617092 0.2651031 -0.1573777 0.1395860 0.3136993
#> 7.08544 0.5774912 0.2574888 -0.1240459 0.1239824 0.2825540
#> 17.80389 0.4966273 0.2435259 -0.0910692 0.1087544 0.2542261
#> 34.68940 0.4182786 0.2224269 -0.0591493 0.0936983 0.2271752
#> 62.98554 0.3413151 0.1935614 -0.0296600 0.0784146 0.1988651
#> 115.28893 0.2641305 0.1565931 -0.0049775 0.0622832 0.1660546
#> 230.11308 0.1842675 0.1116128 0.0112850 0.0444562 0.1249585
#> 601.44017 0.0979221 0.0591504 0.0144649 0.0239578 0.0711874
#> lcp gleason pgg45
#> 0.00000 -0.1475193 0.0353655 0.1250701
#> 7.08544 -0.0556184 0.0457910 0.0958079
#> 17.80389 0.0159166 0.0518160 0.0800300
#> 34.68940 0.0670223 0.0559538 0.0738953
#> 62.98554 0.0979877 0.0592922 0.0732073
#> 115.28893 0.1087550 0.0607835 0.0729838
#> 230.11308 0.0981548 0.0564920 0.0666687
#> 601.44017 0.0633002 0.0392473 0.0455834
plot(pridge)
pairs(pridge)
traceplot(pridge)
traceplot(pridge, X="df")
# }
# Hospital manpower data from Table 3.8 of Myers (1990)
data(Manpower)
str(Manpower)
#> 'data.frame': 17 obs. of 6 variables:
#> $ Hours : num 567 697 1033 1604 1611 ...
#> $ Load : num 15.6 44 20.4 18.7 49.2 ...
#> $ Xray : int 2463 2048 3940 6505 5723 11520 5779 5969 8461 20106 ...
#> $ BedDays: num 473 1340 620 568 1498 ...
#> $ AreaPop: num 18 9.5 12.8 36.7 35.7 ...
#> $ Stay : num 4.45 6.92 4.28 3.9 5.5 4.6 5.62 5.15 6.18 6.15 ...
mmod <- lm(Hours ~ ., data=Manpower)
vif(mmod)
#> Load Xray BedDays AreaPop Stay
#> 9597.570761 7.940593 8933.086501 23.293856 4.279835
# ridge regression models, specified in terms of equivalent df
mridge <- ridge(Hours ~ ., data=Manpower, df=seq(5, 3.75, -.25))
vif(mridge)
#> Variance inflaction factors:
#> Load Xray BedDays AreaPop Stay
#> 0.0000000000 9597.571 7.941 8933.087 23.294 4.280
#> 0.0002836352 5602.507 7.927 5215.176 19.045 3.923
#> 0.0009043689 2438.604 7.913 2270.762 15.668 3.638
#> 0.0026667276 634.529 7.891 591.832 13.699 3.468
#> 0.0203643899 23.624 7.715 23.250 12.531 3.313
#> 0.1364900421 7.337 6.733 8.136 9.838 2.796
# univariate ridge trace plots
traceplot(mridge)
traceplot(mridge, X="df")
# \donttest{
# bivariate ridge trace plots
plot(mridge, radius=0.25, labels=mridge$df)
pairs(mridge, radius=0.25)
# 3D views
# ellipsoids for Load, Xray & BedDays are nearly 2D
plot3d(mridge, radius=0.2, labels=mridge$df)
# variables in model selected by AIC & BIC
plot3d(mridge, variables=c(2,3,5), radius=0.2, labels=mridge$df)
# plots in PCA/SVD space
mpridge <- pca(mridge)
traceplot(mpridge, X="df")
biplot(mpridge, radius=0.25)
#> Vector scale factor set to 8774.365
# }