candisc
performs a generalized canonical discriminant analysis for
one term in a multivariate linear model (i.e., an mlm
object),
computing canonical scores and vectors. It represents a transformation of
the original variables into a canonical space of maximal differences for the
term, controlling for other model terms.
Usage
candisc(mod, ...)
# S3 method for mlm
candisc(mod, term, type = "2", manova, ndim = rank, ...)
# S3 method for candisc
print(x, digits = max(getOption("digits") - 2, 3), LRtests = TRUE, ...)
# S3 method for candisc
summary(
object,
means = TRUE,
scores = FALSE,
coef = c("std"),
ndim,
digits = max(getOption("digits") - 2, 4),
...
)
# S3 method for candisc
coef(object, type = c("std", "raw", "structure"), ...)
# S3 method for candisc
plot(
x,
which = 1:2,
conf = 0.95,
col,
pch,
scale,
asp = 1,
var.col = "blue",
var.lwd = par("lwd"),
var.labels,
var.cex = 1,
var.pos,
rev.axes = c(FALSE, FALSE),
ellipse = FALSE,
ellipse.prob = 0.68,
fill.alpha = 0.1,
prefix = "Can",
suffix = TRUE,
titles.1d = c("Canonical scores", "Structure"),
points.1d = FALSE,
...
)
Arguments
- mod
An mlm object, such as computed by
lm()
with a multivariate response- ...
arguments to be passed down. In particular,
type="n"
can be used with theplot
method to suppress the display of canonical scores.- term
the name of one term from
mod
for which the canonical analysis is performed.- type
type of test for the model
term
, one of: "II", "III", "2", or "3"- manova
the
Anova.mlm
object corresponding tomod
. Normally, this is computed internally byAnova(mod)
- ndim
Number of dimensions to store in (or retrieve from, for the
summary
method) themeans
,structure
,scores
andcoeffs.*
components. The default is the rank of the H matrix for the hypothesis term.- digits
significant digits to print.
- LRtests
logical; should likelihood ratio tests for the canonical dimensions be printed?
- object, x
A candisc object
- means
Logical value used to determine if canonical means are printed
- scores
Logical value used to determine if canonical scores are printed
- coef
Type of coefficients printed by the summary method. Any one or more of
"std"
,"raw"
, or"structure"
- which
A vector of one or two integers, selecting the canonical dimension(s) to plot. If the canonical structure for a
term
hasndim==1
, orlength(which)==1
, a 1D representation of canonical scores and structure coefficients is produced by theplot
method. Otherwise, a 2D plot is produced.- conf
Confidence coefficient for the confidence circles around canonical means plotted in the
plot
method- col
A vector of the unique colors to be used for the levels of the term in the
plot
method, one for each level of theterm
. In this version, you should assign colors and point symbols explicitly, rather than relying on the somewhat arbitrary defaults, based onpalette
- pch
A vector of the unique point symbols to be used for the levels of the term in the
plot
method- scale
Scale factor for the variable vectors in canonical space. If not specified, a scale factor is calculated to make the variable vectors approximately fill the plot space.
- asp
Aspect ratio for the
plot
method. Theasp=1
(the default) assures that the units on the horizontal and vertical axes are the same, so that lengths and angles of the variable vectors are interpretable.- var.col
Color used to plot variable vectors
- var.lwd
Line width used to plot variable vectors
- var.labels
Optional vector of variable labels to replace variable names in the plots
- var.cex
Character expansion size for variable labels in the plots
- var.pos
Position(s) of variable vector labels wrt. the end point. If not specified, the labels are out-justified left and right with respect to the end points.
- rev.axes
Logical, a vector of
length(which)
.TRUE
causes the orientation of the canonical scores and structure coefficients to be reversed along a given axis.- ellipse
Draw data ellipses for canonical scores?
- ellipse.prob
Coverage probability for the data ellipses
- fill.alpha
Transparency value for the color used to fill the ellipses. Use
fill.alpha
to draw the ellipses unfilled.- prefix
Prefix used to label the canonical dimensions plotted
- suffix
Suffix for labels of canonical dimensions. If
suffix=TRUE
the percent of hypothesis (H) variance accounted for by each canonical dimension is added to the axis label.- titles.1d
A character vector of length 2, containing titles for the panels used to plot the canonical scores and structure vectors, for the case in which there is only one canonical dimension.
- points.1d
Logical value for
plot.candisc
when only one canonical dimension.
Value
An object of class candisc
with the following components:
- dfh
hypothesis degrees of freedom for
term
- dfe
error degrees of freedom for the
mlm
- rank
number of non-zero eigenvalues of \(HE^{-1}\)
- eigenvalues
eigenvalues of \(HE^{-1}\)
- canrsq
squared canonical correlations
- pct
A vector containing the percentages of the
canrsq
of their total.- ndim
Number of canonical dimensions stored in the
means
,structure
andcoeffs.*
components- means
A data.frame containing the class means for the levels of the factor(s) in the term
- factors
A data frame containing the levels of the factor(s) in the
term
- term
name of the
term
- terms
A character vector containing the names of the terms in the
mlm
object- coeffs.raw
A matrix containing the raw canonical coefficients
- coeffs.std
A matrix containing the standardized canonical coefficients
- structure
A matrix containing the canonical structure coefficients on
ndim
dimensions, i.e., the correlations between the original variates and the canonical scores. These are sometimes referred to as Total Structure Coefficients.- scores
A data frame containing the predictors in the
mlm
model and the canonical scores onndim
dimensions. These are calculated asY %*% coeffs.raw
, whereY
contains the standardized response variables.
Details
In typical usage, the term
should be a factor or interaction
corresponding to a multivariate test with 2 or more degrees of freedom for
the null hypothesis.
Canonical discriminant analysis is typically carried out in conjunction with
a one-way MANOVA design. It represents a linear transformation of the
response variables into a canonical space in which (a) each successive
canonical variate produces maximal separation among the groups (e.g.,
maximum univariate F statistics), and (b) all canonical variates are
mutually uncorrelated. For a one-way MANOVA with g groups and p responses,
there are dfh
= min( g-1, p) such canonical dimensions, and tests,
initially stated by Bartlett (1938) allow one to determine the number of
significant canonical dimensions.
Computational details for the one-way case are described in Cooley & Lohnes (1971), and in the SAS/STAT User's Guide, "The CANDISC procedure: Computational Details," http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_candisc_sect012.htm.
A generalized canonical discriminant analysis extends this idea to a general
multivariate linear model. Analysis of each term in the mlm
produces
a rank \(df_h\) H matrix sum of squares and crossproducts matrix that
is tested against the rank \(df_e\) E matrix by the standard
multivariate tests (Wilks' Lambda, Hotelling-Lawley trace, Pillai trace,
Roy's maximum root test). For any given term in the mlm
, the
generalized canonical discriminant analysis amounts to a standard
discriminant analysis based on the H matrix for that term in relation to the
full-model E matrix.
The plot method for candisc objects is typically a 2D plot, similar to a
biplot. It shows the canonical scores for the groups defined by the
term
as points and the canonical structure coefficients as vectors
from the origin.
If the canonical structure for a term
has ndim==1
, or
length(which)==1
, the 1D representation consists of a boxplot of
canonical scores and a vector diagram showing the magnitudes of the
structure coefficients.
References
Bartlett, M. S. (1938). Further aspects of the theory of multiple regression. Proc. Cambridge Philosophical Society 34, 33-34.
Cooley, W.W. & Lohnes, P.R. (1971). Multivariate Data Analysis, New York: Wiley.
Gittins, R. (1985). Canonical Analysis: A Review with Applications in Ecology, Berlin: Springer.
Examples
grass.mod <- lm(cbind(N1,N9,N27,N81,N243) ~ Block + Species, data=Grass)
car::Anova(grass.mod, test="Wilks")
#>
#> Type II MANOVA Tests: Wilks test statistic
#> Df test stat approx F num Df den Df Pr(>F)
#> Block 4 0.33721 1.5620 20 80.549 0.08372 .
#> Species 7 0.01570 4.9756 35 103.389 1.039e-10 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
grass.can1 <-candisc(grass.mod, term="Species")
plot(grass.can1)
#> Vector scale factor set to 5.545
# library(heplots)
heplot(grass.can1, scale=6, fill=TRUE)
# iris data
iris.mod <- lm(cbind(Petal.Length, Sepal.Length, Petal.Width, Sepal.Width) ~ Species, data=iris)
iris.can <- candisc(iris.mod, data=iris)
#-- assign colors and symbols corresponding to species
col <- c("red", "brown", "green3")
pch <- 1:3
plot(iris.can, col=col, pch=pch)
#> Vector scale factor set to 9.58
heplot(iris.can)
#> Vector scale factor set to 43.914
# 1-dim plot
iris.can1 <- candisc(iris.mod, data=iris, ndim=1)
plot(iris.can1)