Appendix C — Exercises – Visualizing Multivariate Data and Models in R

This appendix collects exercises for each chapter of the book. It is only a beginning…

Chapter 1: Warm-up Exercises

Chapter 2: Introduction

Chapter 3: Getting Started

Chapter 4: Plots of Multivariate Data

Exercise C.1 Using the Salaries dataset, create one or more plots that compare different smoothing methods for the relationship between yrs.since.phd and salary shown in Figure 4.5. Include linear regression, quadratic polynomial, and loess smoothers.

library(ggplot2)
data(Salaries, package = "carData")
# Your code here
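
One possible starting point, offered as a sketch rather than the intended solution: overlay three geom_smooth() layers on the scatterplot, one per smoothing method. The colour labels, the point transparency, and the object name p are illustrative choices, not part of the exercise.

```r
library(ggplot2)
data(Salaries, package = "carData")

# Each smoother gets a constant colour label so that ggplot2 draws a legend
p <- ggplot(Salaries, aes(x = yrs.since.phd, y = salary)) +
  geom_point(alpha = 0.3) +
  geom_smooth(aes(colour = "linear"), method = "lm",
              formula = y ~ x, se = FALSE) +
  geom_smooth(aes(colour = "quadratic"), method = "lm",
              formula = y ~ poly(x, 2), se = FALSE) +
  geom_smooth(aes(colour = "loess"), method = "loess",
              formula = y ~ x, se = FALSE) +
  labs(colour = "Smoother")
p
```

Setting se = FALSE suppresses the confidence bands so the three fitted curves can be compared directly; drop it to see how the uncertainty differs between methods.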

Exercise C.2 One alternative to a loess smooth, whose span argument controls the degree of smoothing, is a natural spline. It can be used in geom_smooth() with the argument formula = y ~ splines::ns(x, df=), where df is the equivalent number of degrees of freedom for the spline smoother. Redo Exercise C.1 using this smoothing method for several values of df.
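
A minimal sketch of the natural-spline approach follows. The df values 2, 4, and 8 are arbitrary choices for illustration, as is the object name p_ns; the spline is fitted through method = "lm" because ns() only builds the basis columns.

```r
library(ggplot2)
data(Salaries, package = "carData")

# One geom_smooth() layer per df value; larger df gives a more flexible curve
p_ns <- ggplot(Salaries, aes(x = yrs.since.phd, y = salary)) +
  geom_point(alpha = 0.3) +
  geom_smooth(aes(colour = "df = 2"), method = "lm",
              formula = y ~ splines::ns(x, df = 2), se = FALSE) +
  geom_smooth(aes(colour = "df = 4"), method = "lm",
              formula = y ~ splines::ns(x, df = 4), se = FALSE) +
  geom_smooth(aes(colour = "df = 8"), method = "lm",
              formula = y ~ splines::ns(x, df = 8), se = FALSE) +
  labs(colour = "Natural spline")
p_ns
```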

Chapter 5: Dimension Reduction

Chapter 6: Overview of Linear models

Chapter 7: Plots for Univariate Response Models

Chapter 8: Topics in Linear Models

Chapter 9: Collinearity & Ridge Regression

Chapter 10: Hotelling’s \(T^2\)

Exercise C.3 The value of Hotelling’s \(T^2\) found by hotelling.test() is 64.17. The value of the equivalent \(F\) statistic found by Anova() is 28.9. Verify that Equation 10.4 gives this result.
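
The check can be sketched in a few lines, assuming Equation 10.4 is the standard two-group conversion \(F = \frac{n_1 + n_2 - p - 1}{p (n_1 + n_2 - 2)} T^2\). The helper name T2_to_F and the values n1 = 6, n2 = 6, p = 2 are assumptions made here for illustration; substitute the group sizes and number of responses from the chapter's example.

```r
# Convert a two-group Hotelling T^2 to the equivalent F statistic.
# n1, n2: group sample sizes; p: number of response variables.
T2_to_F <- function(T2, n1, n2, p) {
  df1 <- p
  df2 <- n1 + n2 - p - 1
  F_stat <- df2 / (p * (n1 + n2 - 2)) * T2
  c(F = F_stat, df1 = df1, df2 = df2)
}

# ASSUMED sample sizes and number of responses; replace with the chapter's values
res <- T2_to_F(T2 = 64.17, n1 = 6, n2 = 6, p = 2)
round(res, 2)
```

With these assumed values the multiplier is 9/20, so the F statistic rounds to 28.9 on 2 and 9 degrees of freedom.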

Chapter 11: Multivariate Linear Models

Chapter 12: Visualizing Multivariate Models

Exercise C.4 The dataset heplots::Hernior contains measures of post-operative recovery for 32 patients undergoing an elective herniorrhaphy operation, in relation to pre-operative measures.

The outcome measures are:

leave, the patient’s condition upon leaving the recovery room (a 1-4 scale, 1=best),
nurse, level of nursing required one week after operation (a 1-5 scale, 1=worst) and
los, length of stay in hospital after operation (in days)

The predictor variables are:

patient age, sex,
pstat, physical status (a 1-5 scale, with 1=perfect health, …, 5=very poor health),
build, body build (a 1-5 scale, with 1=emaciated, …, 5=obese), and
preoperative complications with the heart (cardiac) and respiration (resp), each on a 1-4 scale, 1=none, …, 4=severe.

(a) Fit the multivariate regression model and test the contributions of the predictors using car::Anova(). What do you conclude?

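# Load the data and regress all three recovery outcomes on the predictors
data(Hernior, package = "heplots")
Hern.mod <- lm(cbind(leave, nurse, los) ~ age + sex + pstat + build + cardiac + resp,
               data = Hernior)
# Multivariate tests for each predictor term
car::Anova(Hern.mod)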

(b) Extract the R² for each response separately using summary(Hern.mod), and compare with the overall multivariate test from (a). Does the multivariate test reveal anything that the univariate R² values miss? The function heplots::glance.mlm(Hern.mod) gives a compact one-line-per-response summary.

(c) Test the joint hypothesis that all predictors simultaneously have zero effect, using car::linearHypothesis(). Compare the four multivariate test statistics (Pillai, Wilks, Hotelling-Lawley, Roy) with the individual predictor p-values from (a). What does this suggest about the collective vs. individual predictive power of the pre-operative measures?

# All coefficient names except the intercept define the joint null hypothesis
predictors <- rownames(coef(Hern.mod))[-1]
car::linearHypothesis(Hern.mod, predictors)
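
For the comparison of per-response R² values with the multivariate tests, a minimal sketch is given below. It refits the model so that the block runs on its own; the object name r2 is an illustrative choice.

```r
data(Hernior, package = "heplots")
Hern.mod <- lm(cbind(leave, nurse, los) ~ age + sex + pstat + build + cardiac + resp,
               data = Hernior)

# summary() on a multivariate lm returns one summary.lm object per response
r2 <- sapply(summary(Hern.mod), function(s) s$r.squared)
round(r2, 3)
```

Each element is an ordinary univariate R², so the three values ignore the correlations among the responses that the multivariate tests take into account.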

(d) Construct an HE pairs plot for the model, adding the overall regression as a joint hypothesis ellipse. Use different colors for each predictor term.

library(heplots)  # provides the pairs() method for multivariate lm objects
clr <- c("red", "darkgray", "blue", "darkgreen", "magenta", "brown", "black")
vlab <- c("LeaveCondition\n(leave)", "NursingCare\n(nurse)", "LengthOfStay\n(los)")
pairs(Hern.mod,
      hypotheses = list("Regr" = predictors),
      col = clr, var.labels = vlab,
      fill = c(TRUE, FALSE), fill.alpha = 0.1)

Which predictor shows the largest multivariate effect? Are any predictors associated with better outcomes on one response but worse on another?
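
One way to put numbers beside the visual impression of effect size is a multivariate partial eta-squared for each term. The sketch below assumes heplots::etasq() is applied to the fitted model with its default settings; the object names eta and eta_sorted are illustrative. It refits the model so that the block runs on its own.

```r
library(heplots)
data(Hernior, package = "heplots")
Hern.mod <- lm(cbind(leave, nurse, los) ~ age + sex + pstat + build + cardiac + resp,
               data = Hernior)

# Partial eta-squared for each predictor term, from the multivariate tests
eta <- etasq(Hern.mod)

# Sort terms from largest to smallest effect
eta_sorted <- eta[order(-eta[[1]]), , drop = FALSE]
eta_sorted
```

These values summarize the size of each effect but not its direction, so the HE plot is still needed to see whether a predictor improves one outcome while worsening another.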

(e) Use candiscList() to examine predictor effects in canonical space and plot the results for pstat and build. What do the structure coefficient arrows tell you about which recovery outcomes each predictor most strongly influences?

library(candisc)  # provides candiscList() and its plot method
Hern.canL <- candiscList(Hern.mod)
plot(Hern.canL, term = "pstat")
plot(Hern.canL, term = "build")

Chapter 13: Visualizing Equality of Covariance Matrices

Chapter 14: Multivariate Influence and Robust Estimation

Chapter 15 (Appendix): Discriminant Analysis