School Data, from Charnes et al. (1981). The aim is to explain scores on 3
different tests, reading
, mathematics
and selfesteem
from 70 school sites by means of 5 explanatory variables related to parents
and teachers.
Format
A data frame with 70 observations on the following 8 variables.
education
Education level of mother as measured in terms of percentage of high school graduates among female parents
occupation
Highest occupation of a family member according to a pre-arranged rating scale
visit
Parental visits index representing the number of visits to the school site
counseling
Parent counseling index calculated from data on time spent with child on school-related topics such as reading together, etc.
teacher
Number of teachers at a given site
reading
Reading score as measured by the Metropolitan Achievement Test
mathematics
Mathematics score as measured by the Metropolitan Achievement Test
selfesteem
Coopersmith Self-Esteem Inventory, intended as a measure of self-esteem
Source
A. Charnes, W.W. Cooper and E. Rhodes (1981). Evaluating Program and Managerial Efficiency: An Application of Data Envelopment Analysis to Program Follow Through. Management Science, 27, 668-697.
Details
This dataset was shamelessly borrowed from the FRB
package.
The relationships among these variables are unusual, a fact only revealed by plotting.
Examples
data(schooldata)
# initial screening
plot(schooldata)
# better plot
library(corrgram)
corrgram(schooldata,
lower.panel=panel.ellipse,
upper.panel=panel.pts)
#fit the MMreg model
school.mod <- lm(cbind(reading, mathematics, selfesteem) ~
education + occupation + visit + counseling + teacher, data=schooldata)
# shorthand: fit all others
school.mod <- lm(cbind(reading, mathematics, selfesteem) ~ ., data=schooldata)
car::Anova(school.mod)
#>
#> Type II MANOVA Tests: Pillai test statistic
#> Df test stat approx F num Df den Df Pr(>F)
#> education 1 0.37564 12.4337 3 62 1.820e-06 ***
#> occupation 1 0.56658 27.0159 3 62 2.687e-11 ***
#> visit 1 0.26032 7.2734 3 62 0.0002948 ***
#> counseling 1 0.06465 1.4286 3 62 0.2429676
#> teacher 1 0.04906 1.0661 3 62 0.3700291
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# HE plots
heplot(school.mod, fill=TRUE, fill.alpha=0.1)
pairs(school.mod, fill=TRUE, fill.alpha=0.1)
# robust model, using robmlm()
school.rmod <- robmlm(cbind(reading, mathematics, selfesteem) ~ ., data=schooldata)
# note that counseling is now significant
car::Anova(school.rmod)
#>
#> Type II MANOVA Tests: Pillai test statistic
#> Df test stat approx F num Df den Df Pr(>F)
#> education 1 0.39455 12.8161 3 59 1.488e-06 ***
#> occupation 1 0.59110 28.4301 3 59 1.683e-11 ***
#> visit 1 0.23043 5.8888 3 59 0.0013819 **
#> counseling 1 0.25257 6.6456 3 59 0.0006083 ***
#> teacher 1 0.09812 2.1395 3 59 0.1048263
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Index plot of the weights
wts <- school.rmod$weights
notable <- which(wts < 0.8)
plot(wts, type = "h", col="gray", ylab = "Observation weight")
points(1:length(wts), wts,
pch=16,
col = ifelse(wts < 0.8, "red", "black"))
text(notable, wts[notable],
labels = notable,
pos = 3,
col = "red")
# compare classical HE plot with that based on the robust model
heplot(school.mod, cex=1.4, lty=1, fill=TRUE, fill.alpha=0.1)
heplot(school.rmod,
add=TRUE,
error.ellipse=TRUE,
lwd=c(2,2), lty=c(2,2),
term.labels=FALSE, err.label="",
fill=TRUE)