The dataset comes from a small random sample of the U.S. National Longitudinal Survey of Youth.
Format
A data frame with 243 observations on the following 6 variables.
math
Math achievement test score
read
Reading achievement test score
antisoc
score on a measure of child's antisocial behavior, 0:6
hyperact
score on a measure of child's hyperactive behavior, 0:5
income
yearly income of child's father
educ
years of education of child's father
Source
This dataset was derived from a larger one used by Patrick Curran at the 1997 meeting of the Society for Research in Child Development (SRCD). A description now exists only on the Wayback Machine, http://web.archive.org/web/20050404145001/http://www.unc.edu/~curran/example.html.
More details are available at http://web.archive.org/web/20060830061414/http://www.unc.edu/~curran/srcd-docs/srcdmeth.pdf.
Details
In this dataset, math and read scores are taken as the outcome variables. Among the remaining predictors, income and educ might be considered background variables that need to be controlled for. Interest might then focus on whether the behavioural variables antisoc and hyperact contribute beyond that.
Examples
library(car)
library(heplots)  # provides the NLSY data and heplot()
data(NLSY)
# examine the data
scatterplotMatrix(NLSY, smooth=FALSE)
# test control variables by themselves
# -------------------------------------
mod1 <- lm(cbind(read,math) ~ income+educ, data=NLSY)
Anova(mod1)
#>
#> Type II MANOVA Tests: Pillai test statistic
#>        Df test stat approx F num Df den Df   Pr(>F)
#> income  1  0.034469   4.2661      2    239 0.015121 *
#> educ    1  0.051521   6.4912      2    239 0.001798 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
heplot(mod1, fill=TRUE)
# test of overall regression
coefs <- rownames(coef(mod1))[-1]
linearHypothesis(mod1, coefs)
#>
#> Sum of squares and products for the hypothesis:
#>           read     math
#> read  859.6586 1474.716
#> math 1474.7164 2929.558
#>
#> Sum of squares and products for error:
#>          read     math
#> read 22882.46 12051.69
#> math 12051.69 23763.79
#>
#> Multivariate Tests:
#>                  Df test stat  approx F num Df den Df     Pr(>F)
#> Pillai            2 0.1166962  7.435629      4    480 8.1261e-06 ***
#> Wilks             2 0.8840660  7.594147      4    478 6.1527e-06 ***
#> Hotelling-Lawley  2 0.1302750  7.751361      4    476 4.6699e-06 ***
#> Roy               2 0.1232808 14.793699      2    240 8.7377e-07 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
heplot(mod1, fill=TRUE, hypotheses=list("Overall"=coefs))
# additional contribution of antisoc + hyperact over income + educ
# ----------------------------------------------------------------
mod2 <- lm(cbind(read,math) ~ antisoc + hyperact + income + educ, data=NLSY)
Anova(mod2)
#>
#> Type II MANOVA Tests: Pillai test statistic
#>          Df test stat approx F num Df den Df   Pr(>F)
#> antisoc   1  0.019343   2.3374      2    237 0.098803 .
#> hyperact  1  0.014442   1.7364      2    237 0.178380
#> income    1  0.038280   4.7167      2    237 0.009801 **
#> educ      1  0.053152   6.6521      2    237 0.001546 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coefs <- rownames(coef(mod2))[-1]
heplot(mod2, fill=TRUE, hypotheses=list("Overall"=coefs, "mod2|mod1"=coefs[1:2]))
linearHypothesis(mod2, coefs[1:2])
#>
#> Sum of squares and products for the hypothesis:
#>          read     math
#> read 170.3478 261.2230
#> math 261.2230 516.0188
#>
#> Sum of squares and products for error:
#>          read     math
#> read 22712.12 11790.46
#> math 11790.46 23247.77
#>
#> Multivariate Tests:
#>                  Df test stat approx F num Df den Df   Pr(>F)
#> Pillai            2 0.0239869 1.444548      4    476 0.218172
#> Wilks             2 0.9760624 1.444284      4    474 0.218264
#> Hotelling-Lawley  2 0.0244741 1.443972      4    472 0.218372
#> Roy               2 0.0221965 2.641385      2    238 0.073351 .
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
heplot(mod2, fill=TRUE, hypotheses=list("mod2|mod1"=coefs[1:2]))
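The four multivariate statistics printed by linearHypothesis() are all functions of the eigenvalues of H E^{-1}, where H and E are the hypothesis and error sum of squares and products matrices. As a rough check, the sketch below recomputes them in base R from the rounded matrices printed for the overall regression test of mod1, so agreement with the table can only be expected up to rounding. The names H, E, lambda and stats are introduced here for illustration and are not part of the original example.

```r
# SSP matrices copied (rounded) from the linearHypothesis(mod1, coefs) output
vars <- c("read", "math")
H <- matrix(c( 859.6586, 1474.7164,
              1474.7164, 2929.558),
            nrow = 2, byrow = TRUE, dimnames = list(vars, vars))
E <- matrix(c(22882.46, 12051.69,
              12051.69, 23763.79),
            nrow = 2, byrow = TRUE, dimnames = list(vars, vars))

# eigenvalues of H E^{-1}; Re() guards against tiny imaginary parts
# that can arise because the product is not symmetric
lambda <- Re(eigen(H %*% solve(E), only.values = TRUE)$values)

# the four test statistics as functions of the eigenvalues
stats <- c(Pillai             = sum(lambda / (1 + lambda)),
           Wilks              = prod(1 / (1 + lambda)),
           "Hotelling-Lawley" = sum(lambda),
           Roy                = max(lambda))
round(stats, 6)
```

The same calculation applies to the mod2|mod1 test by substituting the H and E matrices printed by linearHypothesis(mod2, coefs[1:2]).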