The High School and Beyond Project was a longitudinal study of students in the U.S. carried out in 1980 by the National Center for Education Statistics. Data were collected from 58,270 high school students (28,240 seniors and 30,030 sophomores) and 1,015 secondary schools. The HSB data frame is sample of 600 observations, of unknown characteristics, originally taken from Tatsuoka (1988).
Format
A data frame with 600 observations on the following 15 variables. There is no missing data.
id
Observation id: a numeric vector
gender
a factor with levels
male
female
race
Race or ethnicity: a factor with levels
hispanic
asian
african-amer
white
ses
Socioeconomic status: a factor with levels
low
middle
high
sch
School type: a factor with levels
public
private
prog
High school program: a factor with levels
general
academic
vocation
locus
Locus of control: a numeric vector
concept
Self-concept: a numeric vector
mot
Motivation: a numeric vector
career
Career plan: a factor with levels
clerical
craftsman
farmer
homemaker
laborer
manager
military
operative
prof1
prof2
proprietor
protective
sales
school
service
technical
not working
read
Standardized reading score: a numeric vector
write
Standardized writing score: a numeric vector
math
Standardized math score: a numeric vector
sci
Standardized science score: a numeric vector
ss
Standardized social science (civics) score: a numeric vector
Source
Tatsuoka, M. M. (1988). Multivariate Analysis: Techniques for Educational and Psychological Research (2nd ed.). New York: Macmillan, Appendix F, 430-442.
References
High School and Beyond data files: http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/7896
Examples
str(HSB)
#> 'data.frame': 600 obs. of 15 variables:
#> $ id : num 55 114 490 44 26 510 133 213 548 309 ...
#> $ gender : Factor w/ 2 levels "male","female": 2 1 1 2 2 1 2 2 2 2 ...
#> $ race : Factor w/ 4 levels "hispanic","asian",..: 1 3 4 1 1 4 3 4 4 4 ...
#> $ ses : Factor w/ 3 levels "low","middle",..: 1 2 2 1 2 2 1 1 2 3 ...
#> $ sch : Factor w/ 2 levels "public","private": 1 1 1 1 1 1 1 1 2 1 ...
#> $ prog : Factor w/ 3 levels "general","academic",..: 1 2 3 3 2 3 3 1 2 1 ...
#> $ locus : num -1.78 0.24 -1.28 0.22 1.12 ...
#> $ concept: num 0.56 -0.35 0.34 -0.76 -0.74 ...
#> $ mot : num 1 1 0.33 1 0.67 ...
#> $ career : Factor w/ 17 levels "clerical","craftsman",..: 9 8 9 15 15 8 14 1 10 10 ...
#> $ read : num 28.3 30.5 31 31 31 ...
#> $ write : num 46.3 35.9 35.9 41.1 41.1 ...
#> $ math : num 42.8 36.9 46.1 49.2 36 ...
#> $ sci : num 44.4 33.6 39 33.6 36.9 ...
#> $ ss : num 50.6 40.6 45.6 35.6 45.6 ...
# main effects model
hsb.mod <- lm( cbind(read, write, math, sci, ss) ~
gender + race + ses + sch + prog, data=HSB)
car::Anova(hsb.mod)
#>
#> Type II MANOVA Tests: Pillai test statistic
#> Df test stat approx F num Df den Df Pr(>F)
#> gender 1 0.19207 27.8615 5 586 < 2.2e-16 ***
#> race 3 0.20268 8.5207 15 1764 < 2.2e-16 ***
#> ses 2 0.04965 2.9886 10 1174 0.0009909 ***
#> sch 1 0.01225 1.4535 5 586 0.2032987
#> prog 2 0.21466 14.1152 10 1174 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Add some interactions
hsb.mod1 <- update(hsb.mod, . ~ . + gender:race + ses:prog)
heplot(hsb.mod1, col=palette()[c(2,1,3:6)], variables=c("read","math"))
hsb.can1 <- candisc(hsb.mod1, term="race")
heplot(hsb.can1, col=c("red", "black"))
#> Vector scale factor set to 6.5031
# show canonical results for all terms
if (FALSE) {
hsb.can <- candiscList(hsb.mod)
hsb.can
}