
These data consist of observations on 442 diabetes patients. The response of interest is a quantitative measure of disease progression one year after baseline.

There are ten baseline variables: age, sex, body-mass index (bmi), average blood pressure (map) and six blood serum measurements.

Usage

data("diab")

Format

A data frame with 442 observations on the following 11 variables.

prog

disease progression, a numeric vector

age

age, a numeric vector

sex

sex, an integer-coded numeric vector

bmi

body mass index, a numeric vector

map

mean arterial blood pressure, a numeric vector

tc

blood serum TC (total serum cholesterol), a numeric vector

ldl

blood serum low-density lipoprotein ("bad cholesterol"), a numeric vector

hdl

blood serum high-density lipoprotein ("good cholesterol"), a numeric vector

tch

blood serum TCH (ratio of total cholesterol to HDL), a numeric vector

ltg

blood serum triglycerides, probably on a log scale (ltg), a numeric vector

glu

blood serum glucose, a numeric vector

Source

The dataset was taken from the website accompanying Efron & Hastie (2021), http://hastie.su.domains/CASI_files/DATA/diabetes.csv.

Details

In their analysis, Efron & Hastie use standardized predictor variables: each predictor is centered and then scaled to have unit L2 norm.
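The standardization described above can be sketched in R as follows. Each predictor column x_j is replaced by (x_j - mean(x_j)) / ||x_j - mean(x_j)||_2. `standardize_l2` is a hypothetical helper written for illustration. It is not a function from this package or from Efron & Hastie. The toy data frame is made up purely to show the result.

```r
# Hypothetical helper (not part of this package): center each column,
# then scale it so that its L2 norm is 1.
standardize_l2 <- function(X) {
  X <- as.matrix(X)
  Xc <- sweep(X, 2, colMeans(X))    # subtract each column's mean
  norms <- sqrt(colSums(Xc^2))      # L2 norm of each centered column
  sweep(Xc, 2, norms, "/")          # divide each column by its norm
}

# Made-up toy data, for illustration only
toy <- data.frame(a = c(1, 2, 3, 4), b = c(10, 0, 5, 5))
Z <- standardize_l2(toy)
colMeans(Z)   # approximately 0 for every column
colSums(Z^2)  # 1 (up to rounding) for every column
```

Assuming `diab` has been loaded with `data(diab)`, the ten baseline predictors could be standardized with `standardize_l2(diab[, setdiff(names(diab), "prog")])`. `scale()` divides by the standard deviation, which equals the L2 norm of the centered column divided by `sqrt(n - 1)`. The same matrix is therefore also given by `scale(X) / sqrt(nrow(X) - 1)`.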

References

Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least Angle Regression. The Annals of Statistics, 32(2), 407-499. doi:10.1214/009053604000000067

Efron, B., & Hastie, T. (2021). Computer Age Statistical Inference, Student Edition: Algorithms, Evidence, and Data Science. Cambridge University Press. doi:10.1017/9781108914062

Examples

data(diab)
str(diab)   # structure: 442 observations of 11 variables
plot(diab)  # scatterplot matrix of all pairs of variables