Data on amyotrophic lateral sclerosis (ALS, also known as Lou Gehrig's disease) from Section 17.2 of Efron and Hastie (2016). There are 1822 observations on individuals with ALS. The goal is to predict dFRS, the rate of progression of a functional rating score, using 369 predictors based on measurements (and derivatives of these) obtained from patient visits.
Format
A data frame with 1822 rows and 371 variables. The key variables are
testset (logical indicator for training/test split) and dFRS
(response: rate of progression of the ALS functional rating score). The 369
predictor variables include:
Demographics: Age, Sex.Male, Sex.Female, and race indicators (Race...Caucasian, Race...Asian, etc.)
Family history of neurological diseases in relatives (e.g., Father, Mother, Brother, Sister)
Neurological disease indicators (e.g., Neurological.Disease.ALS, Neurological.Disease.PARKINSON.S.DISEASE)
Site of onset (Site.of.Onset.Onset..Bulbar, Site.of.Onset.Onset..Limb)
Symptoms (Symptom.Atrophy, Symptom.Cramps, Symptom.Fasciculations, Symptom.Speech, etc.)
Study arm indicators (Study.Arm.ACTIVE, Study.Arm.PLACEBO)
Clinical measurements with summary statistics (first, last, min, max, mean, sd, slope): ALSFRS scores, blood pressure, forced/slow vital capacity (fvc.liters, svc.liters), respiratory rate, weight, height
ALSFRS subscale items: climbing.stairs, cutting, dressing, handwriting, salivation, speech, swallowing, turning, walking
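The testset indicator suggests a simple way to separate the rows before fitting a model for dFRS. The sketch below is illustrative only: split_by_testset is a hypothetical helper, not part of the package, and it assumes that TRUE marks test-set rows, which is inferred from the column name rather than stated in this page. The toy data frame uses made-up values so that the block runs without the real data.

```r
# Hypothetical helper: split a data frame on its logical `testset` column.
# Assumption: TRUE marks test rows (inferred from the column name only).
split_by_testset <- function(d) {
  stopifnot(is.logical(d$testset))
  keep <- setdiff(names(d), "testset")  # drop the indicator from both parts
  list(
    train = d[!d$testset, keep, drop = FALSE],
    test  = d[d$testset, keep, drop = FALSE]
  )
}

# Toy stand-in with made-up values, used only so the sketch is self-contained
toy <- data.frame(
  testset = c(TRUE, FALSE, FALSE, TRUE),
  dFRS    = c(-0.9, -0.1, -0.5, -0.3),
  Age     = c(38L, 72L, 46L, 66L)
)
parts <- split_by_testset(toy)
nrow(parts$train)
nrow(parts$test)

# With the real data loaded via data(als):
# parts <- split_by_testset(als)
```

Dropping testset from both parts keeps the indicator from being used as a predictor by accident.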
Details
These data were kindly provided by Lester Mackey and Lilly Fang, who won the DREAM challenge prediction prize in 2012 (Kuffner et al., 2015). The data set includes some additional variables created by them. Their winning entry used Bayesian trees, a method not too different from random forests.
References
Efron, B. and Hastie, T. (2016). Computer Age Statistical Inference. Cambridge University Press, Section 17.2.
Examples
data(als)
str(als)
#> 'data.frame': 1822 obs. of 371 variables:
#> $ testset : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
#> $ dFRS : num -0.915 -0.108 -0.557 -0.296 -1.087 ...
#> $ Onset.Delta : int -1181 -1324 -1061 -1736 -354 -500 -1091 -217 -820 -1037 ...
#> $ Symptom.Speech : int 1 0 0 0 1 1 0 0 0 0 ...
#> $ Symptom.WEAKNESS : int 0 1 0 1 0 1 0 1 1 1 ...
#> $ Symptom.OTHER : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Symptom.Swallowing : int 1 0 0 0 0 0 0 0 0 0 ...
#> $ Symptom.GAIT_CHANGES : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Symptom.Atrophy : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Symptom.Cramps : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Symptom.Fasciculations : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Symptom.SENSORY_CHANGES : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Symptom.Stiffness : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Symptom.. : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Site.of.Onset.Onset..Bulbar : int 1 0 0 0 1 1 0 0 1 1 ...
#> $ Site.of.Onset.Onset..Limb : int 0 1 1 1 0 0 1 1 0 0 ...
#> $ Site.of.Onset.Onset..Limb.and.Bulbar : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Race...Asian : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Race...Black.African.American : int 0 0 0 0 0 0 1 0 0 0 ...
#> $ Race...Caucasian : int 1 1 1 1 1 1 0 0 1 1 ...
#> $ Race...Other : int 0 0 0 0 0 0 0 1 0 0 ...
#> $ Age : int 38 72 46 66 70 37 41 70 67 71 ...
#> $ Sex.Female : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Sex.Male : int 0 0 1 0 0 0 1 1 0 0 ...
#> $ Aunt : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Aunt..Maternal. : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Cousin : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Father : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Grandfather..Maternal. : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Grandmother : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Grandmother..Maternal. : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Mother : int 0 0 0 1 0 0 0 0 0 0 ...
#> $ Uncle : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Uncle..Maternal. : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Uncle..Paternal. : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Son : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Daughter : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Sister : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Brother : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Family : int 0 0 0 1 0 0 0 0 0 0 ...
#> $ Neurological.Disease.OTHER : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Neurological.Disease.STROKE.NOS : int 0 0 0 1 0 0 0 0 0 0 ...
#> $ Neurological.Disease.DEMENTIA.NOS : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Neurological.Disease.PARKINSON.S.DISEASE: int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Neurological.Disease.DAT : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Neurological.Disease.ALS : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Neurological.Disease.BRAIN.TUMOR : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Neurological.Disease.STROKE.ISCHEMIC : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Neurological.Disease.STROKE.HEMORRHAGIC : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ Study.Arm.PLACEBO : int 0 0 1 0 0 0 1 0 0 0 ...
#> $ Study.Arm.ACTIVE : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ max.alsfrs.score : int 24 28 35 30 33 23 26 29 35 33 ...
#> $ min.alsfrs.score : int 19 26 30 29 29 14 23 28 33 31 ...
#> $ last.alsfrs.score : int 21 26 30 29 33 14 25 28 33 31 ...
#> $ mean.alsfrs.score : num 21.2 27.3 31.8 29.5 31 ...
#> $ num.alsfrs.score.visits : int 4 3 4 4 4 4 4 3 4 4 ...
#> $ sum.alsfrs.score : int 85 82 127 118 124 78 100 86 138 128 ...
#> $ first.alsfrs.score.date : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ last.alsfrs.score.date : int 70 67 78 90 84 91 85 35 91 84 ...
#> $ meansquares.alsfrs.score : num 455 748 1012 870 963 ...
#> $ sd.alsfrs.score : num 1.785 0.943 1.92 0.5 1.414 ...
#> $ alsfrs.score.slope : num 0 -0.909 -1.951 -0.338 0.725 ...
#> $ lessthan2.alsfrs.score : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ no.alsfrs.score.data : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ max.speech : int 2 4 4 4 3 2 4 4 2 1 ...
#> $ min.speech : int 2 4 4 4 2 1 4 4 2 0 ...
#> $ last.speech : int 2 4 4 4 2 1 4 4 2 0 ...
#> $ mean.speech : num 2 4 4 4 2.25 1.75 4 4 2 0.75 ...
#> $ sum.speech : int 8 12 16 16 9 7 16 12 8 3 ...
#> $ meansquares.speech : num 4 16 16 16 5.25 3.25 16 16 4 0.75 ...
#> $ sd.speech : num 0 0 0 0 0.433 ...
#> $ speech.slope : num 0 0 0 0 -0.362 ...
#> $ max.salivation : int 4 4 4 4 3 3 4 4 3 2 ...
#> $ min.salivation : int 2 4 3 4 2 1 4 3 3 1 ...
#> $ last.salivation : int 3 4 3 4 3 2 4 4 3 1 ...
#> $ mean.salivation : num 3 4 3.75 4 2.5 ...
#> $ sum.salivation : int 12 12 15 16 10 8 16 10 12 6 ...
#> $ meansquares.salivation : num 9.5 16 14.2 16 6.5 ...
#> $ sd.salivation : num 0.707 0 0.433 0 0.5 ...
#> $ salivation.slope : num 0 0 -0.39 0 0.362 ...
#> $ max.swallowing : int 4 4 4 4 3 3 4 4 3 3 ...
#> $ min.swallowing : int 4 4 4 4 2 2 4 3 3 2 ...
#> $ last.swallowing : int 4 4 4 4 3 2 4 4 3 2 ...
#> $ mean.swallowing : num 4 4 4 4 2.75 ...
#> $ sum.swallowing : int 16 12 16 16 11 11 16 10 12 9 ...
#> $ meansquares.swallowing : num 16 16 16 16 7.75 ...
#> $ sd.swallowing : num 0 0 0 0 0.433 ...
#> $ swallowing.slope : num 0 0 0 0 0 ...
#> $ max.handwriting : int 0 4 3 3 4 2 0 3 4 4 ...
#> $ min.handwriting : int 0 4 2 3 4 1 0 3 4 4 ...
#> $ last.handwriting : int 0 4 2 3 4 1 0 3 4 4 ...
#> $ mean.handwriting : num 0 4 2.25 3 4 1.25 0 3 4 4 ...
#> $ sum.handwriting : int 0 12 9 12 16 5 0 9 16 16 ...
#> $ meansquares.handwriting : num 0 16 5.25 9 16 1.75 0 9 16 16 ...
#> $ sd.handwriting : num 0 0 0.433 0 0 ...
#> $ handwriting.slope : num 0 0 -0.39 0 0 ...
#> $ max.cutting : int 1 4 3 3 4 2 0 3 4 4 ...
#> $ min.cutting : int 1 3 2 3 4 1 0 2 3 4 ...
#> $ last.cutting : int 1 3 2 3 4 1 0 2 3 4 ...
#> [list output truncated]
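In the rows printed by str(), the sd.* columns are consistent with a population standard deviation computed from the matching mean.* and meansquares.* columns, that is sqrt(meansquares - mean^2). This is an observation from the displayed values, not a documented definition. The sketch below checks it on a few of those values; pop_sd is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: population SD from a mean and a mean of squares.
pop_sd <- function(mean_x, meansq_x) {
  # pmax() guards against tiny negative differences caused by rounding
  sqrt(pmax(meansq_x - mean_x^2, 0))
}

# mean.speech and meansquares.speech for row 5, copied from the str() output
pop_sd(2.25, 5.25)    # about 0.433, matching sd.speech in row 5

# Row 2 of the ALSFRS score: sum 82 over 3 visits, mean of squares 748
pop_sd(82 / 3, 748)   # about 0.943, matching sd.alsfrs.score in row 2

# With the real data loaded via data(als), the same check over all rows
# could be written as:
# all.equal(als$sd.speech, pop_sd(als$mean.speech, als$meansquares.speech))
```

A sample standard deviation (dividing by n - 1) would give about 1.155 for the second case, so the printed 0.943 points to the population form for that row.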