A company recently introduced a new health insurance provider for its employees. At the beginning of the year the employees had to choose one of three (or four) different health plan products from this provider to best suit their needs.
This dataset was modified from its original source (McNulty, 2022) for the present purposes by adding a fourth choice, sampled randomly from the original three.
Usage
data("HealthInsurance", package = "nestedLogit")
Format
A data frame with 1448 rows and 7 columns.
- product
Choice among three products, a factor with levels "A", "B", and "C".
- product4
Choice among four products, a factor with levels "A", "B", "C", and "D".
- age
The age of the individual, in years.
- household
The number of people living with the individual in the same household.
- position_level
Position level in the company at the time the choice was made, a numeric vector where 1 is the lowest level and 5 is the highest.
- gender
The gender of the individual, a factor with levels "Female" and "Male".
- absent
The number of days the individual was absent from work in the year prior to the choice.
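The format described above can be checked directly once the data are loaded. The following is a minimal sketch, assuming the nestedLogit package is installed (the block is skipped otherwise). It also cross-tabulates product against product4 to show how the added fourth category relates to the original three-way choice.

```r
# Assumes the nestedLogit package is installed; the block is skipped otherwise.
if (requireNamespace("nestedLogit", quietly = TRUE)) {
  data("HealthInsurance", package = "nestedLogit")

  # Documented above as 1448 rows and 7 columns
  print(dim(HealthInsurance))

  # Column types and factor levels
  str(HealthInsurance)

  # Original three-way choice against the four-way version
  print(table(product  = HealthInsurance$product,
              product4 = HealthInsurance$product4))
}
```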
Source
Originally taken from McNulty, K. (2022). Handbook of Regression Modeling in People Analytics, https://peopleanalytics-regression-book.org/data/health_insurance.csv.
Examples
lbinary <- logits(AB_CD = dichotomy(c("A", "B"), c("C", "D")),
                  A_B = dichotomy("A", "B"),
                  C_D = dichotomy("C", "D"))
as.matrix(lbinary)
#>        A  B  C  D
#> AB_CD  0  0  1  1
#> A_B    0  1 NA NA
#> C_D   NA NA  0  1
health.nested <- nestedLogit(product4 ~ age + gender * household + position_level,
                             dichotomies = lbinary, data = HealthInsurance)
car::Anova(health.nested)
#>
#> Analysis of Deviance Tables (Type II tests)
#>
#> Response AB_CD: {A, B} vs. {C, D}
#>                  LR Chisq Df Pr(>Chisq)
#> age               161.171  1  < 2.2e-16 ***
#> gender             30.412  1  3.493e-08 ***
#> household         128.772  1  < 2.2e-16 ***
#> position_level      0.044  1     0.8344
#> gender:household   16.299  1  5.408e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Response A_B: {A} vs. {B}
#>                  LR Chisq Df Pr(>Chisq)
#> age               229.664  1  < 2.2e-16 ***
#> gender             75.537  1  < 2.2e-16 ***
#> household         127.743  1  < 2.2e-16 ***
#> position_level     27.164  1  1.869e-07 ***
#> gender:household    0.091  1     0.7633
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Response C_D: {C} vs. {D}
#>                  LR Chisq Df Pr(>Chisq)
#> age               116.663  1  < 2.2e-16 ***
#> gender              5.355  1    0.02066 *
#> household          52.861  1   3.58e-13 ***
#> position_level      0.018  1    0.89305
#> gender:household    1.545  1    0.21384
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> Combined Responses
#>                  LR Chisq Df Pr(>Chisq)
#> age                507.50  3  < 2.2e-16 ***
#> gender             111.30  3  < 2.2e-16 ***
#> household          309.38  3  < 2.2e-16 ***
#> position_level      27.23  3  5.278e-06 ***
#> gender:household    17.94  3  0.0004536 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coef(health.nested)
#>                            AB_CD         A_B          C_D
#> (Intercept)          -3.85986638 -2.19851256  4.826064163
#> age                   0.05740364  0.17267537 -0.071680487
#> genderMale            1.46728946 -2.45841955 -0.824955433
#> household             0.40271031 -0.70434692 -0.350313558
#> position_level        0.01029949 -0.56167558 -0.009193197
#> genderMale:household -0.22931808  0.05779896  0.107014721
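To see how the three sets of coefficients combine into probabilities for the four products, the following base R sketch multiplies the binary-logit probabilities along the nesting shown by as.matrix(lbinary). It assumes that each logit models the probability of the category coded 1 in that matrix ({C, D} for AB_CD, B for A_B, D for C_D). The coefficients are copied from the output above; the employee profile (a 40-year-old woman, household of 2, position level 3) is hypothetical and chosen only for illustration.

```r
# Coefficients copied from the coef(health.nested) output above
# (rows are model terms, columns are the three dichotomies).
b <- cbind(
  AB_CD = c(-3.85986638, 0.05740364, 1.46728946, 0.40271031,
            0.01029949, -0.22931808),
  A_B   = c(-2.19851256, 0.17267537, -2.45841955, -0.70434692,
            -0.56167558, 0.05779896),
  C_D   = c(4.826064163, -0.071680487, -0.824955433, -0.350313558,
            -0.009193197, 0.107014721)
)
rownames(b) <- c("(Intercept)", "age", "genderMale", "household",
                 "position_level", "genderMale:household")

# Hypothetical employee (an assumption for illustration): age 40, female
# (genderMale = 0), household of 2, position level 3.
# The last entry is the genderMale:household interaction, 0 * 2.
x <- c(1, 40, 0, 2, 3, 0 * 2)

# Probability of the category coded 1 in each dichotomy
p <- plogis(drop(x %*% b))

# Multiply along the nesting: first {A, B} vs. {C, D}, then within each pair
probs <- c(
  A = (1 - p[["AB_CD"]]) * (1 - p[["A_B"]]),
  B = (1 - p[["AB_CD"]]) * p[["A_B"]],
  C = p[["AB_CD"]] * (1 - p[["C_D"]]),
  D = p[["AB_CD"]] * p[["C_D"]]
)

print(round(probs, 3))
print(sum(probs))  # the four probabilities sum to 1 by construction
```

Because each product's probability is a product of branch probabilities along its path through the nesting, the four values always sum to 1, whatever the covariate values.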