Test your knowledge of the material on logistic regression in the following quiz to see how much you learned. The quiz is entirely private: no record of your performance is kept.
1. Logistic regression is used when the response variable is:
Logistic regression is specifically designed for binary response variables (0/1, success/failure, yes/no). It models the probability that Y = 1 using the logistic function, which ensures predictions stay within [0, 1].
2. What is the main problem with using ordinary least squares (OLS) regression for a binary response?
OLS has two main problems with binary responses: (1) predicted values can fall outside the valid probability range of [0, 1], and (2) the variance is not constant—it equals π(1-π), which varies with the fitted values. This violates the homogeneity of variance assumption and makes hypothesis tests invalid.
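Both failures are easy to demonstrate. Below is a minimal simulated sketch (all variable names are made up for illustration): a least-squares line fit to a 0/1 response yields fitted values outside [0, 1].

```r
# Simulated demonstration that OLS misbehaves on a binary response.
set.seed(42)
x <- seq(-4, 4, length.out = 200)
y <- rbinom(200, size = 1, prob = plogis(x))  # true probabilities follow a logistic curve

ols <- lm(y ~ x)       # ordinary least squares on the 0/1 response
range(fitted(ols))     # fitted "probabilities" fall below 0 and above 1
```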
3. The logit link function in logistic regression is defined as:
The logit link function is logit(π) = log[π/(1-π)], the natural logarithm of the odds. This transforms probabilities from the (0, 1) range to (-∞, +∞), allowing us to use a linear model. The inverse transformation π = 1/(1 + exp(-logit)) maps back to probabilities.
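A quick numerical check of this pair of transformations, using base R's qlogis() (logit) and plogis() (inverse logit):

```r
p <- c(0.1, 0.5, 0.9)
qlogis(p)             # logit: log(p / (1 - p)) -> -2.197, 0, 2.197
log(p / (1 - p))      # the same, written out from the definition
plogis(qlogis(p))     # inverse logit maps back to 0.1, 0.5, 0.9
```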
4. In a logistic regression model, the coefficient β for a predictor represents:
In logistic regression, the model is linear on the log-odds (logit) scale: logit(π) = α + βx. Therefore, β represents the change in log odds of Y = 1 for a one-unit increase in x. To get the effect on odds, exponentiate: odds multiply by exp(β).
5. If a logistic regression coefficient is β = 0.5, the odds ratio (multiplicative effect on odds) is:
To convert a log-odds coefficient to an odds ratio (the multiplicative effect on odds), exponentiate: exp(0.5) ≈ 1.65. This means the odds of Y = 1 are multiplied by 1.65 (a 65% increase) for each one-unit increase in the predictor.
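In R the conversion is a one-liner; for a fitted model (the object fit below is hypothetical) the same idea applies to all coefficients at once:

```r
exp(0.5)             # 1.6487: odds multiply by about 1.65 per unit increase
exp(coef(fit))       # odds ratios for every coefficient of a fitted glm
exp(confint(fit))    # profile-likelihood CIs on the odds-ratio scale
```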
6. In R, how do you fit a logistic regression model for a binary response variable outcome with predictors age and sex?
Use glm() with family = binomial to fit logistic regression: glm(outcome ~ age + sex, family = binomial). The 'binomial' family uses the logit link by default. You can also specify family = binomial(link = "logit") explicitly.
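As a self-contained sketch, here is the call from the answer run on simulated data (the data-generating values are arbitrary; the variable names outcome, age, and sex come from the question):

```r
set.seed(1)
n   <- 300
age <- round(runif(n, 20, 80))
sex <- factor(sample(c("F", "M"), n, replace = TRUE))
outcome <- rbinom(n, 1, plogis(-3 + 0.05 * age + 0.4 * (sex == "M")))

mod <- glm(outcome ~ age + sex, family = binomial)  # logit link by default
summary(mod)
```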
7. Logistic regression models are typically fit using:
Logistic regression uses maximum likelihood estimation (MLE), which finds parameter values that maximize the probability of observing the data. This is solved iteratively because there's no closed-form solution. MLE has better statistical properties than OLS for binary data.
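To make "maximum likelihood" concrete, here is a hedged illustration that maximizes the Bernoulli log-likelihood numerically with optim() and compares the result to glm()'s iterative fit; it assumes the simulated outcome, age, and sex data and the model mod from the previous sketch:

```r
X <- model.matrix(~ age + sex)
negloglik <- function(beta) {
  eta <- X %*% beta
  -sum(outcome * eta - log(1 + exp(eta)))  # negative Bernoulli log-likelihood
}
opt <- optim(rep(0, ncol(X)), negloglik, method = "BFGS")
cbind(optim = opt$par, glm = coef(mod))    # the two estimates agree closely
```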
8. In logistic regression, hypothesis tests for individual coefficients typically use:
Individual coefficients in logistic regression are tested using Wald z-tests (coefficient divided by its standard error) or equivalently χ² tests (z² follows a chi-square distribution with 1 df). The summary() output shows z values and p-values for each coefficient.
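The Wald statistics are easy to reproduce by hand from the coefficient table (reusing the fitted model mod from the sketch above):

```r
est <- coef(summary(mod))                       # Estimate, Std. Error, z value, Pr(>|z|)
z   <- est[, "Estimate"] / est[, "Std. Error"]  # Wald z statistic
2 * pnorm(-abs(z))                              # two-sided p-values, matching summary()
z^2                                             # equivalent chi-square statistics (1 df)
```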
9. Effect plots from the effects package display predicted effects:
Effect plots show predicted effects for high-order terms (main effects, interactions) in the model, with other predictors held at typical values (means for continuous predictors, observed proportions for the levels of factors). This lets you visualize each term's effect while adjusting for the others.
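A short sketch with the effects package (assumed to be installed), applied to the model mod from the earlier example:

```r
library(effects)
plot(allEffects(mod))     # one panel per high-order term, others held at typical values
plot(Effect("age", mod))  # the age effect alone
```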
10. In model diagnostics for logistic regression, an influential observation is characterized by:
Influential observations have both high leverage (unusual predictor values, far from X̄) AND large residuals (poor fit, Y far from ŷ). The heuristic formula is Influence ≈ Leverage × Residual². Cook's D measures this combined effect. High leverage alone or large residual alone is less problematic.
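A diagnostic sketch for the same fitted model mod, using the standard accessors (the 4/n cutoff is a common rule of thumb, not a formal test):

```r
h <- hatvalues(mod)        # leverage: how unusual the predictor values are
r <- rstudent(mod)         # studentized residuals: how poorly a case is fit
d <- cooks.distance(mod)   # influence, combining leverage and residual
which(d > 4 / length(d))   # flag cases by the rough 4/n cutoff
# car::influencePlot(mod) draws residuals vs. leverage, sized by Cook's D
```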