Test your knowledge of the material on logistic regression in the following quiz to see how much you learned. The quiz is entirely private: no record of your performance is kept.
1. Logistic regression is used when the response variable is:
Logistic regression is specifically designed for binary response variables (0/1, success/failure, yes/no). It models the probability that Y = 1 using the logistic function, which ensures predictions stay within [0, 1].
2. What is the main problem with using ordinary least squares (OLS) regression for a binary response?
OLS has two main problems with binary responses: (1) predicted values can fall outside the valid probability range of [0, 1], and (2) the variance is not constant—it equals π(1-π), which varies with the fitted values. This violates the homogeneity of variance assumption and makes hypothesis tests invalid.
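Both failures are easy to demonstrate. Below is a minimal simulated sketch (all variable names are made up for illustration): a least-squares line fit to a 0/1 response yields fitted values outside [0, 1].

```r
# Simulated demonstration that OLS misbehaves on a binary response.
set.seed(42)
x <- seq(-4, 4, length.out = 200)
y <- rbinom(200, size = 1, prob = plogis(x))  # true probabilities follow a logistic curve

ols <- lm(y ~ x)       # ordinary least squares on the 0/1 response
range(fitted(ols))     # fitted "probabilities" fall below 0 and above 1
```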
3. The logit link function in logistic regression is defined as:
The logit link function is logit(π) = log[π/(1-π)], the natural logarithm of the odds. This transforms probabilities from the (0, 1) range to (-∞, +∞), allowing us to use a linear model. The inverse transformation π = 1/(1 + exp(-logit)) maps back to probabilities.
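A quick numerical check of this pair of transformations, using base R's qlogis() (logit) and plogis() (inverse logit):

```r
p <- c(0.1, 0.5, 0.9)
qlogis(p)             # logit: log(p / (1 - p)) -> -2.197, 0, 2.197
log(p / (1 - p))      # the same, written out from the definition
plogis(qlogis(p))     # inverse logit maps back to 0.1, 0.5, 0.9
```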
4. In a logistic regression model, the coefficient β for a predictor represents:
In logistic regression, the model is linear on the log-odds (logit) scale: logit(π) = α + βx. Therefore, β represents the change in log odds of Y = 1 for a one-unit increase in x. To get the effect on odds, exponentiate: odds multiply by exp(β).
5. If a logistic regression coefficient is β = 0.5, the odds ratio (multiplicative effect on odds) is:
To convert a log-odds coefficient to an odds ratio (the multiplicative effect on odds), exponentiate: exp(0.5) ≈ 1.65. This means the odds of Y = 1 are multiplied by 1.65 (a 65% increase) for each one-unit increase in the predictor.
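In R the conversion is a one-liner; for a fitted model (the object fit below is hypothetical) the same idea applies to all coefficients at once:

```r
exp(0.5)             # 1.6487: odds multiply by about 1.65 per unit increase
exp(coef(fit))       # odds ratios for every coefficient of a fitted glm
exp(confint(fit))    # profile-likelihood CIs on the odds-ratio scale
```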
6. In R, how do you fit a logistic regression model for a binary response variable outcome with predictors age and sex?
Use glm() with family = binomial to fit logistic regression: glm(outcome ~ age + sex, family = binomial). The 'binomial' family uses the logit link by default. You can also specify family = binomial(link = "logit") explicitly.
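As a self-contained sketch, here is the call from the answer run on simulated data (the data-generating values are arbitrary; the variable names outcome, age, and sex come from the question):

```r
set.seed(1)
n   <- 300
age <- round(runif(n, 20, 80))
sex <- factor(sample(c("F", "M"), n, replace = TRUE))
outcome <- rbinom(n, 1, plogis(-3 + 0.05 * age + 0.4 * (sex == "M")))

mod <- glm(outcome ~ age + sex, family = binomial)  # logit link by default
summary(mod)
```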
7. Logistic regression models are typically fit using:
Logistic regression uses maximum likelihood estimation (MLE), which finds parameter values that maximize the probability of observing the data. This is solved iteratively because there's no closed-form solution. MLE has better statistical properties than OLS for binary data.
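To make "maximum likelihood" concrete, here is a hedged illustration that maximizes the Bernoulli log-likelihood numerically with optim() and compares the result to glm()'s iterative fit; it assumes the simulated outcome, age, and sex data and the model mod from the previous sketch:

```r
X <- model.matrix(~ age + sex)
negloglik <- function(beta) {
  eta <- X %*% beta
  -sum(outcome * eta - log(1 + exp(eta)))  # negative Bernoulli log-likelihood
}
opt <- optim(rep(0, ncol(X)), negloglik, method = "BFGS")
cbind(optim = opt$par, glm = coef(mod))    # the two estimates agree closely
```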
8. In logistic regression, hypothesis tests for individual coefficients typically use:
Individual coefficients in logistic regression are tested using Wald z-tests (coefficient divided by its standard error) or equivalently χ² tests (z² follows a chi-square distribution with 1 df). The summary() output shows z values and p-values for each coefficient.
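The Wald statistics are easy to reproduce by hand from the coefficient table (reusing the fitted model mod from the sketch above):

```r
est <- coef(summary(mod))                       # Estimate, Std. Error, z value, Pr(>|z|)
z   <- est[, "Estimate"] / est[, "Std. Error"]  # Wald z statistic
2 * pnorm(-abs(z))                              # two-sided p-values, matching summary()
z^2                                             # equivalent chi-square statistics (1 df)
```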
9. Effect plots from the effects package display predicted effects:
Effect plots show predicted effects for high-order terms (main effects, interactions) in the model, with other predictors held at typical values (means for continuous predictors, observed proportions for the levels of factors). This lets you visualize each term's effect while adjusting for the others.
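A short sketch with the effects package (assumed to be installed), applied to the model mod from the earlier example:

```r
library(effects)
plot(allEffects(mod))     # one panel per high-order term, others held at typical values
plot(Effect("age", mod))  # the age effect alone
```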
10. In model diagnostics for logistic regression, an influential observation is characterized by:
Influential observations have both high leverage (unusual predictor values, far from X̄) AND large residuals (poor fit, Y far from ŷ). The heuristic formula is Influence ≈ Leverage × Residual². Cook's D measures this combined effect. High leverage alone or large residual alone is less problematic.
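A diagnostic sketch for the same fitted model mod, using the standard accessors (the 4/n cutoff is a common rule of thumb, not a formal test):

```r
h <- hatvalues(mod)        # leverage: how unusual the predictor values are
r <- rstudent(mod)         # studentized residuals: how poorly a case is fit
d <- cooks.distance(mod)   # influence, combining leverage and residual
which(d > 4 / length(d))   # flag cases by the rough 4/n cutoff
# car::influencePlot(mod) draws residuals vs. leverage, sized by Cook's D
```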