The use of the negative binomial distribution to account for
overdispersion in the standard Poisson model for count data in the
glm() framework, as provided by MASS::glm.nb()
might need a bit more explanation than is provided in the lecture
slides, particularly as to the interpretation of the \(\theta\) and \(\alpha = 1 / \theta\) parameters.
MASS::glm.nb() fits the NB2
parameterization of the negative binomial, where the variance is a
quadratic function of the mean:
\[\text{Var}(Y) = \mu + \frac{\mu^2}{\theta}\]
theta (\(\theta\)) is
the size / precision parameter. Its reciprocal \(\alpha = 1/\theta\) is the
overdispersion parameter:
Expressed in terms of \(\alpha\), this is:
\[\text{Var}(Y) = \mu + \alpha\mu^2 = \mu(1 + \alpha\mu)\]
As \(\theta \to \infty\) (i.e. \(\alpha \to 0\)), the variance collapses to \(\mu\) and the distribution reduces to Poisson.
data(PhdPubs, package = "vcdExtra")
# Poisson baseline
phd.pois <- glm(articles ~ ., data = PhdPubs, family = poisson)
# Negative binomial
phd.nbin <- glm.nb(articles ~ ., data = PhdPubs)
# Extract theta and alpha
theta_hat <- phd.nbin$theta
se_theta <- phd.nbin$SE.theta
alpha_hat <- 1 / theta_hat
cat(sprintf("theta (size): %.4f (SE = %.4f)\n", theta_hat, se_theta))
## theta (size): 2.2670 (SE = 0.2716)
cat(sprintf("alpha = 1/theta: %.4f\n", alpha_hat))
## alpha = 1/theta: 0.4411
The most concrete interpretation comes from writing the model as a Poisson-Gamma mixture. Each observation has its own latent rate:
\[\lambda_i = \mu_i \cdot u_i, \qquad u_i \sim \text{Gamma}(\theta,\ 1/\theta)\]
where \(u_i\) is a multiplicative individual-level random effect with:
Interpretation of \(\alpha\): the variance of the unobserved multiplier applied to each person’s expected count. It captures all the individual heterogeneity not explained by the predictors.
# Parameters of the Gamma mixing distribution for u_i
shape_u <- theta_hat # = theta
rate_u <- theta_hat # so that E(u) = shape/rate = 1
cat(sprintf("Gamma mixing distribution for u_i:\n"))
## Gamma mixing distribution for u_i:
cat(sprintf(" shape = rate = theta = %.4f\n", theta_hat))
## shape = rate = theta = 2.2670
cat(sprintf(" E(u_i) = 1 (by construction)\n"))
## E(u_i) = 1 (by construction)
cat(sprintf(" Var(u_i) = 1/theta = alpha = %.4f\n", alpha_hat))
## Var(u_i) = 1/theta = alpha = 0.4411
cat(sprintf(" SD(u_i) = %.4f\n", sqrt(alpha_hat)))
## SD(u_i) = 0.6642
# Approximate range of u_i (Gamma 5th–95th percentiles)
lo <- qgamma(0.05, shape = shape_u, rate = rate_u)
hi <- qgamma(0.95, shape = shape_u, rate = rate_u)
cat(sprintf(" Middle 90%% of u_i: [%.2f, %.2f]\n", lo, hi))
## Middle 90% of u_i: [0.21, 2.28]
u_seq <- seq(0, 4, length.out = 500)
u_df <- data.frame(
u = u_seq,
density = dgamma(u_seq, shape = shape_u, rate = rate_u)
)
ggplot(u_df, aes(x = u, y = density)) +
geom_line(linewidth = 1.1, color = "#2171b5") +
geom_area(alpha = 0.15, fill = "#2171b5") +
geom_vline(xintercept = 1, linetype = "dashed", color = "grey40") +
annotate("text", x = 1.05, y = Inf, vjust = 1.5,
label = "E(u) = 1", color = "grey40", size = 4) +
annotate("rect", xmin = lo, xmax = hi, ymin = 0, ymax = Inf,
alpha = 0.08, fill = "firebrick") +
annotate("text", x = (lo + hi) / 2, y = Inf, vjust = 3,
label = "middle 90%", color = "firebrick", size = 3.5) +
labs(
title = "Gamma mixing distribution: u_i ~ Gamma(θ, θ)",
subtitle = sprintf("θ = %.2f, α = 1/θ = %.2f, SD(u_i) = %.2f",
theta_hat, alpha_hat, sqrt(alpha_hat)),
x = expression(u[i] ~ "(multiplicative rate multiplier)"),
y = "Density"
) +
theme_bw(base_size = 13)
Gamma mixing distribution for the individual-level multiplier u_i. Each PhD student’s true rate is their predicted count μ_i multiplied by their u_i.
In plain language: even after accounting for all predictors, individual publication rates vary by a factor with SD ≈ 0.66 around the model prediction. A student at the 5th percentile of \(u_i\) publishes at 0.21× their predicted rate; one at the 95th publishes at 2.28×.
The variance inflation factor (VIF) relative to Poisson depends on the mean \(\mu\):
\[\text{VIF}(\mu) = \frac{\text{Var}(Y)}{\mu} = 1 + \frac{\mu}{\theta} = 1 + \alpha\mu\]
mu_vals <- c(1, 2, 4, 6, 10)
vif <- 1 + alpha_hat * mu_vals
data.frame(mu = mu_vals, VIF = round(vif, 2),
`Var(Y)` = round(vif * mu_vals, 2)) |>
knitr::kable(caption = "Variance inflation at selected values of μ (PhdPubs NB fit)")
| mu | VIF | Var.Y. |
|---|---|---|
| 1 | 1.44 | 1.44 |
| 2 | 1.88 | 3.76 |
| 4 | 2.76 | 11.06 |
| 6 | 3.65 | 21.88 |
| 10 | 5.41 | 54.11 |
mu_seq <- seq(0.5, 12, length.out = 200)
vif_df <- data.frame(
mu = mu_seq,
Poisson = mu_seq,
NegBin = mu_seq + alpha_hat * mu_seq^2
)
tidyr::pivot_longer(vif_df, -mu, names_to = "model", values_to = "variance") |>
ggplot(aes(x = mu, y = variance, color = model)) +
geom_line(linewidth = 1.1) +
scale_color_manual(values = c(Poisson = "#e6550d", NegBin = "#2171b5")) +
labs(
title = "Var(Y): Poisson vs. Negative Binomial",
subtitle = sprintf("NB: Var(Y) = μ + αμ², α = 1/θ = %.3f", alpha_hat),
x = "Mean μ",
y = "Var(Y)",
color = NULL
) +
theme_bw(base_size = 13) +
theme(legend.position = "inside", legend.position.inside = c(0.2, 0.8))
Variance of Y under Poisson (= μ) vs. negative binomial, as a function of the mean.
The likelihood-ratio test compares the NB log-likelihood to Poisson. Under \(H_0: \theta \to \infty\) (no overdispersion), the test statistic is on the boundary of the parameter space, so the p-value is conservative (halved \(\chi^2_1\)).
# LRT: NegBin vs Poisson
LRT <- 2 * (logLik(phd.nbin) - logLik(phd.pois))
pval <- pchisq(LRT, df = 1, lower.tail = FALSE) / 2 # boundary correction
cat(sprintf("LRT statistic: %.2f (df = 1)\n", LRT))
## LRT statistic: 179.97 (df = 1)
cat(sprintf("p-value (boundary-corrected): %.2e\n", pval))
## p-value (boundary-corrected): 2.46e-41
# AIC comparison
cat(sprintf("\nAIC — Poisson: %.1f, NegBin: %.1f, ΔAIC: %.1f\n",
AIC(phd.pois), AIC(phd.nbin), AIC(phd.pois) - AIC(phd.nbin)))
##
## AIC — Poisson: 3313.3, NegBin: 3135.4, ΔAIC: 178.0
| Quantity | Expression | PhdPubs estimate |
|---|---|---|
theta (\(\theta\)) |
size / precision | 2.267 |
alpha (\(\alpha =
1/\theta\)) |
Var(\(u_i\)) | 0.441 |
| SD of \(u_i\) | \(\sqrt{\alpha}\) | 0.664 |
| Poisson limit | \(\theta \to \infty\), \(\alpha \to 0\) | — |
| VIF at \(\mu = 4\) | \(1 + \alpha\mu\) | 2.76 |
The \(\alpha\) parameterization
(used by Stata, pscl, countreg) is more
directly interpretable as an overdispersion quantity. The \(\theta\) parameterization is more natural
as the shape of the Gamma mixing distribution. Either way, the test
\(H_0: \theta \to \infty\) (i.e. \(\alpha = 0\)) is the test against
Poisson.