Poisson Regression for Count Data: Complete Statistical & Coding Tutorial

blog-cover-image

Poisson Regression for Count Data: Complete Statistical & Coding Tutorial

Count data is ubiquitous in the real world, from the number of insurance claims an individual files, to website visits per hour, to daily transaction counts in finance. Traditional linear regression models fall short for such non-negative integer data, but Poisson regression is specifically designed for this scenario. This comprehensive tutorial covers the complete theory, statistical underpinnings, and hands-on coding for Poisson regression, guiding you from the mathematical foundation to advanced topics and real-world case studies.

1. When to Use Poisson Regression (Count Data Examples)

Poisson regression is ideal when your dependent variable is a count—a non-negative integer (0, 1, 2, ...). Typical scenarios include:

Number of insurance claims per policyholder per year
Website visits or clicks per hour
Number of doctor visits per patient per month
Defects per manufactured batch
Transaction counts in financial accounts

The key characteristics of count data:

Non-negative integers (cannot be negative or fractional)
Often have many zeros
Variance may not equal the mean (see overdispersion)

2. Mathematical Derivation from Poisson Distribution

Poisson regression is grounded in the Poisson distribution, which models the probability of a given number of events occurring in a fixed interval of time or space.

The probability mass function (pmf) of the Poisson distribution:

$$ P(Y = y) = \frac{e^{-\lambda} \lambda^y}{y!} $$

$ Y $ = observed count (0, 1, 2, ...)
$ \lambda $ = expected count (mean rate parameter, $ \lambda > 0 $)

In Poisson regression, we model $ \lambda $ as a function of explanatory variables $ X $:

$$ \lambda_i = \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) $$

This exponential function ensures $ \lambda $ is always positive.

3. Link Functions and the Exponential Mean

Generalized Linear Models (GLMs) require specifying a link function between the linear predictors and the mean of the distribution.

For Poisson regression, the canonical link is the log function.

The model is:

$$ \log(\lambda_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} $$

Therefore,

$$ \lambda_i = \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) $$

This ensures the mean $ \lambda_i $ is always positive, suitable for counts.

4. Overdispersion: Detection and Solutions (Negative Binomial)

A key assumption of the Poisson model is that the mean equals the variance:

$$ E[Y] = Var[Y] = \lambda $$

However, in real data, the variance often exceeds the mean—this is called overdispersion.

How to Detect Overdispersion

Compute sample mean and variance: if variance > mean, overdispersion is likely.
Check the deviance or Pearson chi-square / degrees of freedom: values > 1 suggest overdispersion.

Solutions: Negative Binomial Regression

The Negative Binomial model adds a dispersion parameter $ \alpha $:

$$ Var[Y] = \lambda + \alpha \lambda^2 $$

This extra parameter allows the variance to exceed the mean.

5. Zero-Inflation: When and How to Handle It

Many count datasets have more zeros than expected under a Poisson or Negative Binomial model. This is called zero-inflation.

Typical examples:

Many people have zero insurance claims in a year.
Most users have zero purchases during a sale period.

Zero-Inflated Models

Zero-Inflated Poisson (ZIP): Mixture of a Poisson and a point mass at zero.
Zero-Inflated Negative Binomial (ZINB): Same as ZIP but with a Negative Binomial component.

These models estimate:

The probability of being in the "always zero" group
The count process for the "not always zero" group

6. Model Assumptions and Diagnostic Testing

To ensure valid inference from Poisson regression:

Count nature: Outcome is count data (non-negative integers).
Independence: Counts are independent across observations.
Linearity: The log of the mean is a linear function of predictors.
Mean-Variance equality: The mean equals the variance (check for overdispersion).

Diagnostic Testing

Residual plots: check for patterns.
Pearson and deviance residuals: large values may indicate lack of fit.
Goodness of fit statistics: compare against alternative models.

7. Python Implementation with `statsmodels`

The statsmodels package in Python is well-suited for Poisson regression.


import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Example data
data = pd.DataFrame({
    'claims': [0, 1, 0, 2, 1, 3, 0, 2, 1, 0],
    'age': [25, 45, 35, 50, 23, 40, 29, 55, 38, 31]
})

# Fit Poisson regression
model = smf.glm(formula='claims ~ age', data=data, family=sm.families.Poisson())
result = model.fit()
print(result.summary())

For Negative Binomial:


model_nb = smf.glm(formula='claims ~ age', data=data, family=sm.families.NegativeBinomial())
result_nb = model_nb.fit()
print(result_nb.summary())

8. Alternative Implementations with `sklearn` and GLMs

As of scikit-learn 1.0+, you can fit a Poisson regression using HistGradientBoostingRegressor or PoissonRegressor:


from sklearn.linear_model import PoissonRegressor

X = data[['age']]
y = data['claims']

pr = PoissonRegressor(alpha=0)
pr.fit(X, y)
print("Intercept:", pr.intercept_)
print("Coef:", pr.coef_)

For more advanced GLMs, statsmodels remains the preferred library.

9. Interpretation of Coefficients: Incidence Rate Ratios

Coefficients in Poisson regression are on the log scale. To interpret their effects on the count:

Exponentiate coefficients to get Incidence Rate Ratios (IRR).

If $ \beta_1 = 0.05 $, then the IRR is $ e^{0.05} \approx 1.05 $.

This means: a one-unit increase in predictor increases the expected count by 5% (holding other variables constant).

To compute IRRs and their confidence intervals:


import numpy as np

params = result.params
conf = result.conf_int()
irr = np.exp(params)
irr_ci = np.exp(conf)
print('Incidence Rate Ratios:')
print(irr)
print('95% CI:')
print(irr_ci)

10. Model Comparison: Poisson vs. OLS for Count Data

Why not use OLS regression for count data?

OLS assumes normality and constant variance: count data are not normally distributed and variance depends on the mean.
OLS can predict negative counts: which are not possible for count data.
Poisson regression naturally handles zeros and skewness: providing more accurate and interpretable results.

Using model fit statistics (AIC, BIC, deviance), Poisson almost always outperforms OLS for count outcomes.

11. Case Study 1: Insurance Claim Frequency

Suppose you analyze the number of insurance claims per year, as a function of age and car type.


import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Example dataset
data = pd.DataFrame({
    'claims': [0,1,2,0,3,2,1,0,1,4,0,2],
    'age': [25,45,35,50,23,40,29,55,38,31,47,53],
    'car_type': ['sedan','suv','suv','sedan','sedan','suv','sedan','suv','sedan','suv','sedan','suv']
})

# Fit Poisson regression
model = smf.glm(formula='claims ~ age + car_type', data=data, family=sm.families.Poisson())
result = model.fit()
print(result.summary())

Interpretation: check if age or car type significantly affects claim frequency.

12. Case Study 2: Website Traffic Modeling

Modeling the number of visits per hour as a function of time of day, day of week, and promotions.


# Generate example data
import numpy as np

np.random.seed(0)
data = pd.DataFrame({
    'visits': np.random.poisson(lam=5, size=100),
    'hour': np.random.randint(0, 24, size=100),
    'is_weekend': np.random.randint(0, 2, size=100),
    'promotion': np.random.randint(0, 2, size=100)
})

# Fit model
model = smf.glm(formula='visits ~ hour + is_weekend + promotion', data=data, family=sm.families.Poisson())
result = model.fit()
print(result.summary())

Interpret: which factors drive higher hourly traffic?

13. Case Study 3: Financial Transaction Counts

Predicting transaction counts per day per customer, using customer features.


# Example data
data = pd.DataFrame({
    'transactions': np.random.poisson(lam=2, size=50),
    'account_age_months': np.random.randint(1, 60, size=50),
    'is_premium': np.random.randint(0, 2, size=50)
})

# Fit model
model = smf.glm(formula='transactions ~ account_age_months + is_premium', data=data, family=sm.families.Poisson())
result = model.fit()
print(result.summary())

Interpret: does a premium account or account age affect transaction frequency?

14. Advanced: Time Series of Counts (INAR Models)

When counts are observed over time (e.g., daily arrivals), autocorrelation may be present. Integer-valued Autoregressive (INAR) models are designed for such data:

The INAR(1) model:

$$ Y_t = \alpha \circ Y_{t-1} + \epsilon_t $$

$ \alpha \circ Y_{t-1} $: binomial thinning operator to ensure counts remain integers
$ \epsilon_t $: new arrivals, usually Poisson distributed

Python implementation can use the arch or tscount packages for advanced time series count modeling.

15. Bayesian Poisson Regression with PyMC3

Bayesian approaches allow flexible prior specification and full uncertainty quantification.


import pymc3 as pm
import numpy as np

# Example data
np.random.seed(1)
x = np.random.randn(100)
y = np.random.poisson(lam=np.exp(0.5*x + 1))

with pm.Model() as model:
    beta0 = pm.Normal('beta0', mu=0, sigma=10)
    beta1 = pm.Normal('beta1', mu=0, sigma=10)
    mu = pm.math.exp(beta0 + beta1 * x)
    y_obs = pm.Poisson('y_obs', mu=mu, observed=y)
    trace = pm.sample(2000, tune=1000, cores=1)
    
pm.summary(trace)

This yields posterior distributions for all parameters, not just point estimates.

16. Common Pitfalls and Troubleshooting

Ignoring overdispersion: Leads to underestimated standard errors and misleading p-values.
Not checking for zero-inflation: Can result in poor model fit.
Using OLS for count data: May produce nonsensical (negative) predictions.
Perfect separation: Results in infinite coefficients—check variable coding.
Multicollinearity: Highly correlated predictors can inflate variance of estimates and make interpretation difficult. Check VIF (Variance Inflation Factor) and correlation matrices.
Omitted variable bias: Leaving out important predictors can bias coefficients. Always consider domain knowledge for variable selection.
Incorrect link function: The log link is standard for Poisson, but using other links requires strong justification.
Ignoring exposure/time at risk: When observations have different durations or exposure, include offset terms (e.g., log of exposure) in the model.
Misinterpreting coefficients: Remember, coefficients are on the log scale. Always exponentiate to interpret as incidence rate ratios.

Troubleshooting steps:

Check residuals for patterns indicating poor fit.
Compare model fit statistics (AIC, BIC) across Poisson, Negative Binomial, and zero-inflated models.
Validate predictions on a hold-out dataset or via cross-validation.
Consult subject matter experts if unexpected results occur.

17. Interview Questions on Count Data Models

What is the Poisson distribution? When is it appropriate to use?
The Poisson distribution models the probability of counts of events occurring in a fixed interval, given a known constant mean rate. Use when modeling non-negative integer counts that arise from independent events.
What are the assumptions of Poisson regression?
The outcome is a count, counts are independent, the log of the mean is a linear function of predictors, and the mean equals the variance.
How do you detect and handle overdispersion?
Compare mean and variance, examine Pearson/deviance statistics. If detected, use Negative Binomial regression.
Explain zero-inflated models. When are they needed?
When there are more zeros than expected under Poisson/NegBin, zero-inflated models add a separate process for "always-zero" cases.
How do you interpret a coefficient in Poisson regression?
Exponentiate the coefficient to get the incidence rate ratio: the multiplicative effect on the mean count for a one-unit increase in the predictor.
Why not use OLS for count data?
OLS can predict negative values, assumes constant variance, and is not suitable for skewed/count data.
What is an offset in Poisson regression?
An offset is a log(exposure) term added to the model to account for different observation periods or exposure levels.
What is the difference between Poisson and Negative Binomial regression?
Negative Binomial regression includes an extra parameter for overdispersion (variance greater than mean).
How do you assess model fit for count data models?
Use AIC/BIC, deviance statistics, residual plots, and predictive checks.
Describe a real-world example where Poisson regression is appropriate.
E.g., modeling the number of insurance claims per policyholder per year based on age, gender, and policy type.

Conclusion

Poisson regression is a powerful and interpretable tool for modeling count data, offering a principled statistical framework for rates and frequencies. Understanding overdispersion, zero-inflation, and the proper use of model diagnostics ensures robust inference. With modern Python libraries such as statsmodels, scikit-learn, and PyMC3, you can easily implement, extend, and interpret Poisson models for a wide variety of real-world applications. Whether in insurance, web analytics, healthcare, or finance, mastering Poisson regression and its extensions will elevate your statistical modeling and data science skillset.

References & Further Reading

Agresti, A. (2015). Foundations of Linear and Generalized Linear Models. Wiley.
Hilbe, J. M. (2011). Negative Binomial Regression. Cambridge University Press.
statsmodels Poisson Regression Documentation
scikit-learn PoissonRegressor
PyMC3 Poisson Regression Example
Wikipedia: Poisson Regression