Choosing the Right Observation Model

Fast-BOCPD packages each conjugate likelihood–prior pair as a Python class (fast_bocpd.models). Selecting the right model ensures changepoint probabilities reflect the structure of your data. This chapter ties the statistical assumptions to concrete data characteristics and points out how our implementation names these concepts.

Quick Decision Path

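  1. Identify the measurement type

    • Counts → Poisson-Gamma

    • Binary (0/1) → Bernoulli-Beta

    • Proportions (k out of N) → Binomial-Beta

    • Positive continuous → Gamma-Gamma

    • Continuous → go to step 2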

  2. For continuous data, assess tail behaviour

    • Well-behaved / normal-like → Gaussian-NIG

    • Outliers or fat tails → Student-t (fixed nu)

    • Unknown tail heaviness → Student-t (grid nu)

  3. For high-count Poisson data (λ > ~20) you may approximate with Gaussian-NIG if speed is more critical than an exact count model.
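The decision path can be written down as a small lookup. The sketch below is illustrative only: suggest_model and its arguments are hypothetical names, not part of fast_bocpd, and it returns labels rather than constructing model objects.

```python
def suggest_model(kind, heavy_tails=None):
    """Return a model label for a rough description of the data.

    kind: "continuous", "positive", "count", "binary" or "proportion".
    heavy_tails: True, False, or None when tail heaviness is unknown
                 (only consulted for continuous data).
    """
    # Step 1: the measurement type decides every non-continuous case.
    by_kind = {
        "count": "PoissonGamma",
        "binary": "BernoulliBeta",
        "proportion": "BinomialBeta",
        "positive": "GammaGamma",
    }
    if kind in by_kind:
        return by_kind[kind]
    if kind != "continuous":
        raise ValueError(f"unknown data kind: {kind!r}")
    # Step 2: for continuous data, tail behaviour decides.
    if heavy_tails is None:
        return "StudentTNG (nu grid)"
    return "StudentTNG (fixed nu)" if heavy_tails else "GaussianNIG"


print(suggest_model("continuous", heavy_tails=True))
```

Step 3 (approximating high-count Poisson data with Gaussian-NIG) is a speed trade-off rather than a data-type rule, so it is deliberately left out of the lookup.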

Model Comparison Snapshot

Throughput numbers are taken from Benchmark Results (100k observations, online mode) to give a sense of relative cost.

Model                  Data type            Robustness   Throughput (obs/s)   Typical use
---------------------  -------------------  -----------  -------------------  -----------------------
GaussianNIG            Continuous           Low          25,063               Clean sensor data
StudentTNG (fixed ν)   Continuous           High         21,796               Financial returns
StudentTNG (ν grid)    Continuous           Very high    3,471                Unknown tail heaviness
PoissonGamma           Counts               Medium       21,402               Event rates
BernoulliBeta          Binary               Exact        33,573               Success/failure streams
BinomialBeta           Proportions (k/N)    Exact        14,599               Conversion rates
GammaGamma             Positive continuous  Medium       24,290               Durations / amounts

Observation Model Details

Each class mirrors the notation in Conjugate Priors and maps cleanly to a C implementation under fast_bocpd/_c. Below are the key considerations and tuning tips per model.

Gaussian-NIG (GaussianNIG)

Assumptions: iid Gaussian within each regime with unknown mean and variance. Hyperparameters mu0, kappa0, alpha0, beta0 match the Normal-Inverse-Gamma prior.

Use when: data are continuous, roughly bell-shaped, and you value maximum throughput.

Tips:

  • Center the data around zero and use mu0=0, or leave the data uncentered and set mu0 to its typical level.

  • kappa0=1 keeps the prior weak; increasing it enforces a tighter belief about the mean.

  • Larger alpha0 and beta0 (at a fixed ratio) pull the variance estimate more strongly towards the prior guess, roughly beta0 / (alpha0 - 1) for alpha0 > 1.
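To see how kappa0 controls the pull towards mu0, here is a minimal sketch of the textbook Normal-Inverse-Gamma posterior update in plain Python. The function nig_posterior is a hypothetical helper written for illustration; it is not the library's code path, which lives in C.

```python
def nig_posterior(xs, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Standard conjugate update for a Gaussian with unknown mean and variance."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)  # within-sample sum of squares
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n


data = [4.8, 5.1, 5.3, 4.9]
weak = nig_posterior(data, kappa0=1.0)      # prior counts as one pseudo-observation
strong = nig_posterior(data, kappa0=100.0)  # prior counts as 100 pseudo-observations
print("posterior mean, weak prior:  ", weak[0])
print("posterior mean, strong prior:", strong[0])
```

With kappa0=1 the posterior mean sits close to the sample mean after only four points, while kappa0=100 keeps it near mu0=0. This is why uncentered data combined with a large kappa0 can delay detection.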

Student-t (StudentTNG)

Assumptions: Student-t likelihood obtained via a Normal-Gamma prior. Supports fixed nu or a grid of ν values. Our implementation stores a flag is_grid so the C layer knows whether to dispatch to student_t_ng.c or student_t_ng_grid.c.

Use when: data contain sporadic outliers or heavy tails.

Choosing ν:

  • nu=1 behaves like Cauchy (extreme robustness, slower adaptation).

  • nu=3 is a good default for financial/operational data.

  • Grid mode allows nu=[2, 3, 5, 10, 20] with optional nu_prior if you want the algorithm to learn the right tail heaviness at runtime.
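The effect of ν on robustness is easiest to see by comparing how much log-probability each likelihood assigns to an outlier. The sketch below uses the standard (location 0, scale 1) Student-t and Gaussian densities, written out with the standard library only; it does not call fast_bocpd.

```python
import math


def student_t_logpdf(x, nu):
    """Log-density of a standard Student-t with nu degrees of freedom."""
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi)
            - (nu + 1) / 2 * math.log1p(x * x / nu))


def normal_logpdf(x):
    """Log-density of a standard Gaussian."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * x * x


outlier = 6.0  # six scale units away from the centre
for nu in (1, 3, 20):
    print(f"nu={nu:>2}: {student_t_logpdf(outlier, nu):.2f}")
print(f"normal: {normal_logpdf(outlier):.2f}")
```

Smaller ν assigns the outlier a much higher log-density, so a single extreme point is less likely to be explained as a new regime. As ν grows, the values approach the Gaussian one.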

Poisson-Gamma (PoissonGamma)

Assumptions: Counts per unit time with a Gamma prior on the rate lambda. Parameters alpha0/beta0 correspond to prior event counts and exposure.

Use when: you observe integer counts (clicks, failures, arrivals) and need exact handling of discrete jumps. Offline mode is especially fast for this model due to vectorised log-factorials in C.

Bernoulli-Beta (BernoulliBeta) & Binomial-Beta (BinomialBeta)

Assumptions: Binary outcomes (Bernoulli) or aggregate successes out of n_trials (Binomial). Both models share the same sufficient statistics (count of successes and total number of trials). Binomial-Beta exposes the number of trials through its n_trials attribute and automatically caches log_N_factorial in the C layer for numerical stability.

Use when: dealing with conversion rates, A/B tests, or thresholded signals.

Gamma-Gamma (GammaGamma)

Assumptions: Positive continuous data with fixed shape parameter shape. Useful for dwell times, monetary amounts, or any strictly-positive metric.

Use when: the distribution is skewed (right tail) but strictly positive. Because the shape is fixed, choose shape=1 for exponential-like data or >1 for more symmetric positive data.
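With the shape fixed, only the rate has to be learned, and the update is a two-line conjugate rule. The sketch below assumes the likelihood is parameterised by shape and rate, with a Gamma(alpha0, beta0) prior on the rate; gamma_rate_posterior is a hypothetical helper, not a fast_bocpd function.

```python
def gamma_rate_posterior(xs, shape, alpha0=1.0, beta0=1.0):
    """Posterior Gamma(alpha_n, beta_n) over the rate of a fixed-shape Gamma."""
    if any(x <= 0 for x in xs):
        raise ValueError("the Gamma likelihood requires strictly positive data")
    alpha_n = alpha0 + shape * len(xs)  # each point contributes `shape` to alpha
    beta_n = beta0 + sum(xs)            # the data total accumulates in beta
    return alpha_n, beta_n


durations = [0.5, 2.0, 1.5]
a, b = gamma_rate_posterior(durations, shape=1.0)  # shape=1: exponential-like data
rate_mean = a / b
print("posterior mean of the rate:", rate_mean)
```

The explicit check for non-positive values mirrors the model's assumption: zeros in the data need to be filtered or modelled separately before this likelihood applies.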

Hazard Function Interaction

Observation models specify how data behave between changepoints; the hazard function (fast_bocpd.hazard.ConstantHazard) specifies when changepoints occur on average. The two interact via the BOCPD recursion:

  • High hazards (small lambda_) expect frequent regime changes. Use them when your metrics fluctuate often (e.g., user traffic with daily resets).

  • Low hazards (large lambda_) assume long, stable runs. Pair them with robust models (Student-t) to avoid false positives when rare outliers appear.
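The interaction can be made concrete with one step of the BOCPD recursion. The sketch below assumes the constant hazard is H = 1 / lambda_, the usual parameterisation; the internals of ConstantHazard are not shown in this chapter. bocpd_step is a hypothetical pure-Python helper, and pred_probs stands in for the predictive probabilities an observation model would supply for each run length.

```python
def bocpd_step(run_probs, pred_probs, lambda_):
    """Advance the run-length posterior by one observation.

    run_probs[r]  : P(run length = r) before the new observation.
    pred_probs[r] : predictive probability of the new observation under run length r.
    """
    h = 1.0 / lambda_  # assumed constant hazard
    weighted = [r * p for r, p in zip(run_probs, pred_probs)]
    growth = [w * (1.0 - h) for w in weighted]  # run continues: r -> r + 1
    changepoint = h * sum(weighted)             # run resets: r -> 0
    joint = [changepoint] + growth
    total = sum(joint)
    if total <= 0.0:
        raise ValueError("all predictive probabilities are zero")
    return [j / total for j in joint]


# Same evidence, two hazards: a small lambda_ puts more mass on "changepoint now".
frequent = bocpd_step([1.0], [0.5], lambda_=10)
rare = bocpd_step([1.0], [0.5], lambda_=1000)
print(frequent[0], rare[0])
```

When every run length explains the new point equally well, the changepoint probability equals the hazard itself. The observation model only moves it away from H when the predictive probabilities differ across run lengths.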

Practical Recommendations

  1. Start with either GaussianNIG or StudentTNG depending on outlier expectations. Switch to discrete models only when your data type demands it.

  2. Keep priors weak (kappa0 = alpha0 = beta0 = 1) while prototyping. Tighten them only if you have domain knowledge.

  3. Use grid Student-t sparingly; reserve it for critical applications where robustness matters more than throughput.

  4. For proportions with a large denominator (n_trials >= 50), consider whether a Gaussian approximation is sufficient—Binomial-Beta is exact but slower.

For a full statistical comparison and benchmark numbers, refer to Model Comparison and Benchmark Results.

Poisson (Gamma Prior)

Example:

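from fast_bocpd.models import PoissonGamma

model = PoissonGamma(
    alpha0=1.0,   # Prior shape: pseudo-count of events
    beta0=1.0     # Prior rate: pseudo-exposure (number of intervals)
)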

Real-world use cases:

  • Website clicks per hour

  • Server errors per day

  • Customer arrivals per minute

  • Defects per product batch

Tuning tips:

  • Prior mean is alpha0 / beta0

  • Set this to your expected event rate

  • Higher α₀ and β₀ (with same ratio) = stronger prior
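The tips above follow directly from the conjugate update, sketched here in plain Python. poisson_gamma_posterior is a hypothetical helper for illustration, not part of fast_bocpd.

```python
def poisson_gamma_posterior(counts, alpha0=1.0, beta0=1.0):
    """Gamma posterior over a Poisson rate: returns (alpha_n, beta_n, mean)."""
    alpha_n = alpha0 + sum(counts)  # prior pseudo-events plus observed events
    beta_n = beta0 + len(counts)    # prior pseudo-exposure plus observed intervals
    return alpha_n, beta_n, alpha_n / beta_n


counts = [12, 9, 11]
# Both priors have mean alpha0 / beta0 = 5, but different strength.
weak = poisson_gamma_posterior(counts, alpha0=5.0, beta0=1.0)
strong = poisson_gamma_posterior(counts, alpha0=50.0, beta0=10.0)
print("weak prior   ->", weak[2])
print("strong prior ->", strong[2])
```

The stronger prior behaves like ten previously observed intervals, so three new observations move the posterior mean much less than they do under the weak prior.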

Bernoulli (Beta Prior)

Statistical Model:

\[\begin{split}x_t \mid p &\sim \text{Bernoulli}(p) \\ p &\sim \text{Beta}(\alpha_0, \beta_0)\end{split}\]

When to use:

  • Binary outcomes (success/failure, yes/no)

  • Probability estimation (coin flips, conversion)

  • Data is 0 or 1

Strengths:

  • Fastest model (~34,000 obs/sec)

  • Perfect for A/B testing changepoint detection

  • Simple, interpretable

Weaknesses:

  • Only for binary data

Example:

from fast_bocpd.models import BernoulliBeta

model = BernoulliBeta(
    alpha0=1.0,   # Prior successes (pseudocount)
    beta0=1.0     # Prior failures (pseudocount)
)

Real-world use cases:

  • Conversion rate changes (user clicked? yes/no)

  • Manufacturing defects (pass/fail)

  • Medical outcomes (recovered? yes/no)

  • Coin fairness testing

Tuning tips:

  • alpha0=beta0=1 is uniform prior (no preference)

  • alpha0=beta0=0.5 is Jeffreys prior (uninformative)

  • alpha0 and beta0 can be thought of as “pseudocounts”
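The pseudocount reading can be checked with the standard Beta-Bernoulli update. The helper below is a hypothetical illustration in plain Python and does not use fast_bocpd.

```python
def beta_bernoulli_posterior(xs, alpha0=1.0, beta0=1.0):
    """Beta posterior over a success probability: (alpha_n, beta_n, mean)."""
    successes = sum(xs)
    failures = len(xs) - successes
    alpha_n = alpha0 + successes  # prior successes plus observed successes
    beta_n = beta0 + failures     # prior failures plus observed failures
    return alpha_n, beta_n, alpha_n / (alpha_n + beta_n)


clicks = [1, 0, 0, 1, 1]
uniform = beta_bernoulli_posterior(clicks, alpha0=1.0, beta0=1.0)
jeffreys = beta_bernoulli_posterior(clicks, alpha0=0.5, beta0=0.5)
print("uniform prior :", uniform)
print("Jeffreys prior:", jeffreys)
```

The uniform prior adds two pseudo-observations in total and the Jeffreys prior adds one, so the Jeffreys posterior mean sits slightly closer to the raw success frequency of 3/5.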

Binomial (Beta Prior)

Statistical Model:

\[\begin{split}x_t \mid p, N &\sim \text{Binomial}(N, p) \\ p &\sim \text{Beta}(\alpha_0, \beta_0)\end{split}\]

When to use:

  • Proportion data (k successes out of N trials)

  • Batch testing (10 out of 100 users converted)

  • N is fixed and known

Strengths:

  • Generalizes Bernoulli (Bernoulli is Binomial with N=1)

  • Fast (~15,000 obs/sec)

  • Natural for proportion changepoints

Example:

from fast_bocpd.models import BinomialBeta

model = BinomialBeta(
    alpha0=1.0,   # Prior successes (pseudocount)
    beta0=1.0,    # Prior failures (pseudocount)
    n_trials=10   # Fixed N per observation
)

Real-world use cases:

  • Batch conversion rates (10 users, 3 converted → x=3)

  • Clinical trials (20 patients, 12 responded → x=12)

  • Quality control (sample 50 items, 2 defective → x=2)

Tuning tips:

  • n_trials must match your data (every x_t is out of N trials)

  • Same prior tuning as Bernoulli
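Why n_trials has to match the data becomes clear from the predictive distribution, which is Beta-Binomial and depends on N directly. The following is a sketch of that textbook formula using only the standard library. The function names are hypothetical, and this is not a claim about how the C layer computes it.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k, n_trials, alpha, beta):
    """Log-probability of k successes out of n_trials under a Beta(alpha, beta) belief."""
    if not 0 <= k <= n_trials:
        raise ValueError("k must lie between 0 and n_trials")
    log_choose = (math.lgamma(n_trials + 1) - math.lgamma(k + 1)
                  - math.lgamma(n_trials - k + 1))
    return log_choose + log_beta(k + alpha, n_trials - k + beta) - log_beta(alpha, beta)


# Under the uniform prior every count from 0 to N is equally likely a priori.
n = 10
probs = [math.exp(beta_binomial_logpmf(k, n, 1.0, 1.0)) for k in range(n + 1)]
print(probs[3], sum(probs))
```

An observation larger than n_trials has no valid probability under this model, which is why a mismatched N shows up as an error or as nonsensical changepoint probabilities.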

Gamma (Gamma Prior)

Statistical Model:

\[\begin{split}x_t \mid k, \theta &\sim \text{Gamma}(k, \theta) \\ \theta &\sim \text{Gamma}(\alpha_0, \beta_0)\end{split}\]

When to use:

  • Positive continuous data (x > 0)

  • Right-skewed distributions

  • Waiting times, durations, sizes

Strengths:

  • Flexible (can model various shapes)

  • Fast (~24,000 obs/sec)

  • Conjugate prior (efficient updates)

Weaknesses:

  • Requires choosing a fixed shape parameter k

Example:

from fast_bocpd.models import GammaGamma

model = GammaGamma(
    alpha0=1.0,   # Prior shape
    beta0=1.0     # Prior rate
)

Real-world use cases:

  • Customer lifetime value (always positive, skewed)

  • Inter-arrival times (time between events)

  • File sizes, transaction amounts

  • Rainfall amounts on wet days (dry days with exactly 0 must be excluded or modelled separately, since the model requires x > 0)

Common Mistakes to Avoid

  1. Using Gaussian for count data

    Wrong: data = [1, 2, 3, ...] with GaussianNIG

    Right: use PoissonGamma for counts

  2. Using Poisson for continuous data

    Wrong: data = [1.5, 2.3, 3.7, ...] with PoissonGamma

    Right: use GaussianNIG or StudentTNG

  3. Ignoring outliers

    Wrong: financial data with GaussianNIG (tends to raise a false alarm on each large outlier)

    Right: use StudentTNG for robustness

  4. Grid ν without need

    Wrong: using nu=[2, 3, 5, 10, 20] when fixed nu=3 is fine

    Right: grid mode is roughly 6x slower (3,471 vs 21,796 obs/s in the snapshot table); use it only if the tail shape is very uncertain

When in Doubt

Start with Student-t (fixed ν=3):

from fast_bocpd.models import StudentTNG

model = StudentTNG(mu0=0, kappa0=1, alpha0=1, beta0=1, nu=3)

It’s:

  • Robust to outliers

  • Fast enough for most applications

  • Suitable for most continuous data

Then experiment:

  • If no outliers are detected → try GaussianNIG (faster)

  • If heavy tails are suspected → try grid mode or a lower ν

  • If the data are counts → switch to PoissonGamma

Next Steps