Choosing the Right Observation Model

Fast-BOCPD packages each conjugate likelihood–prior pair as a Python class (fast_bocpd.models). Selecting the right model ensures changepoint probabilities reflect the structure of your data. This chapter ties the statistical assumptions to concrete data characteristics and points out how our implementation names these concepts.

Quick Decision Path

1. Identify the measurement type
   - Binary (0/1 outcomes) → fast_bocpd.BernoulliBeta
   - Proportion (successes out of known \(N\)) → fast_bocpd.BinomialBeta
   - Counts (integers ≥ 0) → fast_bocpd.PoissonGamma
   - Positive continuous (strictly > 0) → fast_bocpd.GammaGamma
   - General continuous → fast_bocpd.GaussianNIG or fast_bocpd.StudentTNG
2. For continuous data, assess tail behaviour
   - Well-behaved / normal-like → Gaussian-NIG
   - Outliers or fat tails → Student-t (fixed nu)
   - Unknown tail heaviness → Student-t (grid nu)
3. For high-count Poisson data (λ > ~20) you may approximate with Gaussian-NIG if speed is more critical than an exact count model.
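
The decision path can be sketched as a few plain Python checks on a sample of the data. The helper suggest_model and its arguments n_trials and heavy_tails are hypothetical and not part of fast_bocpd; it only returns the class names listed above. Note that a sample being positive does not prove the quantity is positive by nature, so treat the GammaGamma branch as a hint.

```python
def suggest_model(data, n_trials=None, heavy_tails=False):
    """Return the name of a model class suited to `data`.

    Hypothetical helper (not part of fast_bocpd): it encodes the
    decision path as simple checks on the observed values.
    """
    values = list(data)
    if not values:
        raise ValueError("data must contain at least one observation")

    all_integers = all(float(v).is_integer() for v in values)
    non_negative = all(v >= 0 for v in values)

    if all_integers and non_negative:
        if n_trials is not None:
            # Successes out of a known N per observation.
            return "BinomialBeta"
        if all(v in (0, 1) for v in values):
            return "BernoulliBeta"
        return "PoissonGamma"

    if all(v > 0 for v in values):
        # Strictly positive, non-integer measurements.
        return "GammaGamma"

    # General continuous data: choose by expected tail behaviour.
    return "StudentTNG" if heavy_tails else "GaussianNIG"


print(suggest_model([0, 1, 1, 0]))
print(suggest_model([3, 0, 7, 2]))
print(suggest_model([-0.5, 2.3, 1.1], heavy_tails=True))
```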

Model Comparison Snapshot

Throughput numbers are taken from Benchmark Results (100k observations, online mode) to give a sense of relative cost.

Model	Data type	Robustness	Throughput (obs/s)	Typical use
GaussianNIG	Continuous	Low	25,063	Clean sensor data
StudentTNG (fixed ν)	Continuous	High	21,796	Financial returns
StudentTNG (ν grid)	Continuous	Very high	3,471	Unknown tail heaviness
PoissonGamma	Counts	Medium	21,402	Event rates
BernoulliBeta	Binary	Exact	33,573	Success/failure streams
BinomialBeta	Proportions (k/N)	Exact	14,599	Conversion rates
GammaGamma	Positive continuous	Medium	24,290	Durations / amounts

Observation Model Details

Each class mirrors the notation in Conjugate Priors and maps cleanly to a C implementation under fast_bocpd/_c. Below are the key considerations and tuning tips per model.

GaussianNIG (`GaussianNIG`)

Assumptions: iid Gaussian within each regime with unknown mean and variance. Hyperparameters mu0, kappa0, alpha0, beta0 match the Normal-Inverse-Gamma prior.

Use when: data are continuous, roughly bell-shaped, and you value maximum throughput.

Tips:

Center the data around zero and set mu0=0, or leave the data as-is and set mu0 to its expected level.
kappa0=1 keeps the prior weak; increasing it enforces a tighter belief about the mean.
Larger alpha0 / beta0 shrink the variance towards a prior guess.
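
To see what kappa0 does in practice, the sketch below applies the textbook Normal-Inverse-Gamma update for a single observation. It is a standalone illustration of the standard conjugate formulas, not the library's C implementation; the function name nig_update is hypothetical.

```python
def nig_update(x, mu0, kappa0, alpha0, beta0):
    """One-observation Normal-Inverse-Gamma posterior update (textbook form)."""
    kappa_n = kappa0 + 1.0
    mu_n = (kappa0 * mu0 + x) / kappa_n
    alpha_n = alpha0 + 0.5
    beta_n = beta0 + kappa0 * (x - mu0) ** 2 / (2.0 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n


# Same observation, weak versus tight belief about the mean.
weak = nig_update(4.0, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0)
tight = nig_update(4.0, mu0=0.0, kappa0=9.0, alpha0=1.0, beta0=1.0)

print("posterior mean with kappa0=1:", weak[0])
print("posterior mean with kappa0=9:", tight[0])
```

With kappa0=1 the posterior mean moves halfway to the observation; with kappa0=9 it moves only a tenth of the way, which is what "a tighter belief about the mean" amounts to.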

StudentTNG (`StudentTNG`)

Assumptions: Student-t likelihood obtained via a Normal-Gamma prior. Supports fixed nu or a grid of ν values. Our implementation stores a flag is_grid so the C layer knows whether to dispatch to student_t_ng.c or student_t_ng_grid.c.

Use when: data contain sporadic outliers or heavy tails.

Choosing ν:

nu=1 behaves like Cauchy (extreme robustness, slower adaptation).
nu=3 is a good default for financial/operational data.
Grid mode allows nu=[2, 3, 5, 10, 20] with optional nu_prior if you want the algorithm to learn the right tail heaviness at runtime.
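
The effect of ν can be illustrated without the library. The sketch below compares how a Gaussian and a Student-t density score a 6-sigma outlier, and then weights a ν grid by likelihood. It is a simplification under stated assumptions: location 0 and scale 1 are held fixed, whereas the real model integrates over them, and the names student_t_logpdf and nu_grid_weights are hypothetical.

```python
import math


def student_t_logpdf(x, nu, loc=0.0, scale=1.0):
    """Log density of a location-scale Student-t distribution."""
    z = (x - loc) / scale
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(scale)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


def normal_logpdf(x, loc=0.0, scale=1.0):
    """Log density of a Gaussian distribution."""
    z = (x - loc) / scale
    return -0.5 * math.log(2 * math.pi) - math.log(scale) - 0.5 * z * z


def nu_grid_weights(data, nu_grid, nu_prior=None):
    """Posterior weight of each nu, with loc=0 and scale=1 held fixed."""
    if nu_prior is None:
        nu_prior = [1.0 / len(nu_grid)] * len(nu_grid)
    log_w = [math.log(p) + sum(student_t_logpdf(x, nu) for x in data)
             for nu, p in zip(nu_grid, nu_prior)]
    top = max(log_w)  # subtract the maximum before exponentiating
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


# How much more plausible is a 6-sigma point under Student-t (nu=3)?
gap = student_t_logpdf(6.0, nu=3) - normal_logpdf(6.0)
print("log-density advantage of Student-t at x=6:", round(gap, 2))

# Data with two outliers shift weight toward small nu.
weights = nu_grid_weights([0.1, -0.4, 0.3, 6.0, -5.5], [2, 3, 5, 10, 20])
print([round(w, 3) for w in weights])
```

A large log-density gap is why a single outlier is less likely to be read as a regime change under Student-t than under a Gaussian model.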

PoissonGamma (`PoissonGamma`)

Assumptions: Counts per unit time with a Gamma prior on the rate lambda. Parameters alpha0/beta0 correspond to prior event counts and exposure.

Use when: you observe integer counts (clicks, failures, arrivals) and need exact handling of discrete jumps. Offline mode is especially fast for this model due to vectorised log-factorials in C.
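
For intuition, the Gamma-Poisson pair has a closed-form predictive (a negative binomial) and a one-line update. The sketch below uses the standard formulas with one observation per unit of exposure; the function names are hypothetical and it does not reproduce the library's vectorised C code.

```python
import math


def poisson_gamma_log_predictive(x, alpha, beta):
    """Log negative-binomial predictive of count x under a Gamma(alpha, beta) rate."""
    return (math.lgamma(alpha + x) - math.lgamma(alpha) - math.lgamma(x + 1)
            + alpha * math.log(beta / (beta + 1.0))
            - x * math.log(beta + 1.0))


def poisson_gamma_update(x, alpha, beta):
    """Add the observed count to alpha and one unit of exposure to beta."""
    return alpha + x, beta + 1.0


alpha, beta = 1.0, 1.0
for count in [3, 5, 4]:
    log_p = poisson_gamma_log_predictive(count, alpha, beta)
    alpha, beta = poisson_gamma_update(count, alpha, beta)
    print(count, round(log_p, 3), (alpha, beta))
```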

BernoulliBeta (`BernoulliBeta`) & BinomialBeta (`BinomialBeta`)

Assumptions: Binary outcomes or aggregate successes out of n_trials. These models share identical sufficient statistics (counts of successes and total trials). BinomialBeta exposes the trial count through its n_trials attribute and automatically caches log_N_factorial in the C layer for numerical stability.

Use when: dealing with conversion rates, A/B tests, or thresholded signals.

GammaGamma (`GammaGamma`)

Assumptions: Positive continuous data with fixed shape parameter shape. Useful for dwell times, monetary amounts, or any strictly-positive metric.

Use when: the distribution is skewed (right tail) but strictly positive. Because the shape is fixed, choose shape=1 for exponential-like data or >1 for more symmetric positive data.
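
With the shape fixed, the conjugate update is simple. The sketch below assumes the Gamma prior is placed on the rate of the likelihood (the usual conjugate construction); that parameterisation and the function names are assumptions for illustration, not confirmed details of the library.

```python
def gamma_gamma_update(x, shape, alpha, beta):
    """Update a Gamma(alpha, beta) prior on the rate after observing x.

    Assumes x ~ Gamma(shape, rate) with known shape.
    """
    if x <= 0:
        raise ValueError("observations must be strictly positive")
    return alpha + shape, beta + x


def posterior_mean_rate(alpha, beta):
    """Posterior mean of the rate parameter."""
    return alpha / beta


alpha, beta = 1.0, 1.0
for duration in [2.0, 4.0]:
    alpha, beta = gamma_gamma_update(duration, shape=1.0, alpha=alpha, beta=beta)

print((alpha, beta), posterior_mean_rate(alpha, beta))
```

The explicit check on x mirrors the strictly-positive requirement: a zero in the stream has no density under this model.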

Hazard Function Interaction

Observation models specify how data behave between changepoints; the hazard function (fast_bocpd.hazard.ConstantHazard) specifies how often changepoints are expected to occur. The two interact via the BOCPD recursion:

High hazards (small lambda_) expect frequent regime changes. Use them when your metrics fluctuate often (e.g., user traffic with daily resets).
Low hazards (large lambda_) assume long, stable runs. Pair them with robust models (Student-t) to avoid false positives when rare outliers appear.
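
One step of the recursion can be written in a few lines. This sketch assumes the constant hazard equals 1 / lambda_, which is consistent with "small lambda_ means high hazard" above but is not taken from the library source; bocpd_step is a hypothetical name and the predictive probabilities are supplied by hand rather than by an observation model.

```python
def bocpd_step(run_length_probs, predictive_probs, lambda_):
    """One BOCPD update of the run-length distribution with a constant hazard.

    run_length_probs[r]  : P(run length = r | data so far)
    predictive_probs[r]  : predictive density of the new point given run length r
    """
    hazard = 1.0 / lambda_  # assumed form of the constant hazard
    joint = [r * p for r, p in zip(run_length_probs, predictive_probs)]
    changepoint_mass = hazard * sum(joint)         # run length resets to 0
    growth = [(1.0 - hazard) * j for j in joint]   # each run length grows by 1
    new_probs = [changepoint_mass] + growth
    evidence = sum(new_probs)
    return [p / evidence for p in new_probs]


# The new point fits the long run poorly (0.01) and the short run well (0.30).
prior = [0.1, 0.9]
pred = [0.30, 0.01]
print(bocpd_step(prior, pred, lambda_=10))
print(bocpd_step(prior, pred, lambda_=100))
```

Under a constant hazard the normalised mass at run length 0 equals 1 / lambda_ at every step, so evidence for a change appears as mass moving from long run lengths to short ones over the following observations.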

Practical Recommendations

Start with either GaussianNIG or StudentTNG depending on outlier expectations. Switch to discrete models only when your data type demands it.
Keep priors weak (kappa0 = alpha0 = beta0 = 1) while prototyping. Tighten them only if you have domain knowledge.
Use grid Student-t sparingly; reserve it for critical applications where robustness matters more than throughput.
For proportions with a large denominator (n_trials >= 50), consider whether a Gaussian approximation is sufficient—Binomial-Beta is exact but slower.

For a full statistical comparison and benchmark numbers, refer to Model Comparison and Benchmark Results.

Poisson (Gamma Prior)

Example:

from fast_bocpd import PoissonGamma

model = PoissonGamma(
    alpha0=1.0,   # Prior shape (pseudo event count)
    beta0=1.0     # Prior rate (pseudo exposure)
)

Real-world use cases:

Website clicks per hour
Server errors per day
Customer arrivals per minute
Defects per product batch

Tuning tips:

Prior mean is alpha0 / beta0
Set this to your expected event rate
Higher α₀ and β₀ (with same ratio) = stronger prior
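
The effect of prior strength can be checked with the closed-form posterior mean of the rate, (alpha0 + sum of counts) / (beta0 + number of observations). The sketch below is plain arithmetic with a hypothetical helper name; it does not call the library.

```python
def posterior_mean_rate(counts, alpha0, beta0):
    """Posterior mean of the Poisson rate under a Gamma(alpha0, beta0) prior."""
    return (alpha0 + sum(counts)) / (beta0 + len(counts))


counts = [10, 10, 10, 10]  # observed rate is 10 per interval

# Both priors have mean alpha0 / beta0 = 5, but different strength.
weak = posterior_mean_rate(counts, alpha0=5.0, beta0=1.0)
strong = posterior_mean_rate(counts, alpha0=50.0, beta0=10.0)

print("weak prior  ->", weak)
print("strong prior ->", round(strong, 3))
```

After four observations the weak prior has almost caught up with the data, while the strong prior still sits much closer to its prior mean of 5.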

Bernoulli (Beta Prior)

Statistical Model:

\[\begin{split}x_t \mid p &\sim \text{Bernoulli}(p) \\ p &\sim \text{Beta}(\alpha_0, \beta_0)\end{split}\]

When to use:

Binary outcomes (success/failure, yes/no)
Probability estimation (coin flips, conversion)
Data is 0 or 1

Strengths:

Fastest model (~34,000 obs/sec)
Perfect for A/B testing changepoint detection
Simple, interpretable

Weaknesses:

Only for binary data

Example:

from fast_bocpd import BernoulliBeta

model = BernoulliBeta(
    alpha0=1.0,   # Prior successes (pseudocount)
    beta0=1.0     # Prior failures (pseudocount)
)

Real-world use cases:

Conversion rate changes (user clicked? yes/no)
Manufacturing defects (pass/fail)
Medical outcomes (recovered? yes/no)
Coin fairness testing

Tuning tips:

alpha0=beta0=1 is uniform prior (no preference)
alpha0=beta0=0.5 is Jeffreys prior (uninformative)
alpha0 and beta0 can be thought of as “pseudocounts”
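
The pseudocount reading follows directly from the Beta update: observed successes are added to alpha0 and observed failures to beta0. A minimal sketch with hypothetical helper names, comparing the uniform and Jeffreys priors on the same short stream:

```python
def beta_bernoulli_posterior(data, alpha0, beta0):
    """Return the Beta posterior parameters after a 0/1 stream."""
    successes = sum(data)
    failures = len(data) - successes
    return alpha0 + successes, beta0 + failures


def predictive_success(alpha, beta):
    """Posterior predictive probability that the next outcome is 1."""
    return alpha / (alpha + beta)


stream = [1, 1, 0, 1]
uniform = beta_bernoulli_posterior(stream, alpha0=1.0, beta0=1.0)
jeffreys = beta_bernoulli_posterior(stream, alpha0=0.5, beta0=0.5)

print("uniform :", uniform, round(predictive_success(*uniform), 4))
print("Jeffreys:", jeffreys, round(predictive_success(*jeffreys), 4))
```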

Binomial (Beta Prior)

Statistical Model:

\[\begin{split}x_t \mid p, N &\sim \text{Binomial}(N, p) \\ p &\sim \text{Beta}(\alpha_0, \beta_0)\end{split}\]

When to use:

Proportion data (k successes out of N trials)
Batch testing (10 out of 100 users converted)
N is fixed and known

Strengths:

Generalizes Bernoulli (Bernoulli is Binomial with N=1)
Fast (~15,000 obs/sec)
Natural for proportion changepoints

Example:

from fast_bocpd import BinomialBeta

model = BinomialBeta(
    alpha0=1.0,   # Prior successes (pseudocount)
    beta0=1.0,    # Prior failures (pseudocount)
    n_trials=10   # Fixed N per observation
)

Real-world use cases:

Batch conversion rates (10 users, 3 converted → x=3)
Clinical trials (20 patients, 12 responded → x=12)
Quality control (sample 50 items, 2 defective → x=2)

Tuning tips:

n_trials must match your data (every x_t is out of N trials)
Same prior tuning as Bernoulli
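
The predictive distribution for this pair is the Beta-Binomial, which makes the role of n_trials explicit: an observation outside 0..N has no probability at all. The sketch below uses the standard formula and hypothetical function names; it is not the library's implementation.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_log_predictive(k, n_trials, alpha, beta):
    """Log probability of k successes out of n_trials under a Beta(alpha, beta) prior."""
    if not 0 <= k <= n_trials:
        raise ValueError("k must lie between 0 and n_trials")
    log_choose = (math.lgamma(n_trials + 1) - math.lgamma(k + 1)
                  - math.lgamma(n_trials - k + 1))
    return (log_choose
            + log_beta(alpha + k, beta + n_trials - k)
            - log_beta(alpha, beta))


# Under the uniform prior every count 0..N is equally likely a priori.
p = math.exp(beta_binomial_log_predictive(3, 10, alpha=1.0, beta=1.0))
print(round(p, 6))
```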

Gamma (Gamma Prior)

Statistical Model:

\[\begin{split}x_t \mid k, \theta &\sim \text{Gamma}(k, \theta) \\ \theta &\sim \text{Gamma}(\alpha_0, \beta_0)\end{split}\]

When to use:

Positive continuous data (x > 0)
Right-skewed distributions
Waiting times, durations, sizes

Strengths:

Flexible (can model various shapes)
Fast (~24,000 obs/sec)
Conjugate prior (efficient updates)

Weaknesses:

Requires choosing fixed shape parameter k

Example:

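from fast_bocpd import GammaGamma

model = GammaGamma(
    shape=1.0,    # Fixed likelihood shape k (1 = exponential-like data)
    alpha0=1.0,   # Prior shape
    beta0=1.0     # Prior rate
)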

Real-world use cases:

Customer lifetime value (always positive, skewed)
Inter-arrival times (time between events)
File sizes, transaction amounts
Rainfall amounts on wet days (the model requires x > 0, so zero-rain days must be excluded or handled separately)

Common Mistakes to Avoid

Using Gaussian for count data
- Avoid: data = [1, 2, 3, ...] with GaussianNIG
- Instead: use PoissonGamma for counts

Using Poisson for continuous data
- Avoid: data = [1.5, 2.3, 3.7, ...] with PoissonGamma
- Instead: use GaussianNIG or StudentTNG

Ignoring outliers
- Avoid: financial data with GaussianNIG (it tends to raise a false alarm on large outliers)
- Instead: use StudentTNG for robustness

Grid ν without need
- Avoid: nu=[2,3,5,10,20] when fixed nu=3 is fine
- Instead: grid mode is roughly 6x slower, so use it only if the tail shape is very uncertain

When in Doubt

Start with Student-t (fixed ν=3):

model = StudentTNG(mu0=0, kappa0=1, alpha0=1, beta0=1, nu=3)

It’s:
- Robust to outliers
- Fast enough for most applications
- Suitable for most continuous data

Then experiment:
- If no outliers detected → Try GaussianNIG (faster)
- If heavy tails suspected → Try grid mode or lower ν
- If data is counts → Switch to PoissonGamma
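
One rough way to decide whether outliers are present is the sample excess kurtosis, which is near 0 for Gaussian data and large for heavy tails. The sketch below is a heuristic, not a feature of fast_bocpd; the function names and the threshold of 1.0 are assumptions chosen for illustration, and the statistic should be computed within a stable regime rather than across a changepoint.

```python
def excess_kurtosis(data):
    """Sample excess kurtosis (population moments, no bias correction)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0:
        raise ValueError("data have zero variance")
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / (m2 * m2) - 3.0


def suggest_continuous_model(data, threshold=1.0):
    """Pick a continuous model name from tail heaviness (hypothetical rule)."""
    return "StudentTNG" if excess_kurtosis(data) > threshold else "GaussianNIG"


clean = [-1.0, 1.0] * 50
spiky = clean + [20.0]

print(suggest_continuous_model(clean))
print(suggest_continuous_model(spiky))
```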

Next Steps

Tuning Parameters - How to set hyperparameters
Interpreting Results - Understanding model outputs
Conjugate Priors - Mathematical details
Model Comparison - Statistical comparison
