Choosing the Right Observation Model
====================================

Fast-BOCPD packages each conjugate likelihood–prior pair as a Python class
(``fast_bocpd.models``). Selecting the right model ensures changepoint
probabilities reflect the structure of your data. This chapter ties the
statistical assumptions to concrete data characteristics and points out
how our implementation names these concepts.

Quick Decision Path
-------------------

1. **Identify the measurement type**

   * **Binary** (0/1 outcomes) → :class:`fast_bocpd.BernoulliBeta`
   * **Proportion** (successes out of known :math:`N`) →
     :class:`fast_bocpd.BinomialBeta`
   * **Counts** (integers ≥ 0) → :class:`fast_bocpd.PoissonGamma`
   * **Positive continuous** (strictly > 0) → :class:`fast_bocpd.GammaGamma`
   * **General continuous** →
     :class:`fast_bocpd.GaussianNIG` or :class:`fast_bocpd.StudentTNG`

2. **For continuous data, assess tail behaviour**

   * *Well-behaved / normal-like* → Gaussian-NIG
   * *Outliers or fat tails* → Student-t (fixed ``nu``)
   * *Unknown tail heaviness* → Student-t (grid ``nu``)

3. **For high-count Poisson data (λ > ~20)** you may approximate with
   Gaussian-NIG if speed is more critical than an exact count model.
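
The decision path above can be sketched as a small helper.
``suggest_model`` is a hypothetical function written for this guide, not
part of ``fast_bocpd``. It only checks value types and ranges, so tail
behaviour still has to be supplied through the ``heavy_tails`` flag.

.. code-block:: python

   def suggest_model(values, heavy_tails=False, n_trials=None):
       """Suggest an observation model name from simple checks on ``values``."""
       values = list(values)
       if not values:
           raise ValueError("need at least one observation")
       all_integer = all(float(v).is_integer() for v in values)
       if all_integer and n_trials is not None:
           # Successes out of a known N per observation.
           if all(0 <= v <= n_trials for v in values):
               return "BinomialBeta"
           raise ValueError("successes must lie in [0, n_trials]")
       if all_integer and all(v in (0, 1) for v in values):
           return "BernoulliBeta"
       if all_integer and min(values) >= 0:
           return "PoissonGamma"
       if min(values) > 0:
           return "GammaGamma"
       # General continuous data: pick by expected tail behaviour.
       return "StudentTNG" if heavy_tails else "GaussianNIG"


   print(suggest_model([0, 1, 1, 0]))            # binary stream
   print(suggest_model([3, 0, 7, 2]))            # counts
   print(suggest_model([4, 9, 6], n_trials=10))  # successes out of 10
   print(suggest_model([0.4, 2.5, 1.1]))         # positive continuous
   print(suggest_model([-1.2, 0.3, 0.8], heavy_tails=True))

A count series that happens to contain only zeros and ones is reported as
binary, so treat the result as a starting point rather than a verdict.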

Model Comparison Snapshot
-------------------------

Throughput numbers are taken from :doc:`../benchmarks/results` (100k
observations, online mode) to give a sense of relative cost.

.. list-table::
   :header-rows: 1
   :widths: 18 22 18 18 24

   * - Model
     - Data type
     - Robustness
     - Throughput (obs/s)
     - Typical use
   * - GaussianNIG
     - Continuous
     - Low
     - 25,063
     - Clean sensor data
   * - StudentTNG (fixed ν)
     - Continuous
     - High
     - 21,796
     - Financial returns
   * - StudentTNG (ν grid)
     - Continuous
     - Very high
     - 3,471
     - Unknown tail heaviness
   * - PoissonGamma
     - Counts
     - Medium
     - 21,402
     - Event rates
   * - BernoulliBeta
     - Binary
     - Exact
     - 33,573
     - Success/failure streams
   * - BinomialBeta
     - Proportions (k/N)
     - Exact
     - 14,599
     - Conversion rates
   * - GammaGamma
     - Positive continuous
     - Medium
     - 24,290
     - Durations / amounts
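
To make the throughput column easier to budget against, the sketch below
converts it to seconds per million observations and to a slowdown factor
relative to a baseline. The numbers are copied from the table; the helper
names are illustrative only.

.. code-block:: python

   # Throughput in observations per second, copied from the table above.
   THROUGHPUT = {
       "GaussianNIG": 25_063,
       "StudentTNG (fixed nu)": 21_796,
       "StudentTNG (nu grid)": 3_471,
       "PoissonGamma": 21_402,
       "BernoulliBeta": 33_573,
       "BinomialBeta": 14_599,
       "GammaGamma": 24_290,
   }


   def seconds_for(model, n_obs):
       """Wall-clock estimate for processing ``n_obs`` observations."""
       return n_obs / THROUGHPUT[model]


   def slowdown(model, baseline="GaussianNIG"):
       """How many times slower ``model`` is than ``baseline``."""
       return THROUGHPUT[baseline] / THROUGHPUT[model]


   for name in sorted(THROUGHPUT, key=THROUGHPUT.get, reverse=True):
       print(f"{name:24s} {seconds_for(name, 1_000_000):7.1f} s per 1M obs"
             f"  {slowdown(name):5.2f}x vs GaussianNIG")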

Observation Model Details
-------------------------

Each class mirrors the notation in :doc:`../theory/conjugate_priors` and
maps cleanly to a C implementation under ``fast_bocpd/_c``. Below are the
key considerations and tuning tips per model.

GaussianNIG (``GaussianNIG``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

*Assumptions:* iid Gaussian within each regime with unknown mean and
variance. Hyperparameters ``mu0``, ``kappa0``, ``alpha0``, ``beta0`` match
the Normal-Inverse-Gamma prior.

*Use when:* data are continuous, roughly bell-shaped, and you value
maximum throughput.

*Tips:*

* Center data around zero and set ``mu0=0``; if you leave the data
  uncentered, set ``mu0`` to its expected level instead.
* ``kappa0=1`` keeps the prior weak; increasing it enforces a tighter
  belief about the mean.
* Larger ``alpha0`` / ``beta0`` shrink the variance towards a prior guess.
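
As a rough illustration of the ``kappa0`` tip, the following sketch applies
the standard textbook Normal-Inverse-Gamma update in plain Python. It is an
assumption that the library uses the same parameterisation;
``nig_posterior`` is a hypothetical helper, not a ``fast_bocpd`` function.

.. code-block:: python

   def nig_posterior(xs, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
       """Textbook NIG update for iid Gaussian data with unknown mean/variance."""
       n = len(xs)
       xbar = sum(xs) / n
       ss = sum((x - xbar) ** 2 for x in xs)  # within-sample sum of squares
       kappa_n = kappa0 + n
       mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
       alpha_n = alpha0 + n / 2
       beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2 * kappa_n)
       return mu_n, kappa_n, alpha_n, beta_n


   xs = [4.0, 5.0, 6.0]  # sample mean is 5, prior mean is 0
   for kappa0 in (1.0, 10.0):
       mu_n, *_ = nig_posterior(xs, kappa0=kappa0)
       print(f"kappa0={kappa0:4.1f} -> posterior mean {mu_n:.3f}")

With ``kappa0=1`` the posterior mean moves most of the way to the sample
mean, while ``kappa0=10`` keeps it close to ``mu0``.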

StudentTNG (``StudentTNG``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

*Assumptions:* Student-t likelihood obtained via a Normal-Gamma prior.
Supports **fixed** ``nu`` or a **grid** of ν values. Our implementation
stores a flag ``is_grid`` so the C layer knows whether to dispatch to
``student_t_ng.c`` or ``student_t_ng_grid.c``.

*Use when:* data contain sporadic outliers or heavy tails.

*Choosing ν:*

* ``nu=1`` behaves like Cauchy (extreme robustness, slower adaptation).
* ``nu=3`` is a good default for financial/operational data.
* Grid mode allows ``nu=[2, 3, 5, 10, 20]`` with optional ``nu_prior`` if
  you want the algorithm to learn the right tail heaviness at runtime.
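
To see why smaller ν values tolerate outliers, the sketch below compares
the log-density that a standard Student-t and a standard normal assign to
a point six scale units from the centre. It uses only ``math`` and is
independent of how ``StudentTNG`` evaluates its predictive internally.

.. code-block:: python

   import math


   def student_t_logpdf(x, nu, loc=0.0, scale=1.0):
       """Log-density of a location-scale Student-t distribution."""
       z = (x - loc) / scale
       return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
               - 0.5 * math.log(nu * math.pi) - math.log(scale)
               - (nu + 1) / 2 * math.log1p(z * z / nu))


   def normal_logpdf(x, loc=0.0, scale=1.0):
       """Log-density of a normal distribution."""
       z = (x - loc) / scale
       return -0.5 * math.log(2 * math.pi) - math.log(scale) - 0.5 * z * z


   outlier = 6.0
   for nu in (1, 3, 10, 20):
       print(f"nu={nu:2d}  log p(outlier) = {student_t_logpdf(outlier, nu):8.3f}")
   print(f"normal log p(outlier) = {normal_logpdf(outlier):8.3f}")

The penalty for the outlier grows as ν increases and is largest for the
normal density, which is why a Gaussian model is more likely to explain
such a point as a regime change.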

PoissonGamma (``PoissonGamma``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

*Assumptions:* Counts per unit time with a Gamma prior on the rate
``lambda``. Parameters ``alpha0``/``beta0`` correspond to prior event
counts and exposure.

*Use when:* you observe integer counts (clicks, failures, arrivals) and
need exact handling of discrete jumps. Offline mode is especially fast for
this model due to vectorised log-factorials in C.
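
For reference, the posterior predictive of a Poisson likelihood under a
Gamma(shape, rate) prior is negative binomial. The sketch below writes it
out with ``math.lgamma``; it shows the textbook formula and is not a
transcription of the C code.

.. code-block:: python

   import math


   def poisson_gamma_log_predictive(x, alpha, beta):
       """Log P(x) for a Poisson count with rate ~ Gamma(alpha, rate=beta)."""
       return (math.lgamma(alpha + x) - math.lgamma(alpha) - math.lgamma(x + 1)
               + alpha * math.log(beta / (beta + 1.0))
               - x * math.log(beta + 1.0))


   # Probabilities over the support should sum to (almost exactly) one.
   total = sum(math.exp(poisson_gamma_log_predictive(x, 3.0, 1.0))
               for x in range(200))
   print(f"sum of predictive probabilities: {total:.6f}")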

BernoulliBeta (``BernoulliBeta``) & BinomialBeta (``BinomialBeta``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

*Assumptions:* Binary outcomes or aggregate successes out of ``n_trials``.
These models share identical sufficient statistics (counts of successes
and total trials). Binomial-Beta exposes the per-observation trial count
through its ``n_trials`` attribute and automatically caches
``log_N_factorial`` in the C layer for numerical stability.

*Use when:* dealing with conversion rates, A/B tests, or thresholded
signals.
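
The shared sufficient statistics can be checked by hand: ten individual
Bernoulli outcomes and one Binomial observation with the same totals lead
to the same Beta posterior. ``beta_posterior`` is a hypothetical helper
used only for this illustration.

.. code-block:: python

   def beta_posterior(successes, trials, alpha0=1.0, beta0=1.0):
       """Beta posterior parameters after ``successes`` out of ``trials``."""
       return alpha0 + successes, beta0 + (trials - successes)


   bits = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]                    # 7 successes in 10
   from_bernoulli = beta_posterior(sum(bits), len(bits))    # one bit at a time
   from_binomial = beta_posterior(7, 10)                    # one aggregated count
   print(from_bernoulli, from_binomial)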

GammaGamma (``GammaGamma``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

*Assumptions:* Positive continuous data with fixed shape parameter ``shape``.
Useful for dwell times, monetary amounts, or any strictly-positive metric.

*Use when:* the distribution is skewed (right tail) but strictly positive.
Because the shape is fixed, choose ``shape=1`` for exponential-like data
or >1 for more symmetric positive data.
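
The sketch below shows the textbook conjugate update for a Gamma
likelihood with known shape, assuming the Gamma prior sits on the *rate*
parameter. That parameterisation is an assumption here; check
:doc:`../theory/conjugate_priors` for the one the library uses.

.. code-block:: python

   def gamma_known_shape_posterior(xs, shape, alpha0=1.0, beta0=1.0):
       """Posterior Gamma(alpha_n, rate=beta_n) over the likelihood's rate."""
       alpha_n = alpha0 + shape * len(xs)
       beta_n = beta0 + sum(xs)
       return alpha_n, beta_n


   durations = [0.5, 1.5, 2.0, 4.0]
   for shape in (1.0, 2.0):
       alpha_n, beta_n = gamma_known_shape_posterior(durations, shape)
       rate = alpha_n / beta_n  # posterior mean of the rate
       print(f"shape={shape}: posterior rate {rate:.3f}, implied mean {shape / rate:.3f}")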

Hazard Function Interaction
---------------------------

Observation models specify *how* data behave between changepoints; the
hazard function (``fast_bocpd.hazard.ConstantHazard``) specifies *when*
changepoints occur on average. The two interact via the BOCPD recursion:

* High hazards (small ``lambda_``) expect frequent regime changes. Use
  them when your metrics fluctuate often (e.g., user traffic with daily
  resets).
* Low hazards (large ``lambda_``) assume long, stable runs. Pair them with
  robust models (Student-t) to avoid false positives when rare outliers
  appear.
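
A quick way to build intuition for ``lambda_`` is to look at the prior it
implies. The sketch assumes the usual BOCPD convention that a constant
hazard equals ``1 / lambda_``, which matches the description above; the
functions are illustrative and do not call ``ConstantHazard``.

.. code-block:: python

   def constant_hazard(lambda_):
       """Per-step changepoint probability under a constant hazard."""
       return 1.0 / lambda_


   def survival(t, lambda_):
       """Prior probability that a regime lasts at least ``t`` steps."""
       return (1.0 - constant_hazard(lambda_)) ** t


   for lam in (10, 100, 1000):
       print(f"lambda_={lam:5d}  hazard={constant_hazard(lam):.4f}  "
             f"P(run >= 100 steps)={survival(100, lam):.4f}")

Under this convention the expected run length equals ``lambda_``, so a
value near the typical regime duration is a reasonable starting point.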

Practical Recommendations
-------------------------

1. Start with either GaussianNIG or StudentTNG depending on outlier
   expectations. Switch to discrete models only when your data type
   demands it.
2. Keep priors weak (``kappa0 = alpha0 = beta0 = 1``) while prototyping.
   Tighten them only if you have domain knowledge.
3. Use grid Student-t sparingly; reserve it for critical applications
   where robustness matters more than throughput.
4. For proportions with a large denominator (``n_trials >= 50``), consider
   whether a Gaussian approximation is sufficient—Binomial-Beta is exact
   but slower.
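
One way to judge whether a Gaussian approximation is adequate for a given
``n_trials`` is to compare the binomial pmf with the matching normal
density. The check below is a plain-Python sketch with hypothetical helper
names; the success probability of 0.3 is an arbitrary example value.

.. code-block:: python

   import math


   def binomial_pmf(k, n, p):
       return math.comb(n, k) * p**k * (1 - p) ** (n - k)


   def normal_pdf(x, mean, var):
       return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


   def max_pmf_gap(n, p):
       """Largest pointwise gap between Binomial(n, p) and its normal match."""
       mean, var = n * p, n * p * (1 - p)
       return max(abs(binomial_pmf(k, n, p) - normal_pdf(k, mean, var))
                  for k in range(n + 1))


   for n in (5, 20, 50):
       print(f"n_trials={n:3d}  max gap={max_pmf_gap(n, 0.3):.4f}")

The gap shrinks as ``n_trials`` grows, but it also depends on how close the
success probability is to 0 or 1, so rerun the check with rates typical of
your data.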

For a full statistical comparison and benchmark numbers, refer to
:doc:`../theory/model_comparison` and :doc:`../benchmarks/results`.

Poisson (Gamma Prior)
~~~~~~~~~~~~~~~~~~~~~

**Example:**

.. code-block:: python

   from fast_bocpd import PoissonGamma

   model = PoissonGamma(
       alpha0=1.0,   # Prior shape
       beta0=1.0     # Prior rate
   )

**Real-world use cases:**

- Website clicks per hour
- Server errors per day
- Customer arrivals per minute
- Defects per product batch

**Tuning tips:**

- The prior mean of the rate is ``alpha0 / beta0``; set it to your expected
  event rate.
- Scaling ``alpha0`` and ``beta0`` up together (same ratio) keeps the prior
  mean but makes the prior stronger.
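
The effect of prior strength can be seen with the standard Gamma-Poisson
update, where the posterior is ``Gamma(alpha0 + sum(x), beta0 + n)``. Both
priors below have mean 5; the helper name is hypothetical.

.. code-block:: python

   def poisson_gamma_posterior(counts, alpha0=1.0, beta0=1.0):
       """Posterior (shape, rate) of the Poisson rate after ``counts``."""
       return alpha0 + sum(counts), beta0 + len(counts)


   counts = [12, 9, 11, 10]  # sample mean 10.5
   for alpha0, beta0 in ((5.0, 1.0), (50.0, 10.0)):
       a, b = poisson_gamma_posterior(counts, alpha0, beta0)
       print(f"alpha0={alpha0:4.1f} beta0={beta0:4.1f} -> posterior mean {a / b:.2f}")

The weaker prior lets the posterior mean move close to the sample mean,
while the stronger one holds it nearer to the prior guess.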

Bernoulli (Beta Prior)
~~~~~~~~~~~~~~~~~~~~~~~

**Statistical Model:**

.. math::

   x_t | p &\sim \text{Bernoulli}(p) \\
   p &\sim \text{Beta}(\alpha_0, \beta_0)

**When to use:**

- Binary outcomes (success/failure, yes/no)
- Probability estimation (coin flips, conversion)
- Data is 0 or 1

**Strengths:**

- Fastest model (~34,000 obs/sec)
- Perfect for A/B testing changepoint detection
- Simple, interpretable

**Weaknesses:**

- Only for binary data

**Example:**

.. code-block:: python

   from fast_bocpd import BernoulliBeta

   model = BernoulliBeta(
       alpha0=1.0,   # Prior successes
       beta0=1.0     # Prior failures
   )

**Real-world use cases:**

- Conversion rate changes (user clicked? yes/no)
- Manufacturing defects (pass/fail)
- Medical outcomes (recovered? yes/no)
- Coin fairness testing

**Tuning tips:**

- ``alpha0=beta0=1`` is the uniform prior (no preference)
- ``alpha0=beta0=0.5`` is the Jeffreys prior (uninformative)
- ``alpha0`` and ``beta0`` can be thought of as "pseudocounts" of prior
  successes and failures
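
The pseudocount reading makes the two priors easy to compare: the
predictive probability of a success is the ratio of success counts (prior
plus observed) to all counts. A minimal sketch with a hypothetical helper:

.. code-block:: python

   def predictive_success(successes, failures, alpha0, beta0):
       """P(next outcome = 1) under a Beta(alpha0, beta0) prior."""
       return (alpha0 + successes) / (alpha0 + beta0 + successes + failures)


   # Three failures and no successes observed so far.
   print(predictive_success(0, 3, 1.0, 1.0))  # uniform prior
   print(predictive_success(0, 3, 0.5, 0.5))  # Jeffreys prior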

Binomial (Beta Prior)
~~~~~~~~~~~~~~~~~~~~~~

**Statistical Model:**

.. math::

   x_t | p, N &\sim \text{Binomial}(N, p) \\
   p &\sim \text{Beta}(\alpha_0, \beta_0)

**When to use:**

- Proportion data (k successes out of N trials)
- Batch testing (10 out of 100 users converted)
- N is fixed and known

**Strengths:**

- Generalizes Bernoulli (Bernoulli is Binomial with N=1)
- Fast (~15,000 obs/sec)
- Natural for proportion changepoints

**Example:**

.. code-block:: python

   from fast_bocpd import BinomialBeta

   model = BinomialBeta(
       alpha0=1.0,   # Prior successes
       beta0=1.0,    # Prior failures
       n_trials=10   # Fixed N per observation
   )

**Real-world use cases:**

- Batch conversion rates (10 users, 3 converted → x=3)
- Clinical trials (20 patients, 12 responded → x=12)
- Quality control (sample 50 items, 2 defective → x=2)

**Tuning tips:**

- ``n_trials`` must match your data (every x_t is out of N trials)
- Same prior tuning as Bernoulli
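
For intuition about what the model evaluates per observation, the
posterior predictive of a Binomial likelihood under a Beta prior is the
Beta-binomial distribution. The sketch below is the textbook formula in
plain Python, not the library's C implementation.

.. code-block:: python

   import math


   def log_beta(a, b):
       """Log of the Beta function B(a, b)."""
       return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


   def beta_binomial_log_predictive(k, n_trials, alpha, beta):
       """Log P(k successes out of n_trials) under a Beta(alpha, beta) belief."""
       log_choose = (math.lgamma(n_trials + 1) - math.lgamma(k + 1)
                     - math.lgamma(n_trials - k + 1))
       return (log_choose + log_beta(k + alpha, n_trials - k + beta)
               - log_beta(alpha, beta))


   # With a uniform prior every count from 0 to N is equally likely.
   print(math.exp(beta_binomial_log_predictive(3, 10, 1.0, 1.0)))
   # With N=1 the formula reduces to the Bernoulli predictive alpha/(alpha+beta).
   print(math.exp(beta_binomial_log_predictive(1, 1, 2.0, 3.0)))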

Gamma (Gamma Prior)
~~~~~~~~~~~~~~~~~~~

**Statistical Model:**

.. math::

   x_t | k, \theta &\sim \text{Gamma}(k, \theta) \\
   \theta &\sim \text{Gamma}(\alpha_0, \beta_0)

**When to use:**

- Positive continuous data (x > 0)
- Right-skewed distributions
- Waiting times, durations, sizes

**Strengths:**

- Flexible (can model various shapes)
- Fast (~24,000 obs/sec)
- Conjugate prior (efficient updates)

**Weaknesses:**

- Requires choosing a fixed shape parameter k

**Example:**

.. code-block:: python

   from fast_bocpd import GammaGamma

   model = GammaGamma(
       shape=1.0,    # Fixed shape k (1 = exponential-like data)
       alpha0=1.0,   # Prior shape
       beta0=1.0     # Prior rate
   )

**Real-world use cases:**

- Customer lifetime value (always positive, skewed)
- Inter-arrival times (time between events)
- File sizes, transaction amounts
- Rainfall amounts on wet days (zero readings violate x > 0 and must be
  handled separately)

Common Mistakes to Avoid
-------------------------

1. **Using Gaussian for count data**

   - Avoid: ``data = [1, 2, 3, ...]`` with ``GaussianNIG``.
   - Prefer: ``PoissonGamma`` for counts.

2. **Using Poisson for continuous data**

   - Avoid: ``data = [1.5, 2.3, 3.7, ...]`` with ``PoissonGamma``.
   - Prefer: ``GaussianNIG`` or ``StudentTNG``.

3. **Ignoring outliers**

   - Avoid: financial data with ``GaussianNIG``, which tends to raise a
     false alarm whenever an outlier arrives.
   - Prefer: ``StudentTNG`` for robustness.

4. **Grid ν without need**

   - Avoid: ``nu=[2,3,5,10,20]`` when fixed ``nu=3`` is fine.
   - Grid mode is roughly 6x slower (3,471 vs. 21,796 obs/s); use it only
     if the tail shape is very uncertain.

When in Doubt
-------------

**Start with Student-t (fixed ν=3):**

.. code-block:: python

   from fast_bocpd import StudentTNG
   model = StudentTNG(mu0=0, kappa0=1, alpha0=1, beta0=1, nu=3)

It is:

- Robust to outliers
- Fast enough for most applications
- Suitable for most continuous data

Then experiment:

- If no outliers are detected → try ``GaussianNIG`` (faster)
- If heavy tails are suspected → try grid mode or a lower ν
- If the data are counts → switch to ``PoissonGamma``
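
One simple way to decide whether outliers are present is a robust z-score
based on the median absolute deviation. This is a generic heuristic
suggested here for illustration, not a ``fast_bocpd`` feature; the 3.5
cut-off is a common convention and ``outlier_fraction`` is a hypothetical
helper.

.. code-block:: python

   import statistics


   def outlier_fraction(xs, threshold=3.5):
       """Share of points whose MAD-based robust z-score exceeds ``threshold``."""
       med = statistics.median(xs)
       mad = statistics.median(abs(x - med) for x in xs)
       if mad == 0:
           return 0.0  # degenerate sample: no spread to measure against
       # 0.6745 rescales the MAD to the standard deviation of a normal.
       robust_z = [0.6745 * (x - med) / mad for x in xs]
       return sum(abs(z) > threshold for z in robust_z) / len(xs)


   sample = [0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.2, -0.3, 9.0, 0.15]
   print(f"outlier fraction: {outlier_fraction(sample):.2f}")

A fraction near zero on a representative sample suggests trying
``GaussianNIG``; a noticeable fraction supports staying with Student-t.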

Next Steps
----------

- :doc:`tuning_parameters` - How to set hyperparameters
- :doc:`interpreting_results` - Understanding model outputs
- :doc:`../theory/conjugate_priors` - Mathematical details
- :doc:`../theory/model_comparison` - Statistical comparison