Performance Tips
================

The C backend is already highly optimised, but a few practical choices can
deliver substantial speedups in real workloads.

Choose the Right Mode
---------------------

* **Online (``BOCPD.update``)** – minimal latency, one observation at a
  time. Per-call Python overhead dominates the cost of each update, which
  is acceptable when data arrive slowly.
* **Batch (``BOCPD.batch_update``)** – processes contiguous NumPy arrays
  in a single FFI call. Expect 30–50% higher throughput than repeated
  ``update`` calls because the C loop runs uninterrupted.

Feed Contiguous NumPy Arrays
----------------------------

When calling ``batch_update`` (or passing grid parameters to Student-t),
convert inputs once with ``np.ascontiguousarray`` so that no implicit copy
is needed on each call. The bindings already do this for internal buffers,
but pre-allocating contiguous arrays lets you reuse memory and avoid
repeated validation.
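
As a minimal sketch of the point above, using only NumPy and no BOCPD calls,
the snippet below shows a strided view failing the contiguity check and
``np.ascontiguousarray`` reusing the existing buffer when no copy is needed.
The helper name ``as_c_float64`` and the ``float64`` dtype are assumptions
for illustration; check which dtype the bindings actually expect.

.. code-block:: python

   import numpy as np


   def as_c_float64(x):
       """Return x as a C-contiguous float64 array, copying only if required."""
       return np.ascontiguousarray(x, dtype=np.float64)


   raw = np.arange(20, dtype=np.float64)
   strided = raw[::2]                       # every second element: a non-contiguous view
   print(strided.flags["C_CONTIGUOUS"])     # False

   fixed = as_c_float64(strided)            # one explicit copy, done up front
   print(fixed.flags["C_CONTIGUOUS"])       # True

   same = as_c_float64(fixed)               # already contiguous float64: buffer is reused
   print(np.shares_memory(same, fixed))     # True

Doing the conversion once, outside the hot loop, keeps the cost of the copy
visible and lets the same buffer be passed to repeated calls.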

Set ``max_run_length`` Appropriately
------------------------------------

* Large values increase memory and per-update work (more run lengths to
  propagate). Only track run lengths you care about.
* Rule of thumb: ``max_run_length`` ≈ 3× expected regime duration.
* If you need longer history but can tolerate approximation, consider
  downsampling the input series.
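
The rule of thumb and the downsampling suggestion above can be sketched
with plain NumPy. Both helper names (``suggest_max_run_length``,
``downsample_mean``) are hypothetical, and block averaging is only one
possible downsampling scheme, chosen here for simplicity.

.. code-block:: python

   import numpy as np


   def suggest_max_run_length(expected_regime_len, factor=3):
       """Rule of thumb from the text: about 3x the expected regime duration."""
       return int(np.ceil(factor * expected_regime_len))


   def downsample_mean(x, k):
       """Average consecutive blocks of k samples, dropping any incomplete tail."""
       x = np.asarray(x, dtype=np.float64)
       n_blocks = x.size // k
       return x[: n_blocks * k].reshape(n_blocks, k).mean(axis=1)


   series = np.arange(10, dtype=np.float64)
   coarse = downsample_mean(series, k=4)    # blocks 0..3 and 4..7; samples 8, 9 dropped
   print(coarse)                            # [1.5 5.5]
   print(suggest_max_run_length(200))       # 600

After downsampling by ``k``, a regime of 200 original samples spans only
``200 / k`` points, so ``max_run_length`` can shrink by the same factor.
Note that averaging also reduces the noise variance of the series, so model
hyperparameters tuned on the raw data may need adjusting.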

Avoid Expensive Models When Not Needed
--------------------------------------

* The Student-t grid model is ~6× slower than the fixed-ν variant. Reserve
  it for cases where robustness is critical.
* Binomial-Beta overhead grows with ``n_trials`` because binomial
  coefficients must be evaluated. If ``n_trials`` is large and stable,
  rescale to a smaller effective sample per time step or approximate the
  counts with a Gaussian model.
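
For reference, the Gaussian approximation mentioned above is the standard
normal approximation to the binomial. Writing :math:`n` for the number of
trials, :math:`p` for the success probability, and :math:`k` for the
observed count (symbols introduced here for illustration only):

.. math::

   k \sim \mathrm{Binomial}(n, p)
   \;\approx\;
   \mathcal{N}\bigl(np,\; np(1-p)\bigr)

A common guideline is that this is reasonable when both :math:`np` and
:math:`n(1-p)` are comfortably large (roughly 10 or more); it degrades when
:math:`p` is close to 0 or 1.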

Leverage Offline Warm-Up
------------------------

Before entering a strict real-time loop, call ``batch_update`` with a
historical window. This initialises sufficient statistics so the online
phase runs at steady-state speed (avoiding the transient when all
probability mass starts at run length zero).
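
A minimal sketch of the warm-up pattern follows. Only the method names
``update`` and ``batch_update`` come from this guide; ``StubDetector`` is a
hypothetical stand-in so the example runs without the library, and the
argument and return conventions shown are assumptions.

.. code-block:: python

   import numpy as np


   class StubDetector:
       """Hypothetical stand-in for a detector; it only counts observations."""

       def __init__(self):
           self.n_seen = 0

       def update(self, x):
           self.n_seen += 1

       def batch_update(self, xs):
           self.n_seen += len(xs)


   def warm_up_then_stream(detector, history, live):
       """Prime the detector with one batch call, then feed points one by one."""
       history = np.ascontiguousarray(history, dtype=np.float64)
       detector.batch_update(history)       # offline warm-up: a single call
       for x in live:                       # online phase: one call per observation
           detector.update(float(x))
       return detector


   rng = np.random.default_rng(0)
   det = warm_up_then_stream(StubDetector(), rng.normal(size=500), rng.normal(size=20))
   print(det.n_seen)                        # 520

With a real detector, replace ``StubDetector()`` with the configured model
instance; the structure of the loop stays the same.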

Disabling Strict Validation (Advanced)
--------------------------------------

Discrete models validate inputs (integers, binary) on the Python side to
prevent invalid data from reaching C. When you trust upstream data and
benchmarking shows validation overhead matters (~5–10% for small batches),
set ``strict=False`` on ``PoissonGamma``/``BernoulliBeta``/``BinomialBeta``.
Be careful: invalid values then reach the C code unchecked and can cause
undefined behaviour.
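
One way to keep some safety when strict checks are off is to validate a
whole array once, upstream, with vectorised NumPy checks. The sketch below
is not the library's own validation logic; both helper names are
hypothetical.

.. code-block:: python

   import numpy as np


   def is_nonnegative_integer_valued(x):
       """True if every element is finite, >= 0, and has no fractional part."""
       x = np.asarray(x, dtype=np.float64)
       return bool(
           np.all(np.isfinite(x)) and np.all(x >= 0) and np.all(x == np.floor(x))
       )


   def is_binary(x):
       """True if every element is exactly 0 or 1."""
       x = np.asarray(x)
       return bool(np.all((x == 0) | (x == 1)))


   print(is_nonnegative_integer_valued([0, 3, 7]))   # True
   print(is_nonnegative_integer_valued([0, 2.5]))    # False
   print(is_binary([0, 1, 1, 0]))                    # True
   print(is_binary([0, 2]))                          # False

Running such a check once per incoming batch is typically cheaper than
per-call validation while still catching malformed data before it reaches C.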

Profiling Tips
--------------

* Use ``benchmark_fast_bocpd.py`` under ``benchmarks/scripts`` to test new
  configurations consistently.
* Enable ``BOCPD_DEBUG_CHECKS`` (compile-time flag) only while debugging.
  It zeroes buffers for safety but reduces throughput.
* If you need to profile the Python layer, time updates in a
  ``time.perf_counter`` loop (inside a ``numpy.errstate`` block if
  floating-point warnings would otherwise distort the timing) and measure
  several thousand iterations to minimise timer noise.
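
The timing loop described in the last bullet might look like the sketch
below. ``time_per_call`` and ``stand_in_update`` are hypothetical names;
substitute the real update call for the placeholder function.

.. code-block:: python

   import time

   import numpy as np


   def time_per_call(fn, arg, n_iter=5000, n_repeat=5):
       """Return the best mean wall-clock seconds per call over n_repeat runs."""
       best = float("inf")
       for _ in range(n_repeat):
           start = time.perf_counter()
           for _ in range(n_iter):
               fn(arg)
           elapsed = time.perf_counter() - start
           best = min(best, elapsed / n_iter)
       return best


   def stand_in_update(x):
       """Hypothetical placeholder for a detector update call."""
       return x * x


   with np.errstate(all="ignore"):          # silence floating-point warnings while timing
       per_call = time_per_call(stand_in_update, 1.5)
   print(f"{per_call * 1e6:.3f} microseconds per call")

Taking the minimum over several repeats reduces the influence of scheduler
noise and other background activity on the reported figure.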

Following these guidelines keeps the BOCPD loop fast and predictable,
allowing the C implementation to remain the bottleneck rather than Python
bookkeeping.