Performance Tips

The C backend is already highly optimised, but a few practical choices can deliver substantial speedups in real workloads.

Choose the Right Mode

Online (``BOCPD.update``) – minimal latency, one observation at a time. Per-call Python overhead dominates the cost of each update, which is acceptable when data arrive slowly.
Batch (``BOCPD.batch_update``) – processes contiguous NumPy arrays using a single FFI call. Expect 30–50% higher throughput because the C loop runs uninterrupted.

Feed Contiguous NumPy Arrays

When calling ``batch_update`` (or passing grid parameters to Student-t), pass arrays that are already C-contiguous. ``np.ascontiguousarray`` converts a strided view once, up front, so no implicit copy is made on every call. The bindings already do this for internal buffers, but pre-allocating contiguous arrays lets you reuse memory and avoid repeated validation.
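
As a minimal sketch of the point above, the snippet below uses only NumPy to show when a copy is needed and how a pre-allocated buffer can be reused. The ``float64`` dtype and the helper name ``stage_chunk`` are assumptions made for illustration, not part of the library.

```python
import numpy as np

# A strided view (every second sample) is not C-contiguous.
raw = np.arange(20, dtype=np.float64)
view = raw[::2]
assert not view.flags["C_CONTIGUOUS"]

# Convert once, up front; this copies because the view is strided.
data = np.ascontiguousarray(view, dtype=np.float64)
assert data.flags["C_CONTIGUOUS"]

# Input that is already contiguous with the right dtype is not copied again.
same = np.ascontiguousarray(data, dtype=np.float64)
assert np.shares_memory(same, data)

# Reuse one pre-allocated buffer for every incoming chunk (hypothetical helper).
buffer = np.empty(1024, dtype=np.float64)

def stage_chunk(chunk):
    """Copy a chunk into the shared buffer and return a contiguous slice of it."""
    n = len(chunk)
    if n > buffer.size:
        raise ValueError("chunk larger than the pre-allocated buffer")
    buffer[:n] = chunk
    return buffer[:n]

staged = stage_chunk(view)
```

The slice returned by ``stage_chunk`` shares memory with ``buffer``, so it must be consumed before the next chunk is staged.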

Set ``max_run_length`` Appropriately

Large values increase memory and per-update work (more run lengths to propagate). Only track run lengths you care about.
Rule of thumb: ``max_run_length`` ≈ 3 × the expected regime duration.
If you need longer history but can tolerate approximation, consider downsampling the input series.
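
The rule of thumb and the downsampling suggestion can be sketched as follows. Both helper names are hypothetical, and block averaging is only one possible way to downsample.

```python
import numpy as np

def suggest_max_run_length(expected_regime_duration, factor=3):
    """Apply the 3x rule of thumb (the factor is adjustable)."""
    return int(np.ceil(factor * expected_regime_duration))

def downsample_mean(x, k):
    """Average non-overlapping blocks of k samples, dropping any trailing partial block."""
    x = np.asarray(x, dtype=np.float64)
    n = (x.size // k) * k
    return x[:n].reshape(-1, k).mean(axis=1)

# Regimes of about 200 samples suggest tracking 600 run lengths.
full_rate = suggest_max_run_length(200)

# After downsampling by 4, a regime spans about 50 samples,
# so 150 run lengths cover the same span of time.
reduced_rate = suggest_max_run_length(200 / 4)

series = np.arange(10, dtype=np.float64)
coarse = downsample_mean(series, 4)
```

Averaging k independent samples reduces the noise variance by roughly a factor of k, so model priors tuned on the raw series may need adjusting for the downsampled one.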

Avoid Expensive Models When Not Needed

The Student-t grid model is roughly 6× slower than the fixed-ν variant. Reserve it for cases where robustness is critical.
Binomial-Beta overhead grows with ``n_trials`` because of the binomial coefficients. If ``n_trials`` is large and stable, rescale to a smaller effective sample per time step or approximate the counts with a Gaussian model.
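
To illustrate why a Gaussian can stand in for a binomial when ``n_trials`` is large, the sketch below matches the first two moments (mean n·p, variance n·p·(1−p)) and compares the two densities at one point. The values of ``n``, ``p`` and ``k`` are arbitrary illustrative choices, and the functions are plain standard-library code, not library API.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space for numerical stability."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p))

def normal_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

n, p = 1000, 0.3
mu = n * p                           # matched mean
sigma = math.sqrt(n * p * (1 - p))   # matched standard deviation

k = 310
exact = binom_pmf(k, n, p)
approx = normal_pdf(k, mu, sigma)
relative_error = abs(exact - approx) / exact
```

The approximation degrades when ``p`` is close to 0 or 1 or when ``n`` is small, so it is worth checking ``relative_error`` for your own parameters before switching models.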

Leverage Offline Warm-Up

Before entering a strict real-time loop, call ``batch_update`` with a historical window. This initialises the sufficient statistics so the online phase runs at steady-state speed, avoiding the start-up transient in which all probability mass sits at run length zero.
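
The warm-up pattern might look like the sketch below. ``StandInDetector`` is a hypothetical placeholder that only mimics the two method names mentioned on this page; the real constructor arguments, signatures and return values may differ.

```python
import numpy as np

class StandInDetector:
    """Hypothetical stand-in: counts observations instead of running BOCPD."""

    def __init__(self):
        self.n_seen = 0

    def batch_update(self, data):
        # Assumed to accept a 1-D contiguous array of observations.
        self.n_seen += len(data)

    def update(self, x):
        # Assumed to accept a single observation.
        self.n_seen += 1

def run_with_warmup(detector, history, live_stream):
    """Feed the historical window in one batch call, then switch to online updates."""
    detector.batch_update(np.ascontiguousarray(history, dtype=np.float64))
    for x in live_stream:
        detector.update(float(x))
    return detector

history = np.zeros(500)
live = iter([1.0, 2.0, 3.0])
detector = run_with_warmup(StandInDetector(), history, live)
```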

Disabling Strict Validation (Advanced)

Discrete models validate inputs (integer-valued, binary) on the Python side to prevent invalid data from reaching C. When you trust upstream data and benchmarking shows that validation overhead matters (~5–10% for small batches), set ``strict=False`` on ``PoissonGamma``/``BernoulliBeta``/``BinomialBeta``. Be careful: invalid values then reach the C code unchecked and can trigger undefined behaviour.
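
One way to keep the safety net while skipping per-call checks is to validate data once where it enters your pipeline. The checks below are a sketch of what such an upstream guard could look like; the function names are hypothetical and the library's own validation rules may differ.

```python
import numpy as np

def is_valid_counts(x):
    """True if every value is a finite, non-negative integer (e.g. for count data)."""
    x = np.asarray(x, dtype=np.float64)
    return bool(
        np.all(np.isfinite(x)) and np.all(x >= 0) and np.all(x == np.floor(x))
    )

def is_valid_binary(x):
    """True if every value is exactly 0 or 1."""
    x = np.asarray(x, dtype=np.float64)
    return bool(np.all((x == 0) | (x == 1)))

counts_ok = is_valid_counts([0, 3, 7])
counts_bad = is_valid_counts([1.5, 2.0])
binary_ok = is_valid_binary([0, 1, 1, 0])
binary_bad = is_valid_binary([0, 2])
```

Data that passes such a guard at ingestion can then be passed to a model created with ``strict=False`` without repeating the same checks on every update.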

Profiling Tips

Use ``benchmark_fast_bocpd.py`` under ``benchmarks/scripts`` to test new configurations consistently.
Enable ``BOCPD_DEBUG_CHECKS`` (a compile-time flag) only while debugging. It zeroes buffers for safety but reduces throughput.
To profile the Python layer, time the update calls with ``time.perf_counter`` (inside a ``numpy.errstate`` context if you need to control floating-point warnings) and measure several thousand iterations to minimise timer noise.
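
A small timing harness along these lines could be used for the Python-layer measurement. ``time_per_call`` is a hypothetical helper built only on the standard library; pass it any zero-argument callable that wraps the update you want to measure.

```python
import statistics
import time

def time_per_call(fn, n_iter=5000, repeats=5):
    """Return the median per-call time in seconds over several repeated runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(n_iter):
            fn()
        elapsed = time.perf_counter() - start
        samples.append(elapsed / n_iter)
    # The median is less sensitive to occasional scheduler hiccups than the mean.
    return statistics.median(samples)

def dummy_update():
    """Placeholder workload; replace with a call into the code under test."""
    return sum(range(10))

per_call_seconds = time_per_call(dummy_update, n_iter=2000, repeats=3)
```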

Following these guidelines keeps the BOCPD loop fast and predictable, allowing the C implementation to remain the bottleneck rather than Python bookkeeping.