Performance Tips

The C backend is already highly optimised, but a few practical choices can deliver substantial speedups in real workloads.

Choose the Right Mode

  • Online (``BOCPD.update``) – minimal latency, one observation at a time. Python call overhead dominates the cost of each update, which rarely matters when data arrive slowly.

  • Batch (``BOCPD.batch_update``) – processes contiguous NumPy arrays using a single FFI call. Expect 30–50% higher throughput because the C loop runs uninterrupted.
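
As a rough illustration of the two call patterns, the sketch below drives a stand-in object both ways and counts how often the Python/C boundary would be crossed. CountingDetector is a hypothetical placeholder, not part of the library; the only assumption taken from the text above is that update accepts one observation and batch_update accepts an array.

```python
import numpy as np


class CountingDetector:
    """Hypothetical stand-in for a BOCPD instance; it only counts calls."""

    def __init__(self):
        self.calls = 0
        self.seen = 0

    def update(self, x):
        # In the real bindings this would be one Python-to-C crossing.
        self.calls += 1
        self.seen += 1

    def batch_update(self, xs):
        # One crossing for the whole array.
        self.calls += 1
        self.seen += len(xs)


data = np.linspace(0.0, 1.0, 1000)

online = CountingDetector()
for x in data:
    online.update(float(x))

batch = CountingDetector()
batch.batch_update(data)

print(online.calls, batch.calls)  # 1000 crossings versus 1
```

Both objects see the same 1000 observations; only the number of calls differs, which is where the batch throughput gain would come from.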

Feed Contiguous NumPy Arrays

When calling batch_update (or passing grid parameters to the Student-t model), convert inputs with np.ascontiguousarray once, up front, to avoid implicit copies on later calls. The bindings already do this for their internal buffers, but pre-allocating contiguous arrays yourself lets you reuse memory and avoid repeated validation.
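
A minimal NumPy-only sketch of the idea: check contiguity, convert once, and reuse a pre-allocated buffer between chunks. The commented-out detector.batch_update line marks a hypothetical call site; everything else uses standard NumPy calls.

```python
import numpy as np

# A strided view (one column of a 2-D array) is not C-contiguous.
raw = np.arange(20, dtype=np.float64).reshape(10, 2)
column = raw[:, 0]
print(column.flags["C_CONTIGUOUS"])  # False

# ascontiguousarray copies only when it has to.
packed = np.ascontiguousarray(column, dtype=np.float64)
print(packed.flags["C_CONTIGUOUS"])  # True

# Reuse one pre-allocated buffer across chunks instead of allocating each time.
chunk_size = 4
buffer = np.empty(chunk_size, dtype=np.float64)
for start in range(0, 8, chunk_size):
    np.copyto(buffer, column[start:start + chunk_size])
    # detector.batch_update(buffer)  # hypothetical call site, not executed here
```

Reusing the buffer is only safe if the callee does not keep a reference to the array after the call returns; that is an assumption to verify against the bindings.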

Set max_run_length Appropriately

  • Large values increase memory and per-update work (more run lengths to propagate). Only track run lengths you care about.

  • Rule of thumb: max_run_length ≈ 3× expected regime duration.

  • If you need longer history but can tolerate approximation, consider downsampling the input series.
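
The rule of thumb and the downsampling suggestion can be sketched with two small helpers. Both function names and their arguments are illustrative and not part of the library.

```python
import math


def suggest_max_run_length(expected_regime_duration, factor=3.0):
    """Apply the 3x rule of thumb; the argument names are illustrative."""
    return max(1, math.ceil(factor * expected_regime_duration))


def downsample_mean(series, k):
    """Average non-overlapping blocks of k samples, dropping an incomplete tail."""
    n_blocks = len(series) // k
    return [sum(series[i * k:(i + 1) * k]) / k for i in range(n_blocks)]


# Regimes last about 200 samples, so track roughly 600 run lengths.
print(suggest_max_run_length(200))  # 600

# After averaging blocks of 4, a 200-sample regime spans about 50 samples,
# so the same run-length budget covers 4x more history.
coarse = downsample_mean(list(range(10)), 4)
print(coarse)  # [1.5, 5.5]
```

Block averaging also reduces the noise variance of each sample, so the observation model's prior or scale parameters may need adjusting for the downsampled series.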

Avoid Expensive Models When Not Needed

  • The Student-t grid model is ~6× slower than the fixed-ν variant. Reserve it for cases where robustness is critical.

  • Binomial-Beta overhead grows with n_trials because of the binomial coefficients. If n_trials is large and stable, rescale to a smaller effective sample size per time step or approximate the counts with a Gaussian model.
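
One way to read the Gaussian suggestion is the textbook normal approximation to Binomial(n, p), with mean n·p and variance n·p·(1−p). The sketch below standardises counts around a reference rate; p_ref and both helper names are hypothetical, and which Gaussian model to feed the result into is left open here.

```python
import math


def binomial_gaussian_moments(n_trials, p):
    """Mean and standard deviation of the normal approximation to Binomial(n_trials, p)."""
    mean = n_trials * p
    std = math.sqrt(n_trials * p * (1.0 - p))
    return mean, std


def standardise_counts(counts, n_trials, p_ref):
    """Map counts to approximate z-scores around a reference success rate p_ref."""
    mean, std = binomial_gaussian_moments(n_trials, p_ref)
    return [(k - mean) / std for k in counts]


mean, std = binomial_gaussian_moments(10_000, 0.5)
print(mean, std)  # 5000.0 50.0
print(standardise_counts([4950, 5000, 5100], 10_000, 0.5))  # [-1.0, 0.0, 2.0]
```

The approximation is usually considered reasonable only when both n·p and n·(1−p) are large, so it is worth checking against the exact model on a sample of the data first.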

Leverage Offline Warm-Up

Before entering a strict real-time loop, call batch_update with a historical window. This initialises the sufficient statistics so the online phase runs at steady-state speed (avoiding the initial transient in which all probability mass sits at run length zero).
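
The call order for a warm-up might look like the following. StubDetector is a hypothetical stand-in that only records calls; a real detector would be constructed with whatever model and parameters the application uses.

```python
import numpy as np


class StubDetector:
    """Hypothetical stand-in for a BOCPD instance, used only to show call order."""

    def __init__(self):
        self.n_seen = 0
        self.log = []

    def batch_update(self, xs):
        self.n_seen += len(xs)
        self.log.append(("batch", len(xs)))

    def update(self, x):
        self.n_seen += 1
        self.log.append(("online", 1))


rng = np.random.default_rng(0)
history = np.ascontiguousarray(rng.normal(size=500))  # historical window
live = rng.normal(size=3)                             # stands in for a live feed

detector = StubDetector()
detector.batch_update(history)  # warm-up: one call, outside the latency-critical path
for x in live:                  # real-time phase: one observation per call
    detector.update(float(x))

print(detector.log[0], detector.n_seen)  # ('batch', 500) 503
```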

Disabling Strict Validation (Advanced)

Discrete models validate inputs (integers, binary) on the Python side to prevent invalid data from reaching C. When you trust upstream data and benchmarking shows that validation overhead matters (~5–10% for small batches), set strict=False on PoissonGamma/BernoulliBeta/BinomialBeta. Be careful: with validation disabled, invalid values reach the C code unchecked and the resulting behaviour is undefined.
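
If validation is switched off, one option is to check the data once at the ingestion boundary instead of on every call. The helpers below are an illustrative sketch; their names are hypothetical and they do not necessarily mirror the checks the library itself performs.

```python
import numpy as np


def is_valid_count_data(values):
    """Return True if every value is a finite, non-negative whole number."""
    arr = np.asarray(values, dtype=np.float64)
    return bool(
        np.all(np.isfinite(arr))
        and np.all(arr >= 0)
        and np.all(arr == np.floor(arr))
    )


def is_valid_binary_data(values):
    """Return True if every value is exactly 0 or 1."""
    arr = np.asarray(values, dtype=np.float64)
    return bool(np.all((arr == 0) | (arr == 1)))


print(is_valid_count_data([0, 3, 7]))     # True
print(is_valid_count_data([1, 2.5, -1]))  # False
print(is_valid_binary_data([0, 1, 1]))    # True
print(is_valid_binary_data([0, 2]))       # False
```

For binomial data an additional upper bound (counts no larger than n_trials) would be needed; it is omitted here for brevity.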

Profiling Tips

  • Use benchmark_fast_bocpd.py under benchmarks/scripts to test new configurations consistently.

  • Enable BOCPD_DEBUG_CHECKS (compile-time flag) only while debugging. It zeroes buffers for safety but reduces throughput.

  • If you need to profile the Python layer, wrap the updates in a numpy.errstate context, time them with time.perf_counter, and measure several thousand iterations to minimise timer noise.
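
A small timing harness along these lines, using only the standard library, could look as follows. The lambda is a stand-in workload; swapping in something like lambda: detector.update(x) is a hypothetical call site for a real detector.

```python
import time


def time_per_call(fn, n_iter=5000, repeats=5):
    """Return the best average seconds per call over several repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(n_iter):
            fn()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / n_iter)
    return best


# Stand-in workload, used here so the sketch runs without the library.
per_call = time_per_call(lambda: sum(range(100)))
print(f"{per_call * 1e6:.2f} microseconds per call")
```

Taking the minimum over repeats, rather than the mean, tends to filter out interference from other processes.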

Following these guidelines keeps the BOCPD loop fast and predictable, so that the C implementation, rather than Python bookkeeping, remains the bottleneck.