Performance Tips
The C backend is already highly optimised, but a few practical choices can deliver substantial speedups in real workloads.
Choose the Right Mode
Online (``BOCPD.update``) – minimal latency, one observation at a time. Python call overhead dominates when data arrive slowly.
Batch (``BOCPD.batch_update``) – processes contiguous NumPy arrays using a single FFI call. Expect 30–50% higher throughput because the C loop runs uninterrupted.
Feed Contiguous NumPy Arrays
When calling batch_update (or passing grid parameters to Student-t),
use np.ascontiguousarray to avoid implicit copies. The bindings
already do this for internal buffers, but pre-allocating contiguous arrays
lets you reuse memory and avoid repeated validation.
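As a rough illustration of the point above, the sketch below uses only NumPy to show when ``np.ascontiguousarray`` has to copy and how a pre-allocated buffer can be refilled in place. The ``float64`` dtype is an assumption; check which dtype the bindings actually expect.

```python
import numpy as np

# A strided view (every second sample) is not C-contiguous.
raw = np.arange(10, dtype=np.float64)
view = raw[::2]

# ascontiguousarray must copy here, because the view has gaps in memory.
data = np.ascontiguousarray(view, dtype=np.float64)

# For an array that is already contiguous with the right dtype,
# the result still shares memory with the input (no new data buffer).
same = np.ascontiguousarray(raw, dtype=np.float64)

# Pre-allocate one contiguous buffer and refill it for every batch,
# instead of allocating a fresh array each time.
buf = np.empty(view.shape, dtype=np.float64)
np.copyto(buf, view)

print(view.flags["C_CONTIGUOUS"], data.flags["C_CONTIGUOUS"])
print(np.shares_memory(same, raw), np.shares_memory(data, raw))
```

The reusable ``buf`` is what you would then hand to ``batch_update``; how the bindings treat it internally is not shown here.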
Set max_run_length Appropriately
Large values increase memory and per-update work (more run lengths to propagate). Only track run lengths you care about.
Rule of thumb:
``max_run_length`` ≈ 3× expected regime duration.
If you need longer history but can tolerate approximation, consider downsampling the input series.
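The rule of thumb and the downsampling suggestion can be sketched as two small helpers. Both names (``suggest_max_run_length``, ``downsample_mean``) are hypothetical and not part of the library. Block averaging is only one possible way to downsample, and it lowers the noise level of the series, so model hyperparameters may need retuning afterwards.

```python
import numpy as np

def suggest_max_run_length(expected_regime_duration, factor=3):
    """Rule of thumb: roughly `factor` times the expected regime duration."""
    return int(np.ceil(factor * expected_regime_duration))

def downsample_mean(series, step):
    """Average non-overlapping blocks of `step` samples; the ragged tail is dropped."""
    series = np.asarray(series, dtype=np.float64)
    n_blocks = series.size // step
    return series[: n_blocks * step].reshape(n_blocks, step).mean(axis=1)

# Regimes of about 200 samples suggest tracking about 600 run lengths.
full_rate = suggest_max_run_length(200)

# After downsampling by 4, the same regime spans about 50 steps.
reduced_rate = suggest_max_run_length(200 / 4)

x = np.arange(10, dtype=np.float64)
coarse = downsample_mean(x, step=4)  # blocks [0..3] and [4..7]; samples 8 and 9 are dropped
```

Downsampling by a factor of 4 cuts the suggested ``max_run_length`` by the same factor while covering the same span of wall-clock history.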
Avoid Expensive Models When Not Needed
Student-t grid is ~6× slower than fixed-ν. Reserve it for critical robustness requirements.
Binomial-Beta overhead grows with ``n_trials`` because of binomial coefficients. If ``n_trials`` is large and stable, rescale to a smaller effective sample per time step or approximate with a Gaussian model.
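For the Gaussian approximation mentioned above, the standard normal approximation to a binomial count has mean n·p and variance n·p·(1−p). The helper below is a hypothetical illustration, not a library function, and the approximation is only reasonable when both n·p and n·(1−p) are comfortably large.

```python
import math

def gaussian_moments_for_binomial(n_trials, p):
    """Mean and standard deviation of the normal approximation to Binomial(n_trials, p)."""
    mean = n_trials * p
    std = math.sqrt(n_trials * p * (1.0 - p))
    return mean, std

# With 10,000 trials per step and a success rate near 0.2,
# counts are centred near 2000 with a spread of about 40.
mean, std = gaussian_moments_for_binomial(n_trials=10_000, p=0.2)
```

These moments give a sense of the scale a Gaussian observation model would need to cover; how to configure that model in the library is outside this sketch.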
Leverage Offline Warm-Up
Before entering a strict real-time loop, call batch_update with a
historical window. This initialises sufficient statistics so the online
phase runs at steady-state speed (avoiding the transient when all
run-length probabilities start at zero).
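A minimal sketch of the warm-up pattern follows. The method names ``batch_update`` and ``update`` come from the text above; their argument types and return values are assumptions, and ``StubDetector`` is a hypothetical stand-in used only so the example runs on its own.

```python
import numpy as np

class StubDetector:
    """Hypothetical stand-in for a real detector; it only counts observations."""
    def __init__(self):
        self.n_seen = 0

    def batch_update(self, values):
        self.n_seen += len(values)

    def update(self, value):
        self.n_seen += 1
        return self.n_seen

def warm_up_then_stream(detector, history, stream):
    """Prime `detector` with one batch call, then feed observations one by one."""
    history = np.ascontiguousarray(history, dtype=np.float64)
    detector.batch_update(history)      # offline warm-up in a single call
    results = []
    for x in stream:                    # strict real-time phase
        results.append(detector.update(float(x)))
    return results

det = StubDetector()
out = warm_up_then_stream(det, np.zeros(100), [0.1, 0.2, 0.3])
```

With a real detector in place of the stub, the online loop starts from statistics already primed by the historical window.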
Disabling Strict Validation (Advanced)
Discrete models validate inputs (integers, binary) on the Python side to
prevent invalid data from reaching C. When you trust upstream data and
benchmarking shows validation overhead matters (~5–10% for small batches),
set strict=False on PoissonGamma/BernoulliBeta/BinomialBeta.
Be careful: invalid values then reach the C code unchecked and can cause undefined behaviour.
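If validation is switched off, one option is to check the data once at the ingestion boundary instead of on every call. The helpers below are hypothetical and use only NumPy; they approximate the kind of checks described above (integers, binary) and are not the library's own validation logic.

```python
import numpy as np

def looks_like_counts(values):
    """True if every entry is a finite, non-negative whole number."""
    arr = np.asarray(values, dtype=np.float64)
    return bool(
        np.all(np.isfinite(arr))
        and np.all(arr >= 0)
        and np.all(arr == np.floor(arr))
    )

def looks_binary(values):
    """True if every entry is exactly 0 or 1."""
    arr = np.asarray(values)
    return bool(np.isin(arr, (0, 1)).all())

counts_ok = looks_like_counts([0, 3, 7])
counts_bad = looks_like_counts([2, 1.5, 4])
binary_ok = looks_binary([0, 1, 1, 0])
binary_bad = looks_binary([0, 2, 1])
```

Running such a check once per incoming file or message keeps the safety net while leaving the hot loop free of per-call validation.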
Profiling Tips
Use ``benchmark_fast_bocpd.py`` under ``benchmarks/scripts`` to test new configurations consistently.
Enable ``BOCPD_DEBUG_CHECKS`` (compile-time flag) only while debugging. It zeroes buffers for safety but reduces throughput.
If you need to profile the Python layer, wrap updates inside ``numpy.errstate``/``time.perf_counter`` loops and measure several thousand iterations to minimise timer noise.
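One way to set up such a timing loop is sketched below. ``time_per_call`` is a hypothetical helper, and the ``np.errstate`` setting is an assumption: ``"ignore"`` keeps NumPy floating-point warnings out of the measurement, while ``"raise"`` would surface them as errors instead.

```python
import time
import numpy as np

def time_per_call(fn, n_iter=5000, n_warmup=100, np_err="ignore"):
    """Average wall-clock seconds per call of `fn` over `n_iter` iterations."""
    for _ in range(n_warmup):           # warm caches before timing
        fn()
    with np.errstate(all=np_err):
        start = time.perf_counter()
        for _ in range(n_iter):
            fn()
        elapsed = time.perf_counter() - start
    return elapsed / n_iter

# Stand-in workload; replace the lambda with a call to the update being profiled.
x = np.ones(256, dtype=np.float64)
per_call = time_per_call(lambda: np.dot(x, x), n_iter=2000)
print(f"{per_call * 1e6:.2f} microseconds per call")
```

Averaging over thousands of iterations matters because a single update can be shorter than the timer's useful resolution.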
Following these guidelines keeps the BOCPD loop fast and predictable, allowing the C implementation to remain the bottleneck rather than Python bookkeeping.