Parallelization of loo using mirai and mori #378

Draft
florence-bockting wants to merge 6 commits into loo-v3.0.0 from parallelization

@florence-bockting

Summary

Fixes #308

  • Replaces parallel::mclapply() / parLapply() with mirai + mori for per-observation parallelism (cross-platform, including Windows).
  • Adds three parallelism modes:
    • per-call cores,
    • persistent session pool (loo.daemons / LOO_DAEMONS), and
    • user-managed mirai::daemons() (remote/SSH/HPC).
  • Parallel output matches serial; only scheduling changes.
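
The equivalence claim in the last bullet can be illustrated with a minimal sketch that uses only base R and mirai's `daemons()` and `mirai_map()`. The function `per_obs()` and the toy matrix are hypothetical stand-ins, not code from this branch, and the sketch does not use mori or any loo internals.

```r
library(mirai)

# Hypothetical per-observation computation: a log-mean-exp over draws,
# standing in for the real per-observation work done by loo.
per_obs <- function(i, log_lik) {
  x <- log_lik[, i]
  max(x) + log(mean(exp(x - max(x))))
}

set.seed(1)
log_lik <- matrix(rnorm(200 * 8), nrow = 200, ncol = 8)  # draws x observations
idx <- seq_len(ncol(log_lik))

# Serial reference.
serial <- vapply(idx, per_obs, numeric(1), log_lik = log_lik)

# Parallel run on two local daemons; `.args` passes the constant matrix
# to every call, and `[.flat]` collects the results into a vector.
daemons(2)
parallel <- mirai_map(idx, per_obs, .args = list(log_lik = log_lik))[.flat]
daemons(0)  # tear the pool down again

# Only the scheduling differs; the numbers should agree.
stopifnot(isTRUE(all.equal(serial, unname(parallel))))
```

In this sketch the matrix is serialized to each daemon; the zero-copy path described for local pools in this PR is not reproduced here.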

What changed

Core (R/parallel.R): with_loo_daemons(), loo_map(), loo_pool_is_local(), loo_persist_config().

Parallelized functions: loo() (function method), psis()/sis()/tis(), relative_eff(), loo_subsample(), loo_moment_match(), loo_model_weights().

Pool precedence: connected pool (user or persistent) always wins → cores is ignored. Local pools use mori zero-copy for broadcast objects (e.g. draws); remote pools serialize.
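
The precedence rule can be sketched in a few lines of base R. Both helpers below are hypothetical illustrations, not the implementation in R/parallel.R; in particular, falling back to serial execution when no pool is connected and `cores` is 1 is an assumption.

```r
# Hypothetical helper: decide how a call is scheduled.
resolve_mode <- function(cores = 1L, pool_connected = FALSE) {
  if (pool_connected) return("pool")   # connected pool wins; `cores` is ignored
  if (cores > 1L) return("per-call")   # temporary pool spawned for this call
  "serial"                             # assumed default
}

# Hypothetical helper: local pools can share memory, remote pools serialize.
transport <- function(pool_is_local) {
  if (pool_is_local) "zero-copy" else "serialize"
}

resolve_mode(cores = 4L, pool_connected = TRUE)  # "pool"
resolve_mode(cores = 4L)                         # "per-call"
resolve_mode()                                   # "serial"
transport(pool_is_local = FALSE)                 # "serialize"
```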

Also: mirai + mori in DESCRIPTION; vignettes/loo2-parallel.Rmd; tests/testthat/test_parallel.R; benchmark/ scripts + bench-comparison.md.

Review guide

  1. vignettes/loo2-parallel.Rmd shows the user-facing model.
  2. R/parallel.R contains the pool lifecycle and the loo_map() transport.
  3. For an example, see loo.function in R/loo.R: with_loo_daemons() → loo_map(broadcast = list(draws = ...)).
  4. tests/testthat/test_parallel.R covers serial/parallel equivalence and pool precedence.
  5. benchmark/README.md describes a first attempt at a small baseline vs. new comparison (first results are in benchmark/bench-comparison.md).

Initial benchmarks (Linux, one machine): loo.function with large draws benefits most (~4× with a persistent pool); matrix psis() does not benefit (communication-bound); a per-call pool pays ~1s of spawn/teardown on every call.
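
To see why a fixed spawn/teardown cost matters for small problems, here is a rough cost model in base R. It assumes perfect scaling across workers plus a constant overhead per call; both are simplifying assumptions for illustration, not measurements from benchmark/, and the function names are hypothetical.

```r
# Idealized wall-clock time for a per-call pool: perfect scaling plus a
# fixed spawn/teardown overhead (default 1 s, the rough figure quoted above).
per_call_time <- function(serial_s, workers, overhead_s = 1) {
  serial_s / workers + overhead_s
}

# Serial run time above which a per-call pool pays off under this model.
break_even <- function(workers, overhead_s = 1) {
  overhead_s * workers / (workers - 1)
}

break_even(4)          # 1.333333 (seconds)
per_call_time(0.5, 4)  # 1.125, slower than the 0.5 s serial run
per_call_time(8, 4)    # 3, faster than the 8 s serial run
```

A persistent pool removes the overhead term from every call after the first, which is consistent with it being the mode that benefits most in the initial numbers.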

Follow-up work

  • Reviewing: Please have a look at the current implementation and check it for correctness and usability. Any comments and improvements are welcome.
  • Benchmarking: compare master against this branch across problem sizes, all (or a selected subset of) parallelized functions, OSes (Linux/macOS/Windows), and metrics (wall-clock, allocation, peak RSS), for example by extending benchmark/.
  • LSAT case study: posteriorDB lsat-data; showcase speedup and all three parallelism modes (one-off, persistent pool, simulation loop).
  • Remote SSH: two-machine test; verify correctness, measure speedup, document setup (mirai::daemons(url = ..., remote = ssh_config(...))).
  • Documentation: expand vignettes/loo2-parallel.Rmd with assumptions, when-to-use guidance, function-specific notes, memory model.

Current limitations of implementation

  • Matrix psis() rarely speeds up (large data shipped per worker).
  • Per-call cores > 1 can be slower than serial on small problems without loo.daemons.
  • Remote SSH is untested in CI.

@github-actions

github-actions Bot commented Jul 1, 2026

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 2bee14f is merged into master:

  • ❗🐌loo_function: 1.98s -> 2.09s [+4.63%, +6.14%]
  • 🚀loo_matrix: 1.9s -> 1.87s [-2.08%, -0.46%]

Further explanation regarding interpretation and methodology can be found in the documentation.

@florence-bockting florence-bockting mentioned this pull request Jul 1, 2026
6 tasks
@florence-bockting florence-bockting changed the base branch from master to loo-v3.0.0 July 1, 2026 09:33
Development

Successfully merging this pull request may close these issues.

Parallelisation of loo using Mirai
