Skip to content

fix(test-cluster): manage containers via testcontainers' Docker client so start/stop share one runtime#624

Merged
leekeiabstraction merged 1 commit into
apache:mainfrom
beryllw:fix-fluss-cluster-stop
Jun 17, 2026
Merged

fix(test-cluster): manage containers via testcontainers' Docker client so start/stop share one runtime#624
leekeiabstraction merged 1 commit into
apache:mainfrom
beryllw:fix-fluss-cluster-stop

Conversation

@beryllw

@beryllw beryllw commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Purpose

Linked issue: close #623

Brief change log

  • stop_cluster, FlussTestingCluster::stop, and the start pre-clean now force-remove via the shared client. A 404 (already gone) is treated as success; other errors warn instead of being silently swallowed.
  • all_containers_exist lists containers through the client with an anchored name filter (now async).
  • New run_blocking helper drives the async teardown from the synchronous stop() callers, working both inside a tokio runtime (async integration tests) and from the atexit cleanup handler.
  • Imports the Docker types via the testcontainers::bollard re-export, so the crate takes no direct bollard dependency (Cargo.toml/Cargo.lock unchanged on balance).

Tests

Verified against a live Podman cluster:

start            -> zookeeper / coordinator / tablet containers Up
start (again)    -> idempotent, no recreation, no name conflict
stop             -> all three containers removed

cargo build / cargo clippy clean for the crate (the 3 pre-existing main.rs readiness-poll lints are untouched and out of scope).

API and Format

Documentation

@leekeiabstraction leekeiabstraction left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the improvements. Left some small comments.

Comment on lines +225 to +238
for name in self.container_names() {
// Anchored exact-name match; `all(false)` = running only, so a stopped
// leftover counts as absent and gets recreated.
let mut filters = HashMap::new();
filters.insert("name".to_string(), vec![format!("^{name}$")]);
let options = ListContainersOptionsBuilder::default()
.all(false)
.filters(&filters)
.build();
match docker.list_containers(Some(options)).await {
Ok(list) if !list.is_empty() => continue,
_ => return false,
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Seems like we're making a list_containers per-container. Anyway to make this more efficient?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply, +1. I've already addressed it in a follow-up PR (#627).

Comment on lines +576 to +596
let mut filters = HashMap::new();
filters.insert(
"name".to_string(),
vec![
format!("zookeeper-{name}"),
format!("coordinator-server-{name}"),
format!("tablet-server-{name}-"),
],
);
let options = ListContainersOptionsBuilder::default()
.all(true)
.filters(&filters)
.build();

let containers = match docker.list_containers(Some(options)).await {
Ok(containers) => containers,
Err(e) => {
eprintln!("warning: failed to list cluster containers: {e}");
return;
}
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This looks similar to parts of all_containers_exist, can these be refactored?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it inline for now since the error handling differs slightly.

@charlesdong1991 charlesdong1991 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great improvement! 👍

let mut filters = HashMap::new();
filters.insert(
"name".to_string(),
vec![

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a concern, nor it will cause any issues here

but just add a note: when using prefix here, if two different clusters share the prefix of name (e.g. one cluster has name of test and there other has test2 or something), i think when stopping, it can accidentally stop both of them although purpose might be only stopping containers of test cluster

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for review, good catch — fixed in a follow-up PR (#627). The prefix is only used for narrowing, and removal is guarded by an exact local name check, so test and test2 won't interfere.

@leekeiabstraction leekeiabstraction left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to get this improvement merged. The earlier comments from me were nits.

TY for your contribution!

@leekeiabstraction leekeiabstraction merged commit b98ab9c into apache:main Jun 17, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fluss-test-cluster stop reports success but leaves containers running

3 participants