Skip to content

fix(manifest): stream Coana output and surface the real failure reason#1353

Merged
Martin Torp (mtorp) merged 3 commits into
v1.xfrom
martin/socket-cli-strange-bug
Jun 5, 2026
Merged

fix(manifest): stream Coana output and surface the real failure reason#1353
Martin Torp (mtorp) merged 3 commits into
v1.xfrom
martin/socket-cli-strange-bug

Conversation

@mtorp
Copy link
Copy Markdown
Contributor

@mtorp Martin Torp (mtorp) commented Jun 4, 2026

Problem

socket manifest gradle . (and kotlin/scala) can fail in CI with:

✖ Coana command failed (exit code 1): command failed

…with no Coana/gradle output and no indication of why it failed — and, as it turns out, no guarantee that "Coana" is even what failed.

Since #1352 (CLI 1.1.114), socket manifest {gradle,kotlin,scala} (the latter including sbt-based projects) delegate Socket facts generation to the Coana CLI via spawnCoanaDlx, passing { stdio: 'inherit' } so the build-tool and Coana output streams to the terminal.

Bug 1 — the requested stdio was dropped

On the dlx path (the default in CI) that stdio was silently dropped:

  • runCoanaManifestFacts passes { stdio: 'inherit' } as the 4th arg (spawnExtra).
  • shadowNpmBase configures the child's stdio from its options arg, not the registry-spawn extra arg (which only attaches metadata to the result — confirmed in @socketsecurity/registry).
  • So Coana ran with the default piped stdio (['pipe','pipe','pipe','ipc']), its output never reached the terminal, and a real failure collapsed to Coana command failed (exit code 1): command failed.

It's a "works on my machine" bug: the SOCKET_CLI_COANA_LOCAL_PATH and npm-install fallback branches go through spawnCoanaScriptViaNode, which does read spawnExtra.stdio, so local testing showed full output. Only the real dlx path hid it.

Bug 2 — the error blamed Coana, and the launcher was muzzled

On the dlx path the spawned process is the package-manager launcher (npx / pnpm dlx / yarn dlx), which downloads @coana-tech/cli and only then runs it. So a failure can be the launcher dying before Coana ever started (registry/network error, 404/auth, ENOENT). But buildDlxErrorResult unconditionally said Coana command failed, and spawnCoanaDlx forced --silent (npm loglevel silent), which hid npm's own download/launch errors. The result: a bare exit code, no cause, and a message that wrongly blamed Coana for something that may have failed before Coana ran.

The underlying exit-1 trigger is project-specific — most commonly an unresolved Gradle dependency, which facts generation treats as fatal (the legacy --pom default did not). Reproduced locally: coana manifest gradle on a project with an unresolvable dep exits 1, writes no facts file, and prints a --ignore-unresolved / --exclude-configs hint — all of which were being swallowed.

Fix

  • stdio is honored on the dlx path. spawnCoanaDlx now promotes the requested stdio (from spawnExtra, falling back to options) into the dlx launcher options, aligning the dlx path with the local-path and npm-install branches. No behavior change for callers that pass stdio: 'pipe' via spawnExtra (e.g. socket fix) — the default was already pipe.
  • Launcher is no longer muzzled. Dropped the forced --silent on the Coana launcher; shadowNpmBase still defaults to --loglevel error, so real download/launch errors surface while success stays quiet.
  • Honest failure messages. buildDlxErrorResult only claims a launch failure when it is certain — a spawn-level error (a string code like ENOENT, where the launcher binary could not start, so Coana provably never ran) → Failed to launch Coana via the package manager … the launcher could not start. A non-zero exit or signal is genuinely ambiguous (Coana may have started and died, e.g. OOM, or the launcher may have failed to download), so it gets neutral wording → Coana failed to run via the package manager (exit code N). Only a captured Coana banner yields Coana command failed. It also falls back to captured stdout when stderr is empty, since Coana logs some failures to stdout.

Related: the npx-launcher fallback (#1327)

#1327 added an npm install + node fallback for when the dlx launcher fails before Coana starts. That fallback is shared by this path; shouldFallbackOnDlxError fires it for spawn errors (ENOENT), signals, or exit codes ≥ 128not for a small-integer exit. A broken npx that exits 1 (common for npm download failures) is therefore not auto-recovered today.

So this could be the same broken-npx problem as #1327, just a manifestation the gate misses. This PR makes that case diagnosable (the npm error is now visible and the message no longer blames Coana), but does not change the fallback gating.

Review note (Cursor Bugbot)

Bugbot flagged that with stdio: 'inherit' the spawn rejection carries no captured output, so the Coana CLI version banner check is always false and a Coana process that started then died by signal/exit ≥ 128 (e.g. OOM, exit 137) could be reported as a launcher failure. Confirmed empirically that coana manifest gradle writes everything to stdout and never prints that banner, so banner-based detection never worked for the inherit paths regardless. The fix above addresses the reported harm by only asserting a launch failure for a definitive spawn-level error and staying neutral for ambiguous signal/exit codes (and dropping the matching "before Coana started" wording from the fallback warning).

The fallback retry on signal/exit ≥ 128 is unchanged, pre-existing #1327 behavior. Reliably suppressing a retry when Coana actually ran would require capturing the launcher output to disambiguate, which conflicts with live streaming (and the reachability spinner) — a possible follow-up, not done here.

Tests

  • dlx.test.mts: spawnExtra.stdio / options.stdio are forwarded into the launcher options; --silent is no longer passed; definitive (ENOENT) vs ambiguous (signal / ≥ 128) messages are asserted; buildDlxErrorResult surfaces stdout when stderr is empty.
  • pnpm check:tsc, eslint, and the manifest + reachability + fix suites pass.

Workaround (pre-fix)

Re-run with --ignore-unresolved / --exclude-configs=…, or --pom to restore the legacy pom.xml output.

@mtorp Martin Torp (mtorp) changed the base branch from main to v1.x June 5, 2026 05:36
`socket manifest {gradle,kotlin,scala}` delegate Socket facts generation to
the Coana CLI via spawnCoanaDlx, passing `{ stdio: 'inherit' }` so the
build-tool and Coana output streams to the user. On the dlx path that stdio
was silently dropped: shadowNpmBase configures the child's stdio from its
`options` arg, not the registry-spawn `extra` arg, so Coana ran piped and its
output — including the actual failure reason — never reached the user. A
generation failure then collapsed to an unhelpful
"Coana command failed (exit code 1): command failed" with no detail.

- spawnCoanaDlx now promotes the requested stdio (from spawnExtra, falling
  back to options) into the dlx launcher options, aligning the dlx path with
  the local-path and npm-install branches that already honor spawnExtra.stdio.
- buildDlxErrorResult falls back to captured stdout when stderr is empty,
  since Coana logs some failures (e.g. unresolved dependencies) to stdout.

Add regression tests and bump the CLI to 1.1.115.
@mtorp Martin Torp (mtorp) force-pushed the martin/socket-cli-strange-bug branch from 57f06d6 to 0a6162e Compare June 5, 2026 05:39
…he launcher

On the dlx path the spawned process is the package-manager launcher
(npx / pnpm dlx / yarn dlx), which downloads @coana-tech/cli and only then runs
it. A failure there may be the launcher dying before Coana ever started (e.g.
the package failed to download), yet buildDlxErrorResult always claimed "Coana
command failed", and `silent: true` (npm loglevel silent) hid npm's own
download/registry errors — so the user got a bare exit code with no cause and no
hint that Coana itself may never have run.

- buildDlxErrorResult now distinguishes three cases: Coana booted (banner seen)
  -> "Coana command failed"; launcher died before Coana started (spawn error /
  signal / exit >= 128) -> "Failed to launch Coana via the package manager ...";
  ambiguous small-int exit -> neutral "Coana failed to run via the package
  manager" (no false blame on Coana).
- Drop the forced `--silent` on the Coana launcher; shadowNpmBase still defaults
  to `--loglevel error`, so real launcher/download errors surface while success
  stays quiet.
- Factor the launcher-vs-Coana heuristic into dlxLauncherFailedBeforeCoana and
  coanaBannerSeen, shared by shouldFallbackOnDlxError and the message builder.

Known gap: the npm-install + node fallback (#1327) still fires only for spawn
errors / signals / exit >= 128, not for a small-int exit. A broken npx that
exits 1 (common for download failures) is therefore not yet auto-recovered.

Add regression tests for the new wording and the un-muzzled launcher.
@mtorp Martin Torp (mtorp) marked this pull request as ready for review June 5, 2026 06:19
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit d3d3b8e. Configure here.

Comment thread src/utils/dlx.mts
Bugbot flagged that with stdio:'inherit' the spawn rejection carries no
captured output, so coanaBannerSeen is always false and a Coana process that
started then died by signal / exit >= 128 (e.g. OOM, exit 137) was wrongly
reported as "the launcher exited before Coana started".

Empirically, `coana manifest gradle` writes all output to stdout and never
prints the "Coana CLI version" banner, so banner-based detection never worked
for the manifest / reachability (inherit) paths anyway. Only a spawn-level
error (a string `code` like ENOENT) definitively proves the launcher never
started.

- buildDlxErrorResult now claims a launch failure ONLY for a spawn-level error;
  signals and non-zero exits get neutral wording ("Coana failed to run via the
  package manager (exit code N)") since we cannot tell launcher-vs-Coana apart
  without captured output.
- Soften the fallback warning to drop the "before Coana started" claim (it
  fires on the ambiguous >= 128 / signal cases too).

The npm-install + node fallback gating is unchanged (pre-existing #1327
behavior): it still retries on spawn errors / signals / exit >= 128. Fully
suppressing a retry when Coana actually ran would require capturing the
launcher output, which conflicts with live streaming (and the reachability
spinner); left as a possible follow-up.

Update tests for the definitive (ENOENT) vs ambiguous (signal / >= 128) split.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independently verified the diagnosis against the code — it's correct, and the fix targets the real root cause. Notes for the record:

Bug 1 confirmed at the exact mechanism. shadowNpmBase derives the child stdio from its options arg, not the registry-spawn extra arg — src/shadow/npm-base.mts:94: const stdio = ensureIpcInStdio(getOwn(spawnOpts, 'stdio')), while extra (line 137) only rides along on the result. spawnCoanaDlx passed { stdio: 'inherit' } as the 4th (spawnExtra) arg, so on the dlx path it was dropped and Coana ran piped. The local-path/npm-install branches go through spawnCoanaScriptViaNode (dlx.mts:240, spawnExtra?.['stdio'] || 'inherit'), which is why it only reproduced in CI and not in local SOCKET_CLI_COANA_LOCAL_PATH testing. Solid "works on my machine" call.

It's a latent bug exposed by #1352, not introduced by it. Confirmed the failing path is live on origin/v1.x at 1.1.114: coana-manifest-facts.mts passes stdio:'inherit', and dlx.mts still forced silent:true + trailing spawnExtra. Making facts-via-Coana the default (and unresolved deps fatal) is what newly drove CI through this path; the legacy --pom default didn't.

Banner nuance (non-blocking follow-up). The coanaBannerSeen docstring already notes it can't fire under inherited stdio, and the message builder correctly degrades to the neutral "failed to run via the package manager" instead of false-blaming Coana — good. The one residual: in shouldFallbackOnDlxError, an inherited-stdio failure with a signal / exit ≥ 128 from a Coana that did boot (e.g. OOM 137) now falls into dlxLauncherFailedBeforeCoana and triggers a wasted (though harmless, output-visible) npm-install retry, because the banner short-circuit is blind under inherit. Gating that branch to ENOENT-only under inherit would fix it but regress the #1327 launcher recovery for the manifest CI path, so that's not the move.

Suggested fix (future PR): make boot-detection independent of captured stdio via a tiny boot marker. socket-cli passes SOCKET_CLI_COANA_BOOT_MARKER=<tmpfile> in the Coana env; Coana writeFileSyncs it as the first thing in startup; then coanaBooted = existsSync(marker) replaces/augments coanaBannerSeen and works under inherit and pipe. That upgrades both the message ("Coana command failed" vs launcher) and the fallback decision (don't re-run a booted Coana) to be certain regardless of stdio mode. A real stdout/stderr tee would also work but needs surgery in the shadow-spawn layer; the marker is cheaper and we own both sides.

None of that blocks this PR — it's a clean, correct, well-tested fix for the reported issue.

@mtorp Martin Torp (mtorp) merged commit 7d481f7 into v1.x Jun 5, 2026
12 checks passed
@mtorp Martin Torp (mtorp) deleted the martin/socket-cli-strange-bug branch June 5, 2026 06:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants