Support Sync and Async on non-blocking fibers via Fiber.blocking.#460
Open
samuel-williams-shopify wants to merge 4 commits into
Open
Support Sync and Async on non-blocking fibers via Fiber.blocking.#460samuel-williams-shopify wants to merge 4 commits into
samuel-williams-shopify wants to merge 4 commits into
Conversation
Previously, calling Sync or Async from a non-blocking fiber with no scheduler (e.g. inside an Enumerator or a bare Fiber.new) raised 'RuntimeError: Running scheduler on non-blocking fiber!', because the reactor's event loop must run on a blocking fiber. The reactor is now run within Fiber.blocking, so the scheduler always runs on a blocking fiber regardless of the calling fiber. In addition, when a task finishes it now explicitly transfers control back to the scheduler via Fiber.scheduler&.transfer. This ensures the event loop regains control even when it is running on a fiber that is not the thread's root fiber (e.g. a fiber owned by another transfer-based scheduler), instead of relying on fiber termination returning control to the expected fiber. The transfer is placed after the task body so that a critical exception propagating out of the task is not swallowed. Assisted-By: devx/5f4cfab6-1d74-4100-8755-82d7c62c70ab
The unconditional Fiber.scheduler&.transfer at task completion left task fibers parked (alive but finished) instead of terminating, which breaks Fiber#alive? assumptions in io-event (timers/waiters transfer into finished fibers) and caused downstream servers to hang. The Fiber.blocking change alone fixes the reported cases (Enumerator, bare Fiber.new) since those enter the reactor fiber via resume, so task termination returns control correctly. The transfer-based-scheduler case requires the Ruby-level fix in https://bugs.ruby-lang.org/issues/20081 and is not addressed here. Also fix RuboCop block delimiter spacing in tests. Assisted-By: devx/5f4cfab6-1d74-4100-8755-82d7c62c70ab
- Do not bump version.rb in the PR. - Move the release note under ## Unreleased. - Trim the non-blocking fiber tests to a focused set. Assisted-By: devx/5f4cfab6-1d74-4100-8755-82d7c62c70ab
It exercised a workaround the fix does not use, and was a slow (2s) deadlock-via-timeout test. Assisted-By: devx/5f4cfab6-1d74-4100-8755-82d7c62c70ab
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
SyncandAsynccould not be invoked from a non-blocking fiber that has no scheduler — e.g. inside anEnumeratorblock or a bareFiber.new { ... }.resume. Doing so raised:Minimal reproduction:
This shows up whenever
Sync/Asyncis reached beneath a bare non-blocking fiber (job runners,Enumerator, other libraries), and previously required an awkward caller-side workaround such asFiber.new(blocking: true) { Sync { ... } }.resume.Change
Run the reactor within
Fiber.blocking(kernel/sync.rb,kernel/async.rb). When there is no current task and no scheduler, the reactor is started insideFiber.blocking { ... }, so the event loop always runs on a blocking fiber — on the current fiber, preserving fiber identity and fiber-locals (no extra fiber is spawned). Task fibers still terminate normally, so control returns to the reactor loop via the resume chain as before.Tests
test/kernel/sync.rb— a newwith "a non-blocking fiber"context covering: running the scheduler on a non-blocking fiber, return values, asynchronous work with blocking operations (barrier + semaphore), exception propagation, fiber-storage preservation, nestedSync, use within anEnumerator(value yielded afterSyncreturns), and the clean-failure case when trying to yield to the enumerator from inside the block.test/kernel/async.rb— runningAsyncon a non-blocking fiber, running work to completion, and passing options through.Notes
This handles the common case, where the reactor fiber is entered via
resume(which coversEnumerator, bareFiber.new { ... }.resume, and typical job runners). The case where the reactor fiber is entered viaFiber#transferand is off the resume chain (e.g. owned by another transfer-based scheduler) is not addressed here — it requires the Ruby-level fix tracked in https://bugs.ruby-lang.org/issues/20081, since fiber termination needs to return control to the transferring fiber.