feat(dynamo): add target_executorch setting to keep output-allocator ops in PyTorch #4355
shoumikhin wants to merge 1 commit
@shoumikhin why can't downstream operators consume output-allocated outputs? Also, name-wise, I'd rather have a specific name like
…pendent ops in PyTorch

Some converters require a TensorRT output allocator because their output shape is data-dependent (for example aten.nonzero and boolean aten.index.Tensor). A TensorRT engine that needs an output allocator cannot be consumed by every downstream runtime that executes the compiled program.

This adds a fallback_data_dependent_ops compile setting (default False). When enabled, an operator runs in PyTorch instead of TensorRT iff the converter selected for that specific node requires an output allocator. When disabled (the default), behavior is unchanged.

The decision is made per node in the partitioner operator-support check (TorchTensorRTOperatorSupport and OpSupportTester), which asks the converter registry which converter it would select for the node (honoring capability_validator). This matters for targets like aten.index.Tensor that register two converters: boolean indexing requires an output allocator and falls back, while ordinary integer gather indexing stays on TensorRT.

Details:
- Wired through compile() and cross_compile_for_windows(); the check runs during partitioning, which both entry points reach through compile_module(). It is intentionally not exposed on convert_exported_program_to_serialized_trt_engine(), where a single serialized engine cannot contain PyTorch fallbacks.
- Combining fallback_data_dependent_ops with require_full_compilation raises a clear error, since routing ops to PyTorch contradicts full compilation.
- CompilationSettings.__setstate__ defaults the new field so older pickles load.

Tests (tests/py/dynamo/models/test_fallback_data_dependent_ops.py): default value; old-pickle compatibility; the per-node support decision for nonzero (CPU); the require_full_compilation conflict; and an end-to-end GPU test that a data-dependent op falls back to PyTorch.

Signed-off-by: shoumikhin <shoumikhin@meta.com>
Thanks, good questions.
It's the production side, not consumption. For data-dependent ops like
Done, renamed to
Description
Some converters require a TensorRT output allocator because their output shape is
data-dependent (for example aten.nonzero and boolean aten.index.Tensor). A
TensorRT engine that needs an output allocator cannot be consumed by every downstream
runtime that executes the compiled program (for instance, runtimes that rely on
ahead-of-time static memory planning and cannot size an output whose shape is only
known after the engine runs).
This adds a fallback_data_dependent_ops compile setting (default False). When
enabled, an operator runs in PyTorch instead of TensorRT iff the converter selected
for that specific node requires an output allocator. When disabled (the default),
behavior is unchanged.
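The per-node rule described above can be illustrated with a small, self-contained sketch. It does not use the Torch-TensorRT code base: Node, Converter, REGISTRY, select_converter and runs_in_tensorrt are hypothetical stand-ins, and only the names fallback_data_dependent_ops, capability_validator and the aten targets come from this PR. The assumption is that the registry returns the first converter whose validator accepts the node.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a graph node: a target plus an index dtype."""
    target: str
    index_dtype: Optional[str] = None


@dataclass(frozen=True)
class Converter:
    """Hypothetical converter entry; field names are illustrative only."""
    name: str
    capability_validator: Callable[[Node], bool]
    requires_output_allocator: bool = False


# Toy registry: aten.index.Tensor registers two converters, as in the PR text.
REGISTRY: Dict[str, List[Converter]] = {
    "aten.nonzero": [
        Converter("nonzero", lambda n: True, requires_output_allocator=True),
    ],
    "aten.index.Tensor": [
        Converter("index_bool", lambda n: n.index_dtype == "bool",
                  requires_output_allocator=True),
        Converter("index_int", lambda n: n.index_dtype != "bool"),
    ],
    "aten.add.Tensor": [Converter("add", lambda n: True)],
}


def select_converter(node: Node) -> Optional[Converter]:
    """Return the first registered converter whose validator accepts the node."""
    for converter in REGISTRY.get(node.target, []):
        if converter.capability_validator(node):
            return converter
    return None


def runs_in_tensorrt(node: Node, fallback_data_dependent_ops: bool = False) -> bool:
    """Decide per node whether it stays on TensorRT or falls back to PyTorch."""
    converter = select_converter(node)
    if converter is None:
        return False  # no converter at all: the op runs in PyTorch regardless
    if fallback_data_dependent_ops and converter.requires_output_allocator:
        return False  # data-dependent output shape: keep the op in PyTorch
    return True
```

With the flag off, every node that has a converter stays on TensorRT. With the flag on, only the nodes whose selected converter needs an output allocator move to PyTorch, so integer indexing is unaffected while boolean indexing falls back.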
Details
- The decision is made per node in the partitioner operator-support check (TorchTensorRTOperatorSupport and OpSupportTester), which asks the converter registry which converter it would select for the node (honoring capability_validator). This matters for targets like aten.index.Tensor that register two converters: boolean indexing requires an output allocator and falls back, while ordinary integer gather indexing stays on TensorRT.
- Wired through compile() and cross_compile_for_windows(); the check runs during partitioning, which both entry points reach through compile_module(). It is intentionally not exposed on convert_exported_program_to_serialized_trt_engine(), where a single serialized engine cannot contain PyTorch fallbacks.
- Combining fallback_data_dependent_ops with require_full_compilation raises a clear error, since routing ops to PyTorch contradicts full compilation.
- CompilationSettings.__setstate__ defaults the new field so older pickles load.
Tests
tests/py/dynamo/models/test_fallback_data_dependent_ops.py:
- the setting defaults to False and is settable;
- an old pickle without the field loads with it set to False;
- the per-node support decision for nonzero (CPU, no GPU needed);
- combining the setting with require_full_compilation raises;
- end to end on GPU, a data-dependent op (nonzero) falls back to PyTorch.
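The old-pickle compatibility and the require_full_compilation conflict can be sketched in a few lines. This is not the actual CompilationSettings class: SettingsSketch, check_settings and restore_from_state are hypothetical names, and ValueError is an assumed exception type, since the PR only says that a clear error is raised.

```python
class SettingsSketch:
    """Hypothetical two-field stand-in for the real settings object."""

    def __init__(self, require_full_compilation=False,
                 fallback_data_dependent_ops=False):
        self.require_full_compilation = require_full_compilation
        self.fallback_data_dependent_ops = fallback_data_dependent_ops

    def __setstate__(self, state):
        # pickle calls this on load with the saved attribute dict. State saved
        # before the field existed lacks the key, so default it to False.
        self.__dict__.update(state)
        self.__dict__.setdefault("fallback_data_dependent_ops", False)


def check_settings(settings):
    """Reject the contradictory combination (exception type is an assumption)."""
    if settings.fallback_data_dependent_ops and settings.require_full_compilation:
        raise ValueError(
            "fallback_data_dependent_ops routes ops to PyTorch, which "
            "contradicts require_full_compilation"
        )


def restore_from_state(state):
    """Mimic what unpickling does: allocate without __init__, then set state."""
    obj = SettingsSketch.__new__(SettingsSketch)
    obj.__setstate__(state)
    return obj
```

For example, restore_from_state({"require_full_compilation": False}) yields an object whose fallback_data_dependent_ops is False, while check_settings(SettingsSketch(True, True)) raises.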
Checklist