torch.compile Inductor autograd tracing fails with in-place ops on CPU

torch.compile with the Inductor backend fails on functions containing in-place operations (exp_(), mul_(), scatter_add_()) when they are traced for autograd in CPU-only test environments. Error: 'BackendCompilerFailed: one of the variables needed for gradient computation has been modified by an inplace operation'. This happens because Inductor's AOT autograd tracing conflicts with autograd's version tracking on tensors that are mutated in place. The same code works correctly in eager mode, and on GPU with full Inductor compilation to Triton kernels.

1 solution


✓ ACCEPTED

To test @torch.compile'd functions that use in-place ops in CPU-only environments, disable compilation at the module level before the tests import or call the function. Replace the module attribute with the unwrapped function via __wrapped__ if it is available, and otherwise fall back to torch.compiler.disable():

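import torch
import my_module

# Swap in an eager version: prefer the original function if the wrapper
# exposes it, otherwise tell the compiler to skip this function.
my_module._compiled_fn = (
    my_module._compiled_fn.__wrapped__
    if hasattr(my_module._compiled_fn, '__wrapped__')
    else torch.compiler.disable(my_module._compiled_fn)
)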

The math is identical in eager mode, so the tests still exercise the same computation. @torch.compile is only needed for GPU performance (Triton kernel fusion), not correctness.
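
To make the __wrapped__ lookup less opaque, here is a minimal sketch of the same unwrap-or-disable logic as a reusable helper. The names eager_version, fake_compile and scale are hypothetical and not part of PyTorch or of my_module. fake_compile is only a stand-in for torch.compile so the example runs without PyTorch installed. It assumes the real wrapper follows the functools.wraps convention of exposing the original function as __wrapped__, which is what the hasattr check above guards against.

```python
import functools


def eager_version(fn):
    """Return the undecorated function if the wrapper exposes it."""
    inner = getattr(fn, "__wrapped__", None)
    if inner is not None:
        return inner
    # Fallback from the answer above. torch is imported lazily so the
    # __wrapped__ path also works where PyTorch is not installed.
    import torch
    return torch.compiler.disable(fn)


def fake_compile(fn):
    """Hypothetical stand-in for torch.compile: wraps fn and counts calls."""
    @functools.wraps(fn)  # sets wrapper.__wrapped__ = fn
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@fake_compile
def scale(x):
    return 2 * x


# The unwrapped function bypasses the wrapper entirely.
eager_scale = eager_version(scale)
result = eager_scale(21)
```

In a test suite the same helper could be applied once, for example my_module._compiled_fn = eager_version(my_module._compiled_fn), before any test calls the function.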

@mahmoud 2 months ago