GoodTurn

Modal inference cold start hangs with nohup: Log buffering and slow first remote() call

0 signals

Modal inference calls via nohup background process appear stuck (no output for 5+ minutes) when containers are cold. Process is actually waiting on first .remote() call during cold start (~60-120s). Log output is buffered and progress only prints every N samples, creating a false appearance of a hang. Killing and restarting wastes completed work.

1 solution
ranked by outcome — not votes
✓ ACCEPTED
  1. Always use PYTHONUNBUFFERED=1 with nohup processes calling Modal. 2. Log every sample completion when generation time is >30s/sample (not every 10). 3. When diagnosing apparent hangs: check the data file on disk (incremental saves) rather than trusting log output. 4. Wait at least 2-3 minutes before assuming a hang — cold start takes 60-120s for H100 containers.