modal app logs <app-name> defaults to fetching the last 100 log entries and exiting; it is NOT a live stream. Successive calls return the same lines whenever the app has emitted nothing new in between, so a running training job that has been quiet for a few minutes looks indistinguishable from a stuck or crashed task. This is easy to misdiagnose: I killed a healthy job because three sequential modal app logs snapshots showed the same final line ('Unsloth: Will smartly offload gradients to save VRAM!') for 20 minutes. The trainer was actually in a long synchronous setup phase that emitted no new logs.
Always use modal app logs -f <app-name> (or --follow) for live streaming. From the Modal CLI help: 'By default, this command fetches the last 100 log entries and exits. Use -f to live-stream logs from a running App instead. Fetch and follow are mutually exclusive.'
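When the live stream needs to be consumed from a script rather than watched in a terminal, the follow mode can be wrapped in a subprocess. The sketch below is only an illustration and not part of the Modal SDK: follow_cmd and stream_lines are hypothetical helper names, and it assumes the modal CLI is on PATH and already authenticated. It uses only the -f flag quoted from the CLI help above.

```python
import subprocess
import time
from typing import Iterator, List, Tuple


def follow_cmd(app_name: str) -> List[str]:
    """Build the live-streaming command; -f is the flag quoted from the CLI help."""
    return ["modal", "app", "logs", "-f", app_name]


def stream_lines(cmd: List[str]) -> Iterator[Tuple[float, str]]:
    """Yield (seconds since the previous line, line) for each line the command prints."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    last = time.monotonic()
    try:
        for line in proc.stdout:
            now = time.monotonic()
            yield now - last, line.rstrip("\n")
            last = now
    finally:
        # Stop the child if the caller stops iterating early, then reap it.
        proc.stdout.close()
        proc.terminate()
        proc.wait()
```

Iterating over stream_lines(follow_cmd("my-training-app")) (the app name is a placeholder) yields each new log line together with the silence that preceded it. A long gap while the stream is still open only shows that the app is quiet, which is the situation described above, so it should be paired with a status check before killing anything.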
To verify that a job is still running rather than crashed, cross-check via the Python SDK using the call_id: modal.FunctionCall.from_id(call_id).get(timeout=1) raises TimeoutError while the call is still running, and returns a result or raises the function's exception once the call has reached a terminal state. The Tasks column in modal app list also shows the live container count for that app. Never conclude a job is stuck purely from modal app logs output without -f or an SDK status check.
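The SDK status check described above can be sketched as a small helper. This is a minimal illustration under stated assumptions: call_status and status_from_id are hypothetical names, and the code assumes get(timeout=...) raises the built-in TimeoutError while the call is unfinished, as this note describes. One limitation: a remote function that itself raises TimeoutError would be misreported as still running.

```python
from typing import Any, Tuple


def call_status(call: Any, timeout: float = 1.0) -> Tuple[str, Any]:
    """Classify a function call as running, done, or failed.

    `call` is anything with a .get(timeout=...) method, for example the
    object returned by modal.FunctionCall.from_id(call_id).
    """
    try:
        result = call.get(timeout=timeout)
    except TimeoutError:
        # No result within the timeout: the call has not finished yet.
        return "running", None
    except Exception as exc:
        # The call ended with an error (or the lookup itself failed).
        return "failed", exc
    return "done", result


def status_from_id(call_id: str) -> Tuple[str, Any]:
    """Look up a call by ID; requires the modal package and valid credentials."""
    import modal  # imported lazily so call_status is usable without modal installed

    return call_status(modal.FunctionCall.from_id(call_id))
```

A result of "running" only shows that the call has not terminated; it does not prove the job is making progress, so it complements the live log stream rather than replacing it.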