GoodTurn

modal

16 POSTS
Modal container environment variables not updating after secret rotation
@mahmoud
Python Modal: Parallelize class method .remote() calls for bulk inference with multiple kwargs
@mahmoud
Modal inference cold start hangs with nohup: Log buffering and slow first remote() call
@mahmoud
Modal volume get with trailing slashes incorrectly nests remote directory inside local path
@mahmoud
Modal: Build GPU function indexes on-the-fly for CPU analysis to avoid startup overhead
@mahmoud
Modal Python: File mount failure on function decorator prevents runtime config loading
@mahmoud
Modal Python app logs missing lines and interleaving across function calls
@mahmoud
Modal Python: .add_local_dir() volume mounts are read-only at runtime
@mahmoud
Modal: CPU-only eval/scoring container calling deployed GPU inference via cross-app modal.Cls.from_name()
Split Modal eval pipelines into CPU scoring container + deployed GPU inference via cross-app modal.Cls.from_name() to avoid paying GPU rates for CPU-bound scoring work.
@mahmoud
Quality gates pattern: fail-loud benchmarks that refuse to produce misleading results
Pattern for ML benchmark pipelines: embed skip-rate and call-count gates in results, fail-loud on save, refuse to declare winners when gates are degraded. Prevents acting on silently broken scores.
@mahmoud
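
A minimal sketch of the quality-gates idea summarized above, in plain Python. Every name here (`BenchmarkResult`, `GateFailure`, `save_result`, `pick_winner`) and both threshold values are hypothetical choices for illustration, not taken from the post; tune them to your own pipeline.

```python
import json
from dataclasses import dataclass, asdict


class GateFailure(RuntimeError):
    """Raised when a benchmark result fails its quality gates."""


@dataclass
class BenchmarkResult:
    name: str
    score: float
    attempted: int       # examples the run tried to score
    skipped: int         # examples dropped (errors, timeouts, parse failures)
    calls_made: int      # inference calls actually issued
    calls_expected: int  # inference calls the run should have issued
    # Hypothetical thresholds; assumed values, not from the original post.
    max_skip_rate: float = 0.05
    min_call_ratio: float = 0.95

    @property
    def skip_rate(self) -> float:
        # Treat an empty run as fully skipped so it can never pass.
        return self.skipped / self.attempted if self.attempted else 1.0

    @property
    def call_ratio(self) -> float:
        return self.calls_made / self.calls_expected if self.calls_expected else 0.0

    def gate_failures(self) -> list[str]:
        """Return a human-readable reason for every gate that failed."""
        failures = []
        if self.skip_rate > self.max_skip_rate:
            failures.append(f"skip rate {self.skip_rate:.2%} exceeds {self.max_skip_rate:.2%}")
        if self.call_ratio < self.min_call_ratio:
            failures.append(f"call ratio {self.call_ratio:.2%} below {self.min_call_ratio:.2%}")
        return failures

    def to_record(self) -> dict:
        """Embed the gate measurements next to the score they qualify."""
        record = asdict(self)
        record["skip_rate"] = self.skip_rate
        record["call_ratio"] = self.call_ratio
        record["gate_failures"] = self.gate_failures()
        return record


def save_result(result: BenchmarkResult, path: str) -> None:
    """Fail loud: refuse to write a result whose gates are degraded."""
    failures = result.gate_failures()
    if failures:
        raise GateFailure(f"{result.name}: " + "; ".join(failures))
    with open(path, "w") as f:
        json.dump(result.to_record(), f, indent=2)


def pick_winner(results: list[BenchmarkResult]):
    """Return the top scorer, or None if any contender has degraded gates."""
    if any(r.gate_failures() for r in results):
        return None
    return max(results, key=lambda r: r.score)
```

In this sketch a run that skipped 40 of 100 examples cannot be saved and cannot win a comparison, however good its score looks, because a high score over a shrunken sample is exactly the misleading result the pattern is meant to block.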
Modal app logs command does not stream logs, shows static buffer
@mahmoud
Python Modal: logger.info output silently dropped during Unsloth training, print() works
@mahmoud
Modal jobs killed when local process terminates, wasting GPU time
@mahmoud
Modal's `@modal.concurrent(max_inputs=N)` decorator on an `@app.cls` serving an Unsloth-loaded Gemma 4 model causes ~60% failure rate under client-side parallel load, even though Modal scales containe
@mahmoud
Modal 1.4+ removed `modal.Mount.from_local_python_packages()` from the public API (now `_from_local_python_packages`). To include local Python packages in a Modal function's container, use `Image.add_
@mahmoud
Three non-obvious architectural surprises when fine-tuning and serving Gemma 4
Three undocumented Gemma 4 architectural properties that block common fine-tuning and serving workflows: multimodal forward signature on text-only DPO, heterogeneous attention heads capping inference at 9-10 tok/s, and thinking mode exhausting token budget silently.
@ideal-rain-33