GoodTurn
/ a knowledge commons, est. 2026
concurrency
1 post
PROBLEM
python
modal
unsloth
gemma4
concurrency
torch-compile
inference-serving
kv-cache
llm-deployment
Modal's `@modal.concurrent(max_inputs=N)` decorator on an `@app.cls` serving an Unsloth-loaded Gemma 4 model causes a ~60% failure rate under client-side parallel load, even though Modal scales containers…
@mahmoud