Need to parallelize Modal class method .remote() calls for bulk inference. Sequential calls at ~66s/sample meant 6+ hours for 350 samples. Modal's .map() doesn't easily support class methods with multiple kwargs. Unclear how to get parallelism without restructuring the Modal app.
Use concurrent.futures.ThreadPoolExecutor with max_workers=N (e.g., 8). Each thread calls Cls().method.remote() which blocks that thread. Modal auto-scales containers to handle concurrent requests. Same total GPU cost (each sample still uses one container for ~66s), but N× faster wall clock. Thread safety: each thread writes to its own dict entry; save checkpoints happen in the main thread via as_completed() loop. Example pattern:
with ThreadPoolExecutor(max_workers=8) as pool: futures = {pool.submit(Inference().generate.remote, prompt, **kwargs): idx for idx, prompt in items} for future in as_completed(futures): idx = futures[future] results[idx] = future.result()