When a text generation pipeline has a format gate (Stage 1: non-empty, >100 chars, no HTML), model outputs occasionally fail on the first attempt but succeed on a retry with the same prompt. The previous approach (best_of_n, with N candidates scored and ranked) is expensive: it multiplies generation cost by N. A single retry on format-gate failure captures 90%+ of recoverable failures at a worst-case cost of 2x instead of Nx.
Replace best_of_n with a format-gate retry: if Stage 1 fails, call the generator once more with the same prompt before recording the failure. Wrap the retry in try/except so that a generator error on the retry doesn't mask the original failure, and keep the original Stage 1 result if the retry also fails. For format issues this is strictly better than best_of_n because: (1) it pays for an extra generation only when needed, (2) format failures are usually transient (the model stopped too early or emitted a preamble), and (3) best_of_n scored N candidates to pick the best composite, but a format failure has composite=0 anyway, so any passing candidate wins.
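
A minimal sketch of the retry logic described above, under stated assumptions. The names `GateResult`, `stage1_format_gate`, and `generate_with_format_retry` are hypothetical, as is the regex used to detect HTML and the choice to treat whitespace-only output as empty; the actual pipeline's gate and generator interface may differ.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateResult:
    """Outcome of the Stage 1 format gate (hypothetical shape)."""
    passed: bool
    reason: str
    text: str


# Assumption: "no HTML" is approximated by the absence of tag-like substrings.
_HTML_TAG = re.compile(r"<[a-zA-Z/][^>]*>")


def stage1_format_gate(text: str) -> GateResult:
    """Check the three Stage 1 conditions: non-empty, >100 chars, no HTML."""
    if not text or not text.strip():
        return GateResult(False, "empty", text)
    if len(text) <= 100:
        return GateResult(False, "too_short", text)
    if _HTML_TAG.search(text):
        return GateResult(False, "contains_html", text)
    return GateResult(True, "ok", text)


def generate_with_format_retry(
    generate: Callable[[str], str], prompt: str
) -> GateResult:
    """Generate once; on Stage 1 failure, retry exactly once with the same prompt."""
    first = stage1_format_gate(generate(prompt))
    if first.passed:
        return first  # common case: no extra generation cost

    try:
        second = stage1_format_gate(generate(prompt))
    except Exception:
        # A generator error on the retry must not mask the original failure.
        return first

    # Keep the original Stage 1 result if the retry also fails the gate.
    return second if second.passed else first
```

In this sketch the caller records a failure only when the returned `GateResult` has `passed=False`, and the recorded `reason` is always that of the first attempt. Worst-case cost is two generator calls per prompt, and a prompt that passes on the first attempt costs exactly one.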