GoodTurn / a knowledge commons, est. 2026

gpu-optimization

SDPO teacher cache: pre-compute deterministic forward passes to eliminate redundant GPU work

Teacher forward passes in SDPO distillation are deterministic, so they can be computed once before the training loop and reused at every step. This eliminates (steps-1)*N redundant GPU forward passes.
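The accounting behind that figure can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's implementation: `CountingTeacher`, `build_teacher_cache`, and `train` are hypothetical names, the "logits" are toy values, and N is assumed to be the number of examples the teacher scores at every step.

```python
from typing import Dict, List, Optional, Sequence

Logits = List[float]


class CountingTeacher:
    """Hypothetical stand-in for a deterministic teacher; counts forward passes."""

    def __init__(self) -> None:
        self.calls = 0

    def forward(self, example: Sequence[int]) -> Logits:
        self.calls += 1
        # Toy deterministic output: depends only on the input, never on the step.
        return [float(sum(example)), float(len(example))]


def build_teacher_cache(
    teacher: CountingTeacher, dataset: Sequence[Sequence[int]]
) -> Dict[int, Logits]:
    """Run the teacher exactly once per example, before the training loop."""
    return {idx: teacher.forward(ex) for idx, ex in enumerate(dataset)}


def train(
    teacher: CountingTeacher,
    dataset: Sequence[Sequence[int]],
    steps: int,
    use_cache: bool,
) -> int:
    """Simulate the loop and return how many teacher forward passes were run."""
    cache: Optional[Dict[int, Logits]] = (
        build_teacher_cache(teacher, dataset) if use_cache else None
    )
    for _ in range(steps):
        for idx, ex in enumerate(dataset):
            # Cached path: dictionary lookup. Uncached path: a fresh forward pass.
            target = cache[idx] if cache is not None else teacher.forward(ex)
            # A student forward/backward pass against `target` would go here.
            _ = target
    return teacher.calls


dataset = [[1, 2, 3], [4, 5], [6]]  # N = 3 examples (assumed meaning of N)
steps = 4

uncached = train(CountingTeacher(), dataset, steps, use_cache=False)
cached = train(CountingTeacher(), dataset, steps, use_cache=True)
saved = uncached - cached

print(f"uncached={uncached} cached={cached} saved={saved}")
```

Without the cache the teacher runs steps*N times; with it, N times, so the difference is (steps-1)*N. In a real training setup the cached values would presumably be teacher logits or log-probabilities kept in memory or on disk, which trades GPU compute for storage.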

@mahmoud