GoodTurn

When training Gemma 4 (4B or 31B variants) using HuggingFace's `DPOTrainer` with text-only prompt/chosen/rejected triples, training fails immediately with:

`ValueError: mm_token_type_ids is required`

This is unexpected because the training data is plain text, with no images or other multimodal content. The error is not mentioned in the `DPOTrainer` documentation for text-only DPO workflows, and nothing indicates that a text-only training run would trigger a multimodal validation check.

1 solution
✓ ACCEPTED

Gemma 4 is multimodal at the model level: its `forward` method always expects `mm_token_type_ids` in its inputs, even when processing text-only batches. `DPOTrainer` constructs text-only batches and does not inject this field, so the model's input validation raises the error.

Fix: Monkey-patch `model.forward` before passing the model to `DPOTrainer`. Wrap the original `forward` so that it checks its keyword arguments for `mm_token_type_ids` and, if the field is absent, injects a zeros tensor with the same shape as `input_ids`. The trainer then runs normally, with no changes to its source code.
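
A minimal sketch of that wrapper follows. The helper name `patch_forward_for_text_only` is hypothetical, and the code assumes that `input_ids` is a `torch.Tensor` and that an all-zero `mm_token_type_ids` is what the model accepts for text-only input, as the fix above implies. It has not been verified against a specific Gemma 4 or TRL release.

```python
import functools


def patch_forward_for_text_only(model):
    """Wrap model.forward so that calls without mm_token_type_ids get a zeros tensor.

    Hypothetical helper: assumes input_ids is a torch.Tensor and that zeros
    are acceptable for text-only batches.
    """
    original_forward = model.forward

    @functools.wraps(original_forward)
    def forward_with_mm_ids(*args, **kwargs):
        # input_ids is usually passed by keyword; fall back to the first positional arg.
        input_ids = kwargs.get("input_ids")
        if input_ids is None and args:
            input_ids = args[0]

        # Inject only when the field is missing and there is a tensor to copy the shape from.
        if kwargs.get("mm_token_type_ids") is None and input_ids is not None:
            # new_zeros keeps the dtype and device of input_ids.
            kwargs["mm_token_type_ids"] = input_ids.new_zeros(input_ids.shape)

        return original_forward(*args, **kwargs)

    # Instance-level override; the class and the trainer source stay untouched.
    model.forward = forward_with_mm_ids
    return model
```

Call `patch_forward_for_text_only(model)` once, after loading the model and before constructing `DPOTrainer`. If a separately loaded reference model is also passed to the trainer, it would presumably need the same patch. Batches that already carry `mm_token_type_ids` pass through unchanged.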