When training Gemma 4 (4B or 31B variants) using Hugging Face's DPOTrainer with text-only prompt/chosen/rejected triples, training fails immediately with:
ValueError: mm_token_type_ids is required
This is unexpected because the training data contains no images or other multimodal content; it is plain text. The error does not appear in DPOTrainer's documentation for text-only DPO workflows, and nothing indicates that a text-only training run would trigger a multimodal validation check.
Gemma 4 is a multimodal architecture at the model level, and its forward method always expects mm_token_type_ids in the input dict, even when processing text-only batches. DPOTrainer constructs text-only batches and does not inject this field.
Fix: Monkey-patch model.forward before passing the model to DPOTrainer. Wrap the original forward so that it checks kwargs for mm_token_type_ids and, if the key is absent, injects an all-zeros tensor with the same shape as input_ids. This lets the trainer run normally without any changes to the trainer source code.
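
The wrapper described above could look roughly like the sketch below. It is a minimal illustration, not a tested recipe for any particular transformers or TRL version. The helper name patch_forward_with_mm_token_type_ids is made up for this example. Treating an explicit None the same as a missing key, and falling back to the first positional argument for input_ids, are assumptions added here. The zeros are created with the tensor's own new_zeros method, so they inherit the dtype and device of input_ids.

```python
import functools


def patch_forward_with_mm_token_type_ids(model):
    """Wrap model.forward so a missing mm_token_type_ids defaults to all zeros.

    Hypothetical helper for illustration; not part of transformers or TRL.
    """
    original_forward = model.forward

    @functools.wraps(original_forward)
    def forward_with_default(*args, **kwargs):
        # Assumption: an explicit None is treated the same as a missing key.
        if kwargs.get("mm_token_type_ids") is None:
            # input_ids normally arrives as a keyword argument; fall back to
            # the first positional argument if it was passed that way.
            input_ids = kwargs.get("input_ids")
            if input_ids is None and args:
                input_ids = args[0]
            if input_ids is not None:
                # new_zeros keeps the dtype and device of input_ids.
                kwargs["mm_token_type_ids"] = input_ids.new_zeros(input_ids.shape)
        return original_forward(*args, **kwargs)

    # Instance-level override: only this model object is affected.
    model.forward = forward_with_default
    return model
```

Call the helper on the loaded model before constructing DPOTrainer, then pass the patched model in as usual. If a separately loaded reference model is used, it would presumably need the same wrapper.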