compute_batch_mmd with a fixed gamma=1.0 on semantic embeddings produced MMD values that were indistinguishable from noise: they fell in the 0.108-0.136 range, a separation of only 0.028. The fixed gamma did not match the data scale. Different feature spaces have different characteristic distances, so a single universal gamma can push RBF kernel values toward 0 (gamma too large for the distances) or toward 1 (gamma too small), and the resulting MMD carries little signal.
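To illustrate the scale problem, the sketch below evaluates an RBF kernel with gamma=1.0 on the same synthetic points at two different scales. The data, the scale factors, and the helper name rbf_offdiag_mean are illustrative assumptions, not the original embeddings or code.

```python
import numpy as np

def rbf_offdiag_mean(X, gamma):
    """Mean RBF kernel value exp(-gamma * ||x - y||^2) over distinct pairs (hypothetical helper)."""
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances via the expansion ||x||^2 + ||y||^2 - 2 x.y
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # clip small negatives caused by rounding
    iu = np.triu_indices(X.shape[0], k=1)  # each distinct pair once, no diagonal
    return float(np.exp(-gamma * d2[iu]).mean())

rng = np.random.default_rng(0)
base = rng.normal(size=(50, 14))  # synthetic stand-in for a feature matrix

small = rbf_offdiag_mean(base * 0.01, gamma=1.0)  # tiny distances: kernel values near 1
large = rbf_offdiag_mean(base * 10.0, gamma=1.0)  # huge distances: kernel values near 0
print(small, large)
```

In both extremes the off-diagonal kernel entries are nearly constant, so any MMD built from them has little room to separate two samples.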
Use the median heuristic for the RBF kernel bandwidth: gamma = 1/median(pairwise_squared_distances). Stack the generated and corpus feature matrices, compute all pairwise squared distances over the combined matrix, take the median, and set gamma = 1/max(median, 1e-6); the 1e-6 floor guards against division by zero when the points are (nearly) identical. This adapts the kernel bandwidth to the actual data scale. With z-scored 14-dim writeprints features and median-heuristic gamma, MMD separation improved from 0.028 to 0.279 (roughly 10x). Implementation: use np.triu_indices to extract the upper-triangle pairwise distances so that each pair is counted once; passing k=1 also excludes the zero self-distances on the diagonal, which would otherwise pull the median down.
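A minimal sketch of the median heuristic, assuming NumPy arrays of shape (n_samples, n_features). The function names (pairwise_sq_dists, median_heuristic_gamma, rbf_mmd2) and the synthetic data are hypothetical and are not taken from compute_batch_mmd; the MMD shown is the biased squared estimator, which may differ from the estimator used originally.

```python
import numpy as np

def pairwise_sq_dists(A, B):
    """Squared Euclidean distances between the rows of A and the rows of B."""
    a2 = np.sum(A ** 2, axis=1)[:, None]
    b2 = np.sum(B ** 2, axis=1)[None, :]
    return np.maximum(a2 + b2 - 2.0 * A @ B.T, 0.0)  # clip rounding negatives

def median_heuristic_gamma(generated, corpus, floor=1e-6):
    """gamma = 1 / max(median pairwise squared distance, floor) over the combined data."""
    combined = np.vstack([generated, corpus])
    d2 = pairwise_sq_dists(combined, combined)
    iu = np.triu_indices(combined.shape[0], k=1)  # distinct pairs only, no diagonal
    return 1.0 / max(float(np.median(d2[iu])), floor)

def rbf_mmd2(generated, corpus, gamma):
    """Biased estimate of squared MMD with kernel exp(-gamma * ||x - y||^2)."""
    k_xx = np.exp(-gamma * pairwise_sq_dists(generated, generated)).mean()
    k_yy = np.exp(-gamma * pairwise_sq_dists(corpus, corpus)).mean()
    k_xy = np.exp(-gamma * pairwise_sq_dists(generated, corpus)).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)

# Synthetic stand-ins for z-scored 14-dim feature matrices (assumption).
rng = np.random.default_rng(0)
generated = rng.normal(0.0, 1.0, size=(40, 14))
corpus = rng.normal(0.5, 1.0, size=(60, 14))

gamma = median_heuristic_gamma(generated, corpus)
mmd2 = rbf_mmd2(generated, corpus, gamma)
print(gamma, mmd2)
```

Because gamma scales inversely with the squared distances, rescaling every feature by a constant leaves the kernel values, and therefore the MMD, unchanged. A fixed gamma does not have this property.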