Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling

KAIST AI
*Equal contribution. Corresponding author: kateshim@kaist.ac.kr

Perceptual judgment bias in MLLM judges. (a) When perceptual capability is insufficient, a judge may produce incorrect visual descriptions (a2) and assign high scores (a3) to perceptually wrong responses (a2). (b) Even when the judge's own perception aligns with humans (b2), it may still prefer (b5) visually inconsistent responses (b3) over the response with correct perception (b4). We introduce Perception-Judge, an MLLM judge trained with reinforcement learning on a systematically designed perception-grounded dataset, PPJD, which effectively mitigates these perceptual biases in MLLM judgment (a4), (b6).

Abstract

Recent multimodal large language models (MLLMs) have demonstrated strong reasoning ability, yet their reliability as automated evaluators remains limited by a critical weakness: when visual evidence conflicts with textual cues, MLLM judges tend to reward plausible narratives over perceptually correct answers. We identify and systematically analyze this phenomenon, which we term \textit{Perceptual Judgment Bias}. Through controlled visual perturbations, we show that existing multimodal judges frequently anchor on the response text instead of their own visual perception, leading to inconsistent and non-verifiable evaluations. To address this issue, we introduce the \textit{Perceptually Perturbed Judgment Dataset}, which constructs minimally edited counterfactual responses that isolate perceptual errors and enable verifiable supervision. Building on this dataset, we develop a unified training framework that optimizes a verifiable batch-ranking reward with GRPO, achieving coherent global ordering without explicit pairwise labels. Experiments across diverse MLLM-as-a-Judge benchmarks show that our approach substantially improves perceptual fidelity, ranking coherence, and alignment with human evaluation. Our method establishes a principled and scalable paradigm for training multimodal judges that are perceptually grounded, interpretable, and robust to visual–reasoning conflicts.
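To make the idea of a minimally edited counterfactual concrete, here is a toy sketch of a preference triplet. The response text and the single-attribute edit are invented for illustration only; the actual dataset construction presumably uses model-generated perturbations rather than string substitution.

```python
# Toy sketch of a PPJD-style preference triplet.
# All response text here is illustrative, not taken from the dataset.

# A perceptually correct response to some visual question.
r_c = "The cat on the red sofa is sleeping, so the scene is calm."

# Perception-perturbed variant: flip one visually grounded attribute
# ("red" -> "green") while keeping the reasoning structure intact.
r_p = r_c.replace("red", "green")

# Fully corrupted variant: additionally degrade the reasoning itself.
r_pr = r_p.replace("so the scene is calm", "so the scene is chaotic")

# Induced preference order for judge supervision: r_c > r_p > r_{p+r}.
triplet = [r_c, r_p, r_pr]
```

The point of the minimal edit is that the perturbed response differs from the correct one only in a visual attribute, so any judge that prefers it must be ignoring the image.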

Pipeline Overview


Using perceptual perturbations, we construct the Perceptually Perturbed Judgment Dataset (PPJD). For each correct response \(r_{\texttt{c}}\), PPJD generates two perturbed variants. The first, \(r_{\texttt{r}_{\textit{p}}}\), is produced by altering visually grounded attributes while preserving the original reasoning structure. The second, \(r_{\texttt{r}_{\textit{p+r}}}\), additionally degrades the original reasoning, yielding a fully corrupted answer. We define the preference order \(r_{\texttt{c}} \succ r_{\texttt{r}_{\textit{p}}} \succ r_{\texttt{r}_{\textit{p+r}}}\). Using these ordered triplets, we train the model with GRPO, where the reward is computed using the Levenshtein distance.
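One plausible reading of this reward (an assumption on our part — the page does not spell out what the distance is taken between) is a normalized Levenshtein distance between the judge's predicted ordering and the gold order \(r_{\texttt{c}} \succ r_{\texttt{r}_{\textit{p}}} \succ r_{\texttt{r}_{\textit{p+r}}}\). A minimal sketch under that assumption, with illustrative function names:

```python
# Sketch of a Levenshtein-based batch-ranking reward. The function names
# and the normalization are illustrative assumptions, not the paper's code.

def levenshtein(a, b):
    """Classic single-row dynamic-programming edit distance over sequences."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (a[i - 1] != b[j - 1]))      # substitution
            prev = cur
    return dp[n]

def ranking_reward(predicted, gold):
    """Reward in [0, 1]: 1.0 for a perfect ordering, lower as it deviates."""
    d = levenshtein(predicted, gold)
    return 1.0 - d / max(len(predicted), len(gold))

gold = ["c", "p", "p+r"]                                 # r_c > r_p > r_{p+r}
print(ranking_reward(["c", "p", "p+r"], gold))           # 1.0
print(round(ranking_reward(["p", "c", "p+r"], gold), 2)) # 0.33 (top pair swapped)
```

Because the reward is a deterministic function of the predicted ordering, it is verifiable, which is what makes it usable as a GRPO objective without pairwise preference labels.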

