R2-Seg: Training-Free OOD Medical Tumor Segmentation via Anatomical Reasoning and Statistical Rejection

¹Carnegie Mellon University  ·  ²University of Cambridge  ·  ³Zhejiang University
⁴ETH Zurich  ·  ⁵University of Illinois Urbana-Champaign
CVPR 2026 Oral
*Equal Contribution  ·  Corresponding Author

LLM-Guided Reasoning

An LLM performs anatomical planning: it localizes anchor organs and generates multi-scale region-of-interest (ROI) proposals.

Statistical Rejection

GPU-backed maximum mean discrepancy (MMD) permutation tests with false discovery rate (FDR) control suppress false-positive candidates.

Training-Free

Zero parameter updates — compatible with test-time augmentation, no catastrophic forgetting.

Abstract

Foundation models for medical image segmentation struggle under out-of-distribution (OOD) shifts, often producing fragmented false positives on OOD tumors. We introduce R2-Seg, a training-free framework for robust OOD tumor segmentation that operates via a two-stage Reason-and-Reject process. First, the Reason step employs an LLM-guided anatomical reasoning planner to localize organ anchors and generate multi-scale ROIs. Second, the Reject step applies two-sample statistical testing to candidates generated by a frozen foundation model (BiomedParse) within these ROIs. This statistical rejection filter retains only candidates significantly different from normal tissue, effectively suppressing false positives. Our framework requires no parameter updates, making it compatible with zero-update test-time augmentation and avoiding catastrophic forgetting. On multi-center and multi-modal tumor segmentation benchmarks, R2-Seg substantially improves Dice, specificity, and sensitivity over strong baselines and the original foundation models.
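To make the Reject step more concrete, here is a minimal sketch of a two-sample MMD permutation test, the kind of test this page names for statistical rejection. It is an illustration under stated assumptions, not the R2-Seg implementation: the function names, the RBF kernel, the median-heuristic bandwidth, and the per-sample feature vectors are all hypothetical choices, and the code runs on the CPU with NumPy rather than on a GPU.

```python
import numpy as np


def pairwise_sq_dists(z):
    """Squared Euclidean distances between all rows of z."""
    sq_norms = np.sum(z**2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (z @ z.T)
    return np.maximum(d2, 0.0)  # clip tiny negatives from round-off


def mmd2_from_gram(gram, idx_x, idx_y):
    """Biased squared-MMD estimate from a precomputed Gram matrix."""
    k_xx = gram[np.ix_(idx_x, idx_x)].mean()
    k_yy = gram[np.ix_(idx_y, idx_y)].mean()
    k_xy = gram[np.ix_(idx_x, idx_y)].mean()
    return k_xx + k_yy - 2.0 * k_xy


def mmd_permutation_test(x, y, n_permutations=500, seed=0):
    """Test whether samples x and y come from the same distribution.

    x, y: arrays of shape (n_samples, n_features). In this sketch they
    stand for hypothetical features of a candidate region and of
    reference normal tissue.
    Returns (observed squared MMD, permutation p-value).
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n_total, n_x = len(pooled), len(x)

    d2 = pairwise_sq_dists(pooled)
    # Median heuristic for the RBF bandwidth (an assumption of this sketch).
    median_d2 = np.median(d2[np.triu_indices(n_total, k=1)])
    bandwidth2 = median_d2 if median_d2 > 0 else 1.0
    gram = np.exp(-d2 / (2.0 * bandwidth2))

    idx = np.arange(n_total)
    observed = mmd2_from_gram(gram, idx[:n_x], idx[n_x:])

    # Shuffle group labels and count how often the permuted statistic
    # is at least as large as the observed one.
    exceed = 0
    for _ in range(n_permutations):
        perm = rng.permutation(n_total)
        if mmd2_from_gram(gram, perm[:n_x], perm[n_x:]) >= observed:
            exceed += 1

    # The +1 correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value
```

A candidate would be retained only when its p-value is small, meaning its features differ significantly from normal tissue. Because the Gram matrix is computed once and each permutation only re-indexes it, the permutation loop is cheap, which is also what makes this kind of test easy to batch on a GPU.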

OOD Test-Time Adaptation

When vision embeddings are well separated, the model can distinguish foreground from background by aligning text embeddings with a single decision boundary (left).

In medical imaging, however, protocols vary across scanners, tumor sites, and modalities. Vision embeddings for OOD samples become poorly separated, biasing the decision boundary so that background structures are misclassified as tumors — leading to high false-positive rates and potentially harmful over-diagnosis (right).

[Figure: In-distribution (left) vs. out-of-distribution (right) embedding separation]

R2-Seg Pipeline

Top row: LLM-based segmentation planning and ROI construction. Middle row: BiomedParse-based tumor segmentation and candidate extraction. Bottom row: Statistical two-sample test and false discovery rate control.

[Figure: R2-Seg pipeline overview showing the Reason and Reject stages]
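The final stage of the pipeline, false discovery rate control, can be illustrated with the Benjamini-Hochberg step-up procedure. This page does not state which FDR procedure R2-Seg uses, so Benjamini-Hochberg is an assumption here, chosen because it is the most common one. The function name `benjamini_hochberg` and the default `alpha` are hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha.

    In this sketch each p-value belongs to one candidate region, and a
    rejected null hypothesis means the candidate is kept as a detection.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    keep = np.zeros(m, dtype=bool)
    if m == 0:
        return keep

    order = np.argsort(p)                         # ascending p-values
    thresholds = alpha * np.arange(1, m + 1) / m  # alpha * rank / m
    passed = p[order] <= thresholds

    if passed.any():
        # Step-up rule: find the largest rank that passes, then reject
        # every hypothesis ranked at or below it.
        k = np.max(np.nonzero(passed)[0])
        keep[order[: k + 1]] = True
    return keep
```

For example, with p-values `[0.001, 0.03, 0.035, 0.5]` and `alpha=0.05`, the second-smallest value misses its own threshold (0.025) but is still kept because the third-smallest passes its threshold (0.0375). That step-up behaviour is what distinguishes this procedure from a per-candidate cutoff.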

Experiments

Visualization of segmentation results for both in-distribution and out-of-distribution tumor types.

[Figure: Segmentation results for ID and OOD tumors]

[Figure: Quantitative results]

[Figure: Ablation study]
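For readers less familiar with the reported metrics, the sketch below computes Dice, sensitivity, and specificity from a pair of binary masks using the standard voxel-level definitions. The paper's exact evaluation protocol (for instance, whether specificity is computed per voxel or per case) is not given on this page, so treat this as a generic illustration. The function name and the convention of returning 1.0 for empty denominators are assumptions.

```python
import numpy as np


def segmentation_metrics(pred, target):
    """Voxel-level Dice, sensitivity, and specificity for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    tp = int(np.sum(pred & target))    # tumor predicted, tumor present
    fp = int(np.sum(pred & ~target))   # tumor predicted, none present
    fn = int(np.sum(~pred & target))   # tumor missed
    tn = int(np.sum(~pred & ~target))  # background correctly left empty

    # Assumed convention: an empty denominator counts as a perfect score.
    dice = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) > 0 else 1.0
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 1.0
    specificity = tn / (tn + fp) if (tn + fp) > 0 else 1.0
    return {"dice": dice, "sensitivity": sensitivity, "specificity": specificity}
```

False positives lower both Dice and specificity, so suppressing fragmented false-positive candidates can improve both without changing sensitivity.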

BibTeX

@misc{shen2025r2segtrainingfreeoodmedical,
      title={R$^{2}$Seg: Training-Free OOD Medical Tumor Segmentation via Anatomical Reasoning and Statistical Rejection},
      author={Shuaike Shen and Ke Liu and Jiaqing Xie and Shangde Gao and Chunhua Shen and Ge Liu and Mireia Crispin-Ortuzar and Shangqi Gao},
      year={2025},
      eprint={2511.12691},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.12691},
}