LLM-Guided Reasoning
Anatomical planning via LLM to localize anchor organs and generate multi-scale ROI proposals.
Statistical Rejection
GPU-backed MMD permutation tests with FDR control to suppress false-positive candidates.
Training-Free
Zero parameter updates: compatible with test-time augmentation, no catastrophic forgetting.
Abstract
Foundation models for medical image segmentation struggle under out-of-distribution (OOD) shifts, often producing fragmented false positives on OOD tumors. We introduce R2-Seg, a training-free framework for robust OOD tumor segmentation that operates via a two-stage Reason-and-Reject process. First, the Reason step employs an LLM-guided anatomical reasoning planner to localize organ anchors and generate multi-scale ROIs. Second, the Reject step applies two-sample statistical testing to candidates generated by a frozen foundation model (BiomedParse) within these ROIs. This statistical rejection filter retains only candidates significantly different from normal tissue, effectively suppressing false positives. Our framework requires no parameter updates, making it compatible with zero-update test-time augmentation and avoiding catastrophic forgetting. On multi-center and multi-modal tumor segmentation benchmarks, R2-Seg substantially improves Dice, specificity, and sensitivity over strong baselines and the original foundation models.
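The two-sample test in the Reject step can be illustrated with a minimal sketch of an MMD permutation test. This is not the authors' implementation. It assumes each candidate region and the normal-tissue reference are given as arrays of feature vectors (one row per sample), and it uses an RBF kernel with a median-heuristic bandwidth and CPU NumPy rather than a GPU backend. The function names `rbf_kernel`, `mmd2`, and `mmd_permutation_test` are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """RBF kernel matrix between the rows of a and the rows of b."""
    d2 = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth ** 2))


def mmd2(k, n):
    """Biased MMD^2 estimate from a pooled kernel matrix.

    The first n rows/columns of k belong to sample X, the rest to sample Y.
    """
    return k[:n, :n].mean() + k[n:, n:].mean() - 2.0 * k[:n, n:].mean()


def mmd_permutation_test(x, y, n_perm=500, seed=0):
    """Return (observed MMD^2, permutation p-value) for samples x and y."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n = len(x)

    # Median heuristic for the bandwidth (an assumption, not from the paper).
    d2 = ((pooled[:, None, :] - pooled[None, :, :]) ** 2).sum(-1)
    bandwidth = np.sqrt(np.median(d2[d2 > 0]) / 2.0)

    k = rbf_kernel(pooled, pooled, bandwidth)
    observed = mmd2(k, n)

    # Shuffle the pooled labels and count statistics at least as extreme.
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if mmd2(k[np.ix_(idx, idx)], n) >= observed:
            exceed += 1

    # The +1 correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

A candidate whose features are drawn from the same distribution as normal tissue should yield a large p-value and be rejected as a likely false positive, while a small p-value marks a candidate that differs significantly from normal tissue.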
OOD Test-Time Adaptation
When vision embeddings are well separated, the model can distinguish foreground from background by aligning text embeddings with a single decision boundary (left).
In medical imaging, however, protocols vary across scanners, tumor sites, and modalities. Vision embeddings for OOD samples become poorly separated, which biases the decision boundary so that background structures are misclassified as tumors. The result is a high false-positive rate and potentially harmful over-diagnosis (right).
R2-Seg Pipeline
Top row: LLM-based segmentation planning and ROI construction. Middle row: BiomedParse-based tumor segmentation and candidate extraction. Bottom row: statistical two-sample testing and false discovery rate control.
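The false discovery rate control in the bottom row can be sketched as follows, assuming one p-value per candidate and the standard Benjamini-Hochberg step-up procedure. The page does not state which FDR procedure R2-Seg uses, so both the choice of Benjamini-Hochberg and the name `benjamini_hochberg` are assumptions made for illustration.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of candidates kept at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    keep = np.zeros(m, dtype=bool)
    if m == 0:
        return keep

    order = np.argsort(p)
    # Step-up thresholds alpha * i / m for the i-th smallest p-value.
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds

    if passed.any():
        # Keep every candidate up to the largest rank that passes.
        k = np.nonzero(passed)[0].max()
        keep[order[: k + 1]] = True
    return keep


# Hypothetical p-values for five candidates.
mask = benjamini_hochberg([0.001, 0.2, 0.012, 0.8, 0.025], alpha=0.05)
print(mask)
```

Candidates where the mask is `True` would be retained as significantly different from normal tissue. The rest would be discarded as probable false positives.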
Experiments
Visualization of segmentation results for both in-distribution and out-of-distribution tumor types.
BibTeX
@misc{shen2025r2segtrainingfreeoodmedical,
  title={R$^{2}$Seg: Training-Free OOD Medical Tumor Segmentation via Anatomical Reasoning and Statistical Rejection},
  author={Shuaike Shen and Ke Liu and Jiaqing Xie and Shangde Gao and Chunhua Shen and Ge Liu and Mireia Crispin-Ortuzar and Shangqi Gao},
  year={2025},
  eprint={2511.12691},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.12691},
}