This paper introduces Idempotent Test-Time Training (IT³), a novel approach to addressing the challenge of distribution shift. While supervised-learning methods assume matching train and test distributions, this is rarely the case for machine learning systems deployed in the real world. Test-Time Training (TTT) approaches address this by adapting models during inference, but they are limited by a domain-specific auxiliary task. IT³ is instead based on the universal property of idempotence. An idempotent operator is one that can be applied sequentially without changing the result beyond the initial application, that is, f(f(x)) = f(x).
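To make this concrete, here is a minimal sketch of test-time adaptation driven by an idempotence objective. The toy model, its dimensions, the zero placeholder y0, and the choice to treat the first application as a fixed target are illustrative assumptions, not the paper's exact implementation:

import torch
import torch.nn as nn

# Toy model f(x, y): it sees the input x together with a candidate
# prediction y (zeros meaning "no guess yet").
class IdempotentNet(nn.Module):
    def __init__(self, d_in=8, d_out=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in + d_out, 32), nn.ReLU(), nn.Linear(32, d_out)
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1))

def it3_adapt(f, x, steps=5, lr=1e-3):
    # Test-time objective: make f idempotent on this batch, i.e. a second
    # application should not change the result of the first one.
    opt = torch.optim.SGD(f.parameters(), lr=lr)
    y0 = torch.zeros(x.shape[0], 4)               # placeholder prediction
    for _ in range(steps):
        y1 = f(x, y0)                             # first application
        y2 = f(x, y1.detach())                    # second application
        loss = ((y2 - y1.detach()) ** 2).mean()   # idempotence error
        opt.zero_grad()
        loss.backward()
        opt.step()
    return f(x, y0)                               # refined prediction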
For easy integration of IT³, use the torch-ttt library. The IT3Engine class offers a seamless API for test-time training with virtually any architecture and task. You can try IT³ directly in this Colab notebook.

Integrating torch-ttt’s IT³ engine into your model is as simple as:
# Import path follows torch-ttt's engine layout; check the library
# docs if your installed version differs.
from torch_ttt.engine.it3_engine import IT3Engine

engine = IT3Engine(
    model=network,                        # original model
    features_layer_name="layer_name",     # the layer to which output features will be added
)
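Once wrapped, the engine is used in place of the original model. The sketch below assumes torch-ttt's usual engine pattern, in which calling the engine returns both the model output and an auxiliary test-time loss; network, criterion, train_loader, and test_loader are placeholders for your own objects:

import torch

optimizer = torch.optim.Adam(engine.parameters(), lr=1e-4)

# Training: optimize the task loss together with the auxiliary TTT loss.
engine.train()
for inputs, labels in train_loader:
    optimizer.zero_grad()
    outputs, loss_ttt = engine(inputs)
    loss = criterion(outputs, labels) + 0.1 * loss_ttt
    loss.backward()
    optimizer.step()

# Inference: the engine adapts on each incoming batch before predicting.
engine.eval()
for inputs, _ in test_loader:
    outputs, _ = engine(inputs)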
Figure: the Original Model and the Modified Model after wrapping with the IT³ engine.
Idempotent Test-Time Training (IT³) enables the model to improve predictions on corrupted or unfamiliar data by optimizing itself during inference. In the example below, the model refines its output toward the Ground Truth after applying IT³, compared to the Not Optimized version. This yields more accurate and robust predictions in real-world scenarios where the data distribution may shift unexpectedly.
Figure panels: Input Image, Not Optimized, Optimized, Ground Truth.
Idempotent Test-Time Training (IT³) enhances the model’s ability to generalize to out-of-distribution (OOD) data. By applying optimization during inference, IT³ adjusts predictions for data that differs significantly from the training set, resulting in lower error rates across different OOD levels.
Figures: Age results on OOD images; Airfoil results on OOD shapes; Car results on OOD shapes; Roads results on OOD images; test accuracy (%) on ImageNet-C across 5 corruption levels.
Idempotence serves as a strong signal of model reliability under distribution shift. In the first plot, we observe a strong negative correlation (−0.94) between idempotence error and accuracy on Corrupted ImageNet: batches with more stable predictions (lower idempotence error) consistently yield better performance. In the second plot, idempotence error clearly separates in-distribution from out-of-distribution samples, with OOD inputs showing much higher errors. IT³ reduces these errors during inference, aligning OOD representations more closely with the training distribution and boosting accuracy.
Figures: Accuracy vs. Idempotence; Idempotence vs. Out-of-Distributionness.
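The idempotence error behind these plots can be computed without ground-truth labels. A minimal sketch, reusing the illustrative f(x, y) interface from the example above:

def idempotence_error(f, x):
    # Distance between the first and second application of f; higher
    # values indicate less stable, more out-of-distribution inputs.
    y0 = torch.zeros(x.shape[0], 4)   # placeholder prediction, as above
    with torch.no_grad():
        y1 = f(x, y0)
        y2 = f(x, y1)
    return ((y2 - y1) ** 2).flatten(1).mean(dim=1)   # per-sample score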
These plots show the predictions of various TTT methods on an aerial segmentation task. Our approach consistently improves prediction quality and outperforms other popular methods (best viewed in high resolution).
Figure: qualitative segmentation examples, each row showing the Input Image, the predictions of TTT, ActMAD, and IT³ (all at batch size 16) with their quality scores Q, and the Ground Truth:

Example | TTT (bs=16) | ActMAD (bs=16) | IT³ (bs=16)
--------|-------------|----------------|------------
1       | Q=30.63     | Q=31.66        | Q=55.79
2       | Q=16.98     | Q=16.11        | Q=40.64
3       | Q=53.72     | Q=50.39        | Q=77.17
@inproceedings{durasov2025it,
  title={{IT}$^3$: Idempotent Test-Time Training},
  author={Nikita Durasov and Assaf Shocher and Doruk Oner and Gal Chechik and Alexei A Efros and Pascal Fua},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=MHiTGWDbIb}
}