Deep Models and Optimization Talk
05 March 2026 at 10:30 - 12:00 | N0.002 MPI-IS (Lecture Hall)

"From Growing to Looping: Inducing Iterative Computation for Reasoning"

by Ferdinand Kapl

ORGANIZERS
Deep Models and Optimization (PI)

Abstract: Deep Transformers are costly to train and often underuse their later layers, a failure mode known as the Curse of Depth. Depth growth, which trains models from shallow to deep by duplicating layers, not only reduces training cost but is also associated with stronger reasoning performance. Layer-wise analyses suggest that growth encourages more effective use of depth, reshapes how information is processed across layers, and facilitates the emergence of permutable computational blocks. Building on these mechanistic signatures, a connection emerges to looping, where a block of layers is reused across depth: looped and depth-grown models exhibit convergent depth-wise patterns consistent with a shared form of iterative computation. This connection is both composable and practical: grown models can be retrofitted with a looped structure to further boost reasoning performance.
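The growth-to-looping connection described in the abstract can be sketched in a few lines of toy Python. This is an illustrative sketch only, not the speaker's implementation: layers are modeled as plain scalar functions standing in for Transformer blocks, and all names (`make_layer`, `grow`, `run_looped`) are hypothetical.

```python
# Toy sketch (hypothetical, not the talk's actual code): a "layer" is a
# scalar function standing in for a Transformer block.

def make_layer(w):
    # Hypothetical toy layer: an affine map in place of attention + MLP.
    return lambda x: w * x + 1.0

def grow(layers):
    # Depth growth by block duplication: the trained stack is copied and
    # stacked on top of itself, doubling depth while initializing the new
    # layers from existing weights.
    return layers + layers

def run(layers, x):
    for layer in layers:
        x = layer(x)
    return x

def run_looped(block, n_loops, x):
    # Looping: reuse one block of layers n_loops times across depth,
    # i.e. weight-tied iterative computation.
    for _ in range(n_loops):
        x = run(block, x)
    return x

shallow = [make_layer(0.5), make_layer(0.9)]
deep = grow(shallow)  # 4 layers, weights shared with the shallow stack

# Retrofitting a looped structure: running the original block twice
# performs the same computation as the grown stack when weights are tied.
print(len(deep))                                      # 4
print(run(deep, 1.0) == run_looped(shallow, 2, 1.0))  # True
```

In this toy setting the equivalence is exact because duplication ties the new layers' weights to the old ones; the abstract's point is that real grown models converge toward similarly loop-like depth-wise behavior, which is why a looped structure can be retrofitted onto them.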