Abstract: Deep Transformers are costly to train and often underuse their later layers, a failure mode known as the Curse of Depth. Depth growth, which trains models from shallow to deep by duplicating layers, not only reduces training cost but also is associated with stronger reasoning performance. Layer-wise analyses suggest that growth encourages more effective use of depth, reshapes how information is processed across layers, and facilitates the emergence of permutable computational blocks. These mechanistic signatures point to a connection with looping, in which a block of layers is reused across depth: looped and depth-grown models exhibit convergent depth-wise patterns consistent with a shared form of iterative computation. The connection is both composable and practical: grown models can be retrofitted with a looped structure to further boost reasoning performance.
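As a rough illustration of the two ideas in the abstract, the sketch below shows depth growth by duplicating layers and a looped forward pass that reuses the grown stack. It is a minimal sketch under assumptions: the names (`TinyBlock`, `StackedModel`, `grow_by_duplication`, the `loop` argument) are hypothetical, the block is a toy residual MLP rather than a full Transformer layer, and the exact duplication and looping scheme used in the work presented may differ.

```python
import copy
import torch
import torch.nn as nn


class TinyBlock(nn.Module):
    """Stand-in for a Transformer block (here: a residual MLP)."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        return x + self.mlp(self.norm(x))


class StackedModel(nn.Module):
    """A stack of blocks; `loop` > 1 reuses the whole stack across depth."""

    def __init__(self, blocks):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x, loop: int = 1):
        for _ in range(loop):          # looping: iterate the same block of layers
            for block in self.blocks:
                x = block(x)
        return x


def grow_by_duplication(model: StackedModel) -> StackedModel:
    """Depth growth: duplicate each trained block, doubling model depth.

    The grown model would then be trained further; other growth schemes
    (e.g. stacking the whole model twice) are possible as well.
    """
    grown = []
    for block in model.blocks:
        grown.append(block)
        grown.append(copy.deepcopy(block))  # duplicated copy of the trained layer
    return StackedModel(grown)


if __name__ == "__main__":
    dim = 32
    shallow = StackedModel([TinyBlock(dim) for _ in range(2)])
    # ... train the shallow model, then grow it ...
    deep = grow_by_duplication(shallow)   # 4 blocks instead of 2
    x = torch.randn(1, 8, dim)
    y = deep(x)                           # standard forward pass
    y_looped = deep(x, loop=2)            # retrofit: reuse the grown stack twice
    print(y.shape, y_looped.shape)
```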
Location: Lecture hall of the MPI IS Tübingen (N0.002)
Biography of the speaker: Ferdinand Kapl is a third-year PhD student at TUM/Helmholtz AI, working with Stefan Bauer. Their research explores how inductive biases and training strategies, such as growing model depth and reusing computation across layers, can make large language models more efficient and better at tasks such as reasoning.
You can also join the talk online via this link:
Join Zoom Meeting
https://eu02web.zoom-x.de/j/63843831615?pwd=sjeXRt4cRuHl37ev7VHarQNYYBj3iP.1
Meeting ID: 638 4383 1615
Passcode: 151031
We hope to see many of you in the lecture hall!