Publications
Michael E. Sander, Gabriel Peyré. Towards Understanding the Universality of Transformers for Next-Token Prediction. Preprint.
Michael E. Sander. Deeper Learning: Residual Networks, Neural Differential Equations and Transformers, in Theory and Action. PhD Manuscript.
Michael E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré. How do Transformers perform In-Context Autoregressive Learning? ICML, 2024. Paper, GitHub
Pierre Marion, Yu-Han Wu, Michael E. Sander, Gérard Biau. Implicit regularization of deep residual networks towards neural ODEs. ICLR, 2024 (Spotlight). Paper, GitHub
Michael E. Sander, Joan Puigcerver, Josip Djolonga, Gabriel Peyré, Mathieu Blondel. Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective. ICML, 2023. Paper, GitHub
Michael E. Sander, Pierre Ablin, Gabriel Peyré. Do Residual Neural Networks discretize Neural Ordinary Differential Equations? NeurIPS, 2022. Paper, GitHub
Samy Jelassi, Michael E. Sander, Yuanzhi Li. Vision Transformers provably learn spatial structure. NeurIPS, 2022. Paper
Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré. Sinkformers: Transformers with Doubly Stochastic Attention. AISTATS, 2022. Paper, GitHub, short presentation
Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré. Momentum Residual Neural Networks. ICML, 2021. Paper, GitHub, short presentation