I am a Research Scientist at Google DeepMind.
Before that, I obtained my PhD at École Normale Supérieure (ENS Paris), under the supervision of Gabriel Peyré and Mathieu Blondel.
I graduated from École Polytechnique (X2016) and hold a master's degree from ENS Paris-Saclay in mathematics, vision, and learning (MVA), as well as a master's degree from Sorbonne Université in mathematics (Modelling).
Contact: michael (dot) sander (at) polytechnique (dot) org
Papers
Maxime Guigon, Lucas Dixon, Michaël E. Sander. A Study on Hidden Layer Distillation for Large Language Model Pre-Training. Preprint, 2026. Paper
Quentin Berthet, Yu-Han Wu, Clement Crepy, Romuald Elie, Klaus Greff, Michaël E. Sander. MIND: Monge Inception Distance for Generative Models Evaluation. Preprint, 2026. Paper
Lev Fedorov, Michaël E. Sander, Romuald Elie, Pierre Marion, Mathieu Laurière. Clustering in Deep Stochastic Transformers. ICML, 2026. Paper
Germain Vivier-Ardisson, Michaël E. Sander, Axel Parmentier, Mathieu Blondel. Differentiable Knapsack and Top-k Operators via Dynamic Programming. Preprint, 2026. Paper
Mathieu Blondel, Michaël E. Sander, Germain Vivier-Ardisson, Tianlin Liu, Vincent Roulet. Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction. ICML, 2025. Paper
Gemini Team. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. Preprint, 2025. Paper
Michaël E. Sander, Vincent Roulet, Tianlin Liu, Mathieu Blondel. Joint Learning of Energy-based Models and their Partition Function. ICML, 2025. Paper
Vincent Roulet, Tianlin Liu, Nino Vieillard, Michaël E. Sander, Mathieu Blondel. Loss Functions and Operators Generated by f-Divergences. ICML, 2025. Paper
Michaël E. Sander, Gabriel Peyré. Towards Understanding the Universality of Transformers for Next-Token Prediction. ICLR, 2025. Paper
Michaël E. Sander. Deeper Learning: Residual Networks, Neural Differential Equations and Transformers, in Theory and Action. 2024. PhD Manuscript.
Michaël E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré. How do Transformers perform In-Context Autoregressive Learning? ICML, 2024. Paper, GitHub
Pierre Marion, Yu-Han Wu, Michaël E. Sander, Gérard Biau. Implicit regularization of deep residual networks towards neural ODEs. ICLR, 2024 (Spotlight). Paper, GitHub
Michaël E. Sander, Tom Sander, Maxime Sylvestre. Unveiling the secrets of paintings: deep neural networks trained on high-resolution multispectral images for accurate attribution and authentication. Conference on Quality Control by Artificial Vision, 2023. Paper
Michaël E. Sander, Joan Puigcerver, Josip Djolonga, Gabriel Peyré, Mathieu Blondel. Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective. ICML, 2023. Paper, GitHub
Michaël E. Sander, Pierre Ablin, Gabriel Peyré. Do Residual Neural Networks discretize Neural Ordinary Differential Equations? NeurIPS, 2022. Paper, GitHub
Samy Jelassi, Michaël E. Sander, Yuanzhi Li. Vision Transformers provably learn spatial structure. NeurIPS, 2022. Paper
Michaël E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré. Sinkformers: Transformers with Doubly Stochastic Attention. AISTATS, 2022. Paper, GitHub, short presentation
Michaël E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré. Momentum Residual Neural Networks. ICML, 2021. Paper, GitHub, short presentation
