I am a Research Scientist at Google DeepMind.

Before that, I completed my PhD at École Normale Supérieure (ENS Paris), under the supervision of Gabriel Peyré and Mathieu Blondel.

I graduated from École Polytechnique (X2016) and hold a master's degree from ENS Paris-Saclay in mathematics, vision and learning (MVA), as well as a master's degree from Sorbonne Université in mathematics (Modelling).

Contact: michael (dot) sander (at) polytechnique (dot) org

Papers

  • Maxime Guigon, Lucas Dixon, Michaël E. Sander. A Study on Hidden Layer Distillation for Large Language Model Pre-Training. Preprint, 2026. Paper

  • Quentin Berthet, Yu-Han Wu, Clement Crepy, Romuald Elie, Klaus Greff, Michaël E. Sander. MIND: Monge Inception Distance for Generative Models Evaluation. Preprint, 2026. Paper

  • Lev Fedorov, Michaël E. Sander, Romuald Elie, Pierre Marion, Mathieu Laurière. Clustering in Deep Stochastic Transformers. ICML, 2026. Paper

  • Germain Vivier-Ardisson, Michaël E. Sander, Axel Parmentier, Mathieu Blondel. Differentiable Knapsack and Top-k Operators via Dynamic Programming. Preprint, 2026. Paper

  • Mathieu Blondel, Michaël E. Sander, Germain Vivier-Ardisson, Tianlin Liu, Vincent Roulet. Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction. ICML, 2025. Paper

  • Gemini Team. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. Preprint, 2025. Paper

  • Michaël E. Sander, Vincent Roulet, Tianlin Liu, Mathieu Blondel. Joint Learning of Energy-based Models and their Partition Function. ICML, 2025. Paper

  • Vincent Roulet, Tianlin Liu, Nino Vieillard, Michaël E. Sander, Mathieu Blondel. Loss Functions and Operators Generated by f-Divergences. ICML, 2025. Paper

  • Michaël E. Sander, Gabriel Peyré. Towards Understanding the Universality of Transformers for Next-Token Prediction. ICLR, 2025. Paper

  • Michaël E. Sander. Deeper Learning: Residual Networks, Neural Differential Equations and Transformers, in Theory and Action. 2024. PhD Manuscript.

  • Michaël E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré. How do Transformers perform In-Context Autoregressive Learning? ICML, 2024. Paper, GitHub

  • Pierre Marion, Yu-Han Wu, Michaël E. Sander, Gérard Biau. Implicit regularization of deep residual networks towards neural ODEs. ICLR, 2024 (Spotlight). Paper, GitHub

  • Michaël E. Sander, Tom Sander, Maxime Sylvestre. Unveiling the secrets of paintings: deep neural networks trained on high-resolution multispectral images for accurate attribution and authentication. Conference on Quality Control by Artificial Vision, 2023. Paper

  • Michaël E. Sander, Joan Puigcerver, Josip Djolonga, Gabriel Peyré, Mathieu Blondel. Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective. ICML, 2023. Paper, GitHub

  • Michaël E. Sander, Pierre Ablin, Gabriel Peyré. Do Residual Neural Networks discretize Neural Ordinary Differential Equations? NeurIPS, 2022. Paper, GitHub

  • Samy Jelassi, Michaël E. Sander, Yuanzhi Li. Vision Transformers provably learn spatial structure. NeurIPS, 2022. Paper

  • Michaël E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré. Sinkformers: Transformers with Doubly Stochastic Attention. AISTATS, 2022. Paper, GitHub, short presentation

  • Michaël E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré. Momentum Residual Neural Networks. ICML, 2021. Paper, GitHub, short presentation