I am a Research Scientist at Google DeepMind.
Before that, I obtained my PhD at École Normale Supérieure (ENS Paris), under the supervision of Gabriel Peyré and Mathieu Blondel.
I graduated from École Polytechnique (X2016) and hold a master's degree in mathematics, vision and learning (MVA) from ENS Paris-Saclay, as well as a master's degree in mathematics (Modelling) from Sorbonne Université.
Contact: michael (dot) sander (at) polytechnique (dot) org
Papers
Maxime Guigon, Lucas Dixon, M.S. A Study on Hidden Layer Distillation for Large Language Model Pre-Training. Preprint, 2026. Paper
Quentin Berthet, Yu-Han Wu, Clement Crepy, Romuald Elie, Klaus Greff, M.S. MIND: Monge Inception Distance for Generative Models Evaluation. Preprint, 2026. Paper
Germain Vivier-Ardisson, M.S., Axel Parmentier, Mathieu Blondel. Differentiable Knapsack and Top-k Operators via Dynamic Programming. Preprint, 2026. Paper
Lev Fedorov, M.S., Romuald Elie, Pierre Marion, Mathieu Laurière. Clustering in Deep Stochastic Transformers. ICML, 2026 (Spotlight). Paper
Mathieu Blondel, M.S., Germain Vivier-Ardisson, Tianlin Liu, Vincent Roulet. Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction. ICML, 2026. Paper
Gemini Team. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. Preprint, 2025. Paper
M.S., Vincent Roulet, Tianlin Liu, Mathieu Blondel. Joint Learning of Energy-based Models and their Partition Function. ICML, 2025. Paper
Vincent Roulet, Tianlin Liu, Nino Vieillard, M.S., Mathieu Blondel. Loss Functions and Operators Generated by f-Divergences. ICML, 2025. Paper
M.S., Gabriel Peyré. Towards Understanding the Universality of Transformers for Next-Token Prediction. ICLR, 2025. Paper
M.S. Deeper Learning: Residual Networks, Neural Differential Equations and Transformers, in Theory and Action. 2024. PhD Manuscript.
M.S., Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré. How do Transformers perform In-Context Autoregressive Learning? ICML, 2024. Paper, GitHub
Pierre Marion, Yu-Han Wu, M.S., Gérard Biau. Implicit regularization of deep residual networks towards neural ODEs. ICLR, 2024 (Spotlight). Paper, GitHub
M.S., Tom Sander, Maxime Sylvestre. Unveiling the secrets of paintings: deep neural networks trained on high-resolution multispectral images for accurate attribution and authentication. Conference on Quality Control by Artificial Vision, 2023. Paper
M.S., Joan Puigcerver, Josip Djolonga, Gabriel Peyré, Mathieu Blondel. Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective. ICML, 2023. Paper, GitHub
M.S., Pierre Ablin, Gabriel Peyré. Do Residual Neural Networks discretize Neural Ordinary Differential Equations? NeurIPS, 2022. Paper, GitHub
Samy Jelassi, M.S., Yuanzhi Li. Vision Transformers provably learn spatial structure. NeurIPS, 2022. Paper
M.S., Pierre Ablin, Mathieu Blondel, Gabriel Peyré. Sinkformers: Transformers with Doubly Stochastic Attention. AISTATS, 2022. Paper, GitHub, short presentation
M.S., Pierre Ablin, Mathieu Blondel, Gabriel Peyré. Momentum Residual Neural Networks. ICML, 2021. Paper, GitHub, short presentation
