Leopoldo Sarra

Research

Foundation models, representation learning, formal methods, scientific discovery, and reinforcement learning.

Agentic AI & Formal Methods

Current work at Axiomatic AI / Head of AI Research

Building machine-checkable, reusable infrastructure for scientific reasoning through Lean 4 libraries and AI-powered autoformalization, with the aim of bringing to physics the success that formal methods have recently had in mathematics.

Lean Together 2026 Talk

AxProverBase: A Minimal Agent for Automated Theorem Proving

A minimal agent for automated theorem proving in Lean 4 that combines LLM reasoning with formal verification to prove real-world mathematical theorems.

arXiv

SorryDB: A Real-World Benchmark for Theorem Provers

A dynamically updating benchmark of theorem-proving tasks sourced from real GitHub projects, measuring how well AI provers can complete the Lean theorems that arise in practice.

arXiv

Foundation Models for Science

Flatiron Institute / Flatiron Research Fellow

Training multimodal foundation models that learn shared representations across scientific data types (images, spectra, time series, and metadata) to enable cross-domain transfer and accelerate discovery.

Invited Talk @ CERN

AION-1

Omnimodal foundation model for astronomical sciences, processing 15+ data types including images, spectra, time series, and metadata.

NeurIPS 2025

AstroCLIP

Cross-modal foundation model for galaxies using contrastive learning to align galactic images and spectroscopic data into a shared embedding space.

MNRAS
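
As a rough illustration of what aligning two modalities in a shared embedding space means, the sketch below computes a symmetric contrastive (InfoNCE-style) loss over a batch of paired embeddings. It is a generic NumPy sketch and not AstroCLIP's implementation: the function name, the temperature value, and the random stand-in embeddings are all assumptions made for this example.

```python
import numpy as np

def symmetric_contrastive_loss(image_emb, spectrum_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss for a batch of paired embeddings.

    Row i of image_emb and row i of spectrum_emb are assumed to describe
    the same object; every other row in the batch acts as a negative.
    """
    # L2-normalise so that dot products are cosine similarities.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    spec = spectrum_emb / np.linalg.norm(spectrum_emb, axis=1, keepdims=True)
    logits = img @ spec.T / temperature

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row; the matching pair
        # sits on the diagonal.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-spectrum and spectrum-to-image directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))

# Hypothetical stand-in embeddings: 8 objects, 16 dimensions.
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
aligned_loss = symmetric_contrastive_loss(emb, emb)           # pairs match
mismatched_loss = symmetric_contrastive_loss(emb, emb[::-1])  # pairs scrambled
```

In this toy setup the loss is lower when the rows of the two inputs correspond to the same object, which is the signal a contrastive objective uses to pull matching image and spectrum embeddings together.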

Spectral Tokenization

Universal self-supervised representation learning for astronomical spectra, enabling transfer across instruments and wavelength ranges.

arXiv

Joint Embeddings go Temporal

Adapting Joint Embedding Predictive Architectures (JEPA) for scientific time series, learning temporal representations without contrastive pairs.

NeurIPS 2024 TSALM Workshop

Exploration in Reinforcement Learning

Google DeepMind / Research Scientist Intern

Novelty-based Exploration

A representation-based approach to long-horizon exploration that achieves state-of-the-art performance in hard sparse-reward environments like Montezuma's Revenge, using learned representations to drive novelty-seeking behavior over extended time horizons.

ICLR 2024 (Spotlight)

Artificial Scientific Discovery

Max Planck Institute for the Science of Light / Doctoral Researcher

Developing machine learning methods for automated scientific reasoning, from unsupervised feature extraction to optimal experimental design and program synthesis.

Workshop on Artificial Scientific Discovery

Renormalized Mutual Information

An unsupervised method for extracting physically meaningful features from complex systems. The paper was featured on the cover of Physical Review Letters Vol. 126, Issue 20.

Phys. Rev. Lett. Video Code

Bayesian Experimental Design

A deep active learning approach to optimally designing experiments on quantum many-body systems, maximizing information gain per measurement.

Mach. Learn.: Sci. Technol. Code
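
To make "maximizing information gain per measurement" concrete, the sketch below scores candidate measurements by their expected information gain over a small discrete set of hypotheses with binary outcomes. It is a toy calculation, not the deep-learning method of the paper: the function names, the two-outcome measurement model, and the numbers in the example are assumptions chosen for illustration.

```python
import numpy as np

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def expected_information_gain(prior, likelihood):
    """Expected information gain of each candidate design.

    prior:      shape (n_hypotheses,), current belief over hypotheses.
    likelihood: shape (n_designs, n_hypotheses), P(outcome = 1 | hypothesis, design).
    Returns an array of shape (n_designs,): the mutual information between
    the binary outcome and the hypothesis for each design.
    """
    marginal = likelihood @ prior  # P(outcome = 1 | design)
    return binary_entropy(marginal) - binary_entropy(likelihood) @ prior

# Hypothetical example: two hypotheses, two candidate measurements.
prior = np.array([0.5, 0.5])
likelihood = np.array([
    [0.5, 0.5],  # design 0: same outcome statistics under both hypotheses
    [0.9, 0.1],  # design 1: outcome depends strongly on the hypothesis
])
gains = expected_information_gain(prior, likelihood)
best_design = int(np.argmax(gains))
```

Here design 0 cannot distinguish the hypotheses, so its expected gain is zero, and design 1 is selected. In an active-learning loop the prior would then be updated with the observed outcome before the next measurement is chosen.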

Program Synthesis for Quantum Circuits

Using program synthesis techniques to automatically discover reusable quantum circuit components, decomposing complex unitaries into interpretable building blocks.

Mach. Learn.: Sci. Technol. Code

Reinforcement Learning with Learned Gadgets

An RL agent that automatically discovers reusable quantum circuit primitives (gadgets) and composes them to solve hard optimization problems on real quantum hardware.

Commun. Phys.

PhD Thesis: Artificial Scientific Discovery

Machine learning methods for automated scientific reasoning, combining information-theoretic approaches, active learning, and program synthesis to accelerate discovery in quantum physics.

Thesis

Statistical Physics, Biophysics and Quantum Information

Sapienza University / collaborations

Spin Glasses

Study of longitudinal fluctuations of the Sherrington-Kirkpatrick model, conducted under the supervision of Nobel laureate Giorgio Parisi at Sapienza University.

J. Stat. Mech.

Gene Expression Patterns

Maximum entropy models for patterns of gene expression, applying statistical physics methods to understand biological systems.

Phys. Rev. E

Device-Independent Quantum Tests

Experimental semi-device-independent tests of quantum channels, bridging quantum information theory with laboratory experiments.

Quantum Sci. Technol.
