Gaël Varoquaux: computer / data / health science

Stepping up as probabl’s CSO to supercharge scikit-learn and its ecosystem

Note

Probabl’s get-together, in fall 2025

I’m thrilled to announce that I’m stepping up as Probabl’s CSO (Chief Science Officer) to supercharge scikit-learn and its ecosystem, pursuing my dreams of tools that help go from data to impact.

Scikit-learn, a central tool

Scikit-learn is central …

14 January 2026

2025 highlights: AI research and code

AI is everywhere. Can you see it here?

Note

Some highlights about my work in 2025: progress on tabular learning stands out, along with a publication unpacking the trade-offs and consequences of scale in AI, and of course progress on the open-source data-science and machine-learning stack.

As 2026 starts, I’m looking …

02 January 2026

Maïc, you lived 100 years, what changed?

At Maïc’s 100th birthday, I asked her: “You lived 100 years, what was the most important change for you?” She mentioned “Internet”. I asked why the Internet was important in her eyes. Because this is how she kept close contact with her loved ones, sharing travels or discussing everyday …

29 October 2025

A national recognition; but science and open source are bitter victories

I have recently been awarded France’s national order of merit, for my career, in science, in open source, and around AI.

The speech that I gave carries messages important to me (French below; it flows better).

Contents

Speech translated to English
The original text, in French

10 October 2025

TabICL: Pretraining the best tabular learner

Note

TabICL is a state-of-the-art tabular learner [Qu et al 2025]. The key is its very rich prior, baked into a pre-trained architecture (a table foundation model) and leveraged by in-context learning. Thanks to clever choices, it is fast and scalable, and efficient even without a GPU.
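To make “in-context learning” on a table concrete, here is a toy sketch in Python. It is not TabICL’s architecture: the labelled training rows are simply handed over at prediction time, and a softmax over hand-written similarities stands in for attention weights that a table foundation model would have learned during pretraining (in effect, this toy is kernel smoothing). The function name `icl_predict` and the `temperature` parameter are illustrative assumptions.

```python
import math


def icl_predict(context_X, context_y, query, temperature=1.0):
    """Toy in-context prediction for a binary label (illustrative only).

    Nothing is fitted: the labelled rows are passed in together with the
    query, and the prediction is a softmax-weighted vote over them.
    """
    # Similarity of the query to each context row: negative squared distance.
    scores = [
        -sum((a - b) ** 2 for a, b in zip(row, query)) / temperature
        for row in context_X
    ]
    # Numerically stable softmax over the context rows (the "attention" weights).
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    # Weighted average of 0/1 labels = estimated probability of class 1.
    return sum(w * y for w, y in zip(weights, context_y)) / total


context_X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
context_y = [0, 0, 1, 1]
print(icl_predict(context_X, context_y, [0.95, 1.0], temperature=0.1))  # close to 1
print(icl_predict(context_X, context_y, [0.05, 0.1], temperature=0.1))  # close to 0
```

The point of the design is that swapping in a new table needs no training step, only a new context; a pretrained model earns its accuracy from what it learned before ever seeing that table.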

Contents

Recent progress …

09 July 2025

AI agents that use tools

Modern AIs acquire new capabilities by combining tools to perform a complex task, controlling them like an agent. Unlike traditional programming, they define the sequences of actions themselves.
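A minimal sketch of the loop behind such agents, in Python. The tools (`add`, `shout`), the helper names and the scripted planner are all hypothetical: in a real agent the planner is a language model that reads the task and the observations so far and picks the next tool itself; here it is hard-coded so the example runs. What carries over is the structure: the loop does not fix the sequence of actions, the planner does.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (tool name, argument, result)

# Hypothetical tools: each takes a string argument and returns a string.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(float(x) for x in arg.split(","))),
    "shout": lambda arg: arg.upper(),
}


def scripted_planner(task: str, history: List[Step]) -> Tuple[str, str]:
    """Stand-in for the language model that chooses the next action.

    Returns (tool_name, argument), or ("finish", answer) to stop.
    """
    if not history:
        return "add", task
    if len(history) == 1:
        return "shout", f"the total is {history[-1][2]}"
    return "finish", history[-1][2]


def run_agent(task, planner, tools, max_steps=10):
    """Generic agent loop: ask the planner for an action, run the tool,
    feed the result back, until the planner decides to finish."""
    history: List[Step] = []
    for _ in range(max_steps):
        action, arg = planner(task, history)
        if action == "finish":
            return arg, history
        result = tools[action](arg)
        history.append((action, arg, result))
    raise RuntimeError("agent did not finish within max_steps")


answer, steps = run_agent("2,3,5", scripted_planner, TOOLS)
print(answer)  # THE TOTAL IS 10.0
```

The `max_steps` guard matters in practice: since the planner decides when to stop, the loop needs its own bound.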

Note

This post was originally published in French as part of my scientific chronicle in Les Echos.

Modern AIs are increasingly using …

04 July 2025

AIs that break down questions reason better

The key to the most powerful conversational AIs is to reason by breaking down a complex task into simpler subproblems. Why is this crucial, and how does it work?

Note

This post was originally published in French as part of my scientific chronicle in Les Echos.

The recent release of …

20 June 2025

Science must drive the narratives that shape society

I would like to take a brief moment to reflect on what drives me as an academic.

Academia’s roots are in creating knowledge and sharing it. We, academics, have a role to play in shaping society. In computer science, we sometimes focus on the creation of technology. Here, creation …

01 March 2025

AI super-intelligent to play Go, and math?

Since 2017, an AI has been defeating the best Go experts, despite the game being particularly challenging. Such “super intelligence” is rare, but it could also emerge in fundamental mathematics.

Note

This post was originally published in French as part of my scientific chronicle in Les Echos.

Imitation is not …

19 February 2025

AI for health: the impossible necessity of unbiased data

Is unbiased data important to build health AI? Yes!

Can there be unbiased data? No!

Building health on biased data discriminates

The notion of bias depends on the intended use.

In medicine, we have seen the importance of tuning devices and decisions for the target population. The problem is not …
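A textbook illustration of why a decision must be tuned to its target population: the same diagnostic test, with fixed sensitivity and specificity, yields very different positive predictive values depending on how common the disease is. The numbers below are hypothetical and only show the arithmetic (Bayes’ rule); they are not from the post.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of positive predictions that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# The same hypothetical test (90% sensitivity, 90% specificity) in two populations.
print(round(positive_predictive_value(0.9, 0.9, prevalence=0.5), 3))   # 0.9
print(round(positive_predictive_value(0.9, 0.9, prevalence=0.01), 3))  # 0.083
```

A test that looks reliable in a specialized clinic (half the patients sick) mostly raises false alarms in general screening (1% sick): whether the data or the tool is “biased” depends on where it is used.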

13 February 2025

2024 highlights: of computer science and society

Note

For me, 2024 was full of back and forth between research, software, and connecting these to society. Here, I lay out some highlights on AI and society, as well as research and software, around tabular AI and language models.

As 2025 starts, I’m looking back on 2024. It …

01 January 2025

When AIs must overcome the data

Improving conversational artificial intelligences or simpler prediction engines involves overcoming biases, that is, going beyond the limits of data. But the notion of bias is subtle, as it depends on the goals.

Note

This post was originally published in French as part of my scientific chronicle in Les Echos.

In …

22 December 2024

Do AIs reason or recite?

Despite their apparent intelligence, conversational artificial intelligences often lack logic. The debate rages on: do they reason or do they recite snatches of text memorized on the Internet?

Note

This post was originally published in French as part of my scientific chronicle in Les Echos. I updated it with new …

19 October 2024

CARTE: toward table foundation models

Note

Foundation models, pretrained and readily usable for many downstream tasks, have changed the way we process text, images, and sound. Can we achieve similar breakthroughs for tables? Here I explain why with “CARTE”, we’ve made significant headway.

Contents

Pre-training for data tables: hopes and challenges
- Pre-training is a …

19 July 2024

Skrub 0.2.0: tabular learning made easy

We just released skrub 0.2.0. This release markedly simplifies learning on complex dataframes.

from skrub import tabular_learner
model = tabular_learner('classifier')

Simple, yet solid default baseline

The highlight of the release is the tabular_learner function, which facilitates creating pipelines that readily perform machine learning on dataframes, adding preprocessing to a scikit-learn compatible learner …

03 July 2024

Promoting open-source, from Inria to :probabl.

Note

Open-source efforts around scikit-learn at Inria are spinning off to a new enterprise, Probabl, in charge of sustainable development of a data-science commons.

Contents

Prelude: funding scikit-learn is hard
The birth of a new ambition
Probabl, a mission-driven enterprise
Probabl is already having an impact
My position within Probabl …

09 June 2024

People underestimate how impactful Scikit-learn continues to be

Note

François Chollet rightfully said that people often underestimate the impact of scikit-learn. I give here a few illustrations to back his claim.

A few days ago, François Chollet (the creator of Keras, the library that democratized deep learning) posted:

Indeed, scikit-learn continues to be the most popular machine …

27 November 2023

Comité de l’intelligence artificielle: national vision and strategy

English summary

I have been appointed to the government-level panel of experts on AI, to set the national vision and strategy in France.

I am honored to have been appointed to the French government’s Comité de l’intelligence artificielle (artificial intelligence committee).

The mission entrusted to us, to inform public policy …

20 September 2023

2022, a new scientific adventure: machine learning for health and social sciences

A retrospective on last year (2022): I embarked on a new scientific adventure, assembling a team focused on developing machine learning for health and social science. The team has existed for almost a year, and the vision is nicely shaping up. Let me share with you illustrations of where we …

31 January 2023

My Mayavi story: discovering open source communities

The Mayavi Python software and my personal history: a thread on the Python and SciPy ecosystems, building an open-source codebase, and meeting really cool and friendly people.

I am writing today as a goodbye to the project: I used to be one of the core contributors and maintainers but have been …

10 July 2022
