News and thoughts

AI super-intelligent to play Go, and math?

Since 2017, an AI has been defeating the best Go experts, despite the game being particularly challenging. Such “super intelligence” is rare, but it could also emerge in fundamental mathematics.

Note

This post was originally published in French as part of my scientific chronicle …

AI for health: the impossible necessity of unbiased data

Is unbiased data important to build health AI? Yes!

Can there be unbiased data? No!

Building health on biased data discriminates

The notion of bias depends on the intended use.


In medicine, we have seen the importance of tuning devices and decisions for the …

2024 highlights: of computer science and society

Note

For me, 2024 was full of back and forth between research, software, and connecting these to society. Here, I lay out some highlights on AI and society, as well as research and software, around tabular AI and language models.

As 2025 starts, I’m looking back on 2024. It …

When AIs must overcome the data

Improving conversational artificial intelligences or simpler prediction engines involves overcoming biases, that is, going beyond the limits of data. But the notion of bias is subtle, as it depends on the goals.

Image generated with "ChatGPT", with the prompt "Please generate an image of a robot arm wrestling a figure made of numbers. This figure does not look like a robot, but more like a human, however it is made of numbers."

Note

This post was originally published in French as part of my …

Do AIs reason or recite?

Despite their apparent intelligence, conversational artificial intelligences often lack logic. The debate rages on: do they reason or do they recite snatches of text memorized on the Internet?

Image generated with "ChatGPT", with the prompt "Please generate an image of a robot with a stream of numbers coming out of his mouth. The robot is on the left, facing right, and the numbers flow, as if they were sound."

Note

This post was originally published in French as part of my scientific chronicle in Les …

CARTE: toward table foundation models

Note

Foundation models, pretrained and readily usable for many downstream tasks, have changed the way we process text, images, and sound. Can we achieve similar breakthroughs for tables? Here I explain why, with “CARTE”, we’ve made significant headway.

Contents

  • Pre-training for data tables: hopes and challenges
    • Pre-training is a …

Skrub 0.2.0: tabular learning made easy

We just released skrub 0.2.0. This release markedly simplifies learning on complex dataframes.

model = tabular_learner('classifier')

Simple, yet solid default baseline

The highlight of the release is the tabular_learner function, which facilitates creating pipelines that readily perform machine learning on dataframes, adding preprocessing to a scikit-learn compatible learner …

Promoting open-source, from Inria to :probabl.

Note

Open-source efforts around scikit-learn at Inria are spinning off to a new enterprise, Probabl, in charge of sustainable development of a data-science commons.

Contents

  • Prelude: funding scikit-learn is hard
  • The birth of a new ambition
  • Probabl, a mission-driven enterprise
  • Probabl is already having an impact
  • My position within Probabl …

People underestimate how impactful Scikit-learn continues to be

Note

François Chollet rightfully said that people often underestimate the impact of scikit-learn. I give here a few illustrations to back his claim.

A few days ago, François Chollet (the creator of Keras, the library that democratized deep learning) posted:

Tweet from François Chollet: "People underestimate how impactful scikit-learn continues to be"

Indeed, scikit-learn continues to be the most popular machine …

Artificial intelligence committee: national vision and strategy

English summary

I have been appointed to the government-level panel of experts on AI, to set the national vision and strategy in France.


I have the honor of being appointed to the French government’s artificial intelligence committee.

The mission entrusted to us, to inform public action …

2022, a new scientific adventure: machine learning for health and social sciences

A retrospective on last year (2022): I embarked on a new scientific adventure, assembling a team focused on developing machine learning for health and social science. The team has existed for almost a year, and the vision is shaping up nicely. Let me share with you illustrations of where we …

My Mayavi story: discovering open source communities

The Mayavi Python software, and my personal history: a thread on the Python and SciPy ecosystems, building an open source codebase, and meeting really cool and friendly people

I am writing today as a goodbye to the project: I used to be one of the core contributors and maintainers but have been …

2021 highlight: Decoding brain activity to new cognitive paradigms

Broad decoding models that can specialize to discriminate closely related mental processes with limited data

TL;DR

Decoding models can help isolate which mental processes are implied by the activation of given brain structures. But to support a broad conclusion, they must be trained on many studies, a difficult problem given …

Hiring an engineer and post-doc to simplify data science on dirty data

Note

Join us to work on reinventing data-science practices and tools to produce robust analysis with less data curation.

It is well known that data cleaning and preparation are a heavy burden to the data scientist.

Dirty data research

In the dirty data project, we have been conducting machine-learning research …

Hiring someone to develop scikit-learn community and industry partners

Note

With the growth of scikit-learn and the wider PyData ecosystem, we want to recruit in the Inria scikit-learn team for a new role. Departing from our usual focus on excellence in algorithms, statistics, or code, we want to add to the team someone with some technical understanding, but an …

2020: my scientific year in review

The year 2020 has undoubtedly been interesting: the COVID-19 pandemic struck while I was on a work sabbatical in Montréal, at the MNI and the MILA, and it further pushed my interest in machine learning for health-care. My highlights this year revolve around basic and applied data-science for health.

Highlights …

Technical discussions are hard; a few tips

Note

This post discusses the difficulties of communicating while developing open-source projects and tries to give some simple advice.

A large software project is above all a social exercise in which technical experts try to reach good decisions together, for instance on GitHub pull requests. But communication is difficult, in …

Jean Dechoux, June 13th 1923 – Feb 9th 2020

Jean Dechoux was born between the first and the second world wars, in a small French town, close to Germany. His family was that of poor farmers, who would work in coal mines to make up for the small size of their crops.

He grew to become a pulmonologist, heading …

Survey of machine-learning experimental methods at NeurIPS2019 and ICLR2020

Note

A simple survey asking authors of two leading machine-learning conferences a few quantitative questions on their experimental procedures.

How do machine-learning researchers run their empirical validation? In the context of a push for improved reproducibility and benchmarking, this question is important to develop new tools for model comparison. We …

2019: my scientific year in review

My current research spans a wide range: from brain sciences to core data science. My overall interest is to build methodology drawing insights from data for questions that have often been addressed qualitatively. If I can highlight a few publications from 2019 [1], the common thread would be computational statistics, from dirty …