Latest posts

Comparing distributions: Kernels estimate good representations, l1 distances give good tests

Given two sets of observations, are they drawn from the same distribution? Our NeurIPS 2019 paper, Comparing distributions: l1 geometry improves kernel two-sample testing, revisits this classic statistical problem, known as “two-sample testing”.

This post explains the context and the paper with a bit of hand …
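The teaser above only states the problem. As a rough illustration of what a kernel two-sample test looks like in practice (and not the paper's l1-geometry method), here is a minimal NumPy sketch: a Gaussian-kernel Maximum Mean Discrepancy (MMD) statistic calibrated with a permutation test. The function names, the bandwidth of 1.0, and the toy data are illustrative assumptions, not anything from the paper.

    # Illustrative sketch, not the paper's method: Gaussian-kernel MMD
    # two-sample test with a permutation-based null distribution.
    import numpy as np

    def gaussian_kernel(X, Y, bandwidth=1.0):
        # Pairwise squared distances between rows of X and rows of Y
        d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
        return np.exp(-d2 / (2 * bandwidth**2))

    def mmd2(X, Y, bandwidth=1.0):
        # Biased estimate of the squared Maximum Mean Discrepancy
        Kxx = gaussian_kernel(X, X, bandwidth)
        Kyy = gaussian_kernel(Y, Y, bandwidth)
        Kxy = gaussian_kernel(X, Y, bandwidth)
        return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

    def permutation_test(X, Y, n_perm=500, bandwidth=1.0, seed=None):
        # p-value: how often a random relabeling of the pooled samples
        # gives an MMD at least as large as the observed one
        rng = np.random.default_rng(seed)
        observed = mmd2(X, Y, bandwidth)
        pooled = np.vstack([X, Y])
        n = len(X)
        count = 0
        for _ in range(n_perm):
            perm = rng.permutation(len(pooled))
            count += mmd2(pooled[perm[:n]], pooled[perm[n:]], bandwidth) >= observed
        return (count + 1) / (n_perm + 1)

    # Toy data: two Gaussian samples with shifted means, which the test should detect
    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, size=(100, 2))
    Y = rng.normal(0.5, 1.0, size=(100, 2))
    print(permutation_test(X, Y, seed=0))   # small p-value expected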

Getting a big scientific prize for open-source software

An important acknowledgement of a different way of doing science: open, collaborative, and more than a proof of concept.

A few days ago, Loïc Estève, Alexandre Gramfort, Olivier Grisel, Bertrand Thirion, and I received the “Académie des Sciences Inria prize for transfer”, for our contributions to the scikit-learn project …

2018: my scientific year in review

From a scientific perspective, 2018 [1] was once again extremely exciting, thanks to awesome collaborators (at Inria, with DirtyData, and our local scikit-learn team). Rather than going over everything that we did in 2018, I would like to give a few highlights: We published major work using machine learning to …

A foundation for scikit-learn at Inria

We have just announced that a foundation will be supporting scikit-learn at Inria [1]: scikit-learn.fondation-inria.fr

Growth and sustainability

This is an exciting turn for us, because it enables us to receive private funding. As a result, we will be able to provide secure employment for some existing core …

Sprint on scikit-learn, in Paris and Austin

Two weeks ago, we held a scikit-learn sprint in Austin and Paris. Here is a brief report on progress and challenges.

Several sprints

We actually held two sprints in Austin: one open sprint, at the SciPy conference sprints, which was open to new contributors, and one core sprint, for more …

Our research in 2017: personal scientific highlights

In my opinion, the scientific highlights of 2017 for my team were in multivariate predictive analysis for brain imaging: a brain decoder that is more efficient and faster than alternatives, improving clinical predictions by jointly predicting multiple traits of subjects, decoding based on the raw time-series of brain activity, and a personal …

Beyond computational reproducibility, let us aim for reusability

Scientific progress calls for reproducing results. Due to limited resources, this is difficult even in computational sciences. Yet, reproducibility is only a means to an end. It is not enough by itself to enable new scientific results. Rather, new discoveries must build on reuse and modification of the state …

Scikit-learn Paris sprint 2017

Two weeks ago, we held a large international sprint on scikit-learn in Paris. It was incredibly productive and fun, as always. We are still busy merging in the work, but I think that now is a good time to try to summarize the sprint.

A massive workforce

We had a …

Our research in 2016: personal scientific highlights

The year 2016 was productive for science in my team. Here are some personal highlights: bridging artificial intelligence tools to human cognition, markers of neuropsychiatric conditions from brain activity at rest, algorithmic speedups for matrix factorization on huge datasets…


Artificial-intelligence convolutional networks map the human visual system well

Eickenberg et …

Data science instrumenting social media for advertising is responsible for today's politics

To my friends developing data science for the social media, marketing, and advertising industries,

It is time to accept that we have our share of responsibility in the outcome of the US elections and the vote on Brexit. We are not creating the society that we would like. Facebook, Twitter …