Gaël Varoquaux: computer / data / health science

Technical discussions are hard; a few tips

Note

This post discusses the difficulties of communicating while developing open-source projects and tries to give some simple advice.

A large software project is above all a social exercise in which technical experts try to reach good decisions together, for instance on GitHub pull requests. But communication is difficult, in …

28 May 2020

Jean Dechoux, June 13th 1923 – Feb 9th 2020

Jean Dechoux was born between the first and the second world wars, in a small French town, close to Germany. His family was that of poor farmers, who would work in coal mines to make up for the small size of their crops.

He grew to become a pulmonologist, heading …

16 February 2020

Survey of machine-learning experimental methods at NeurIPS2019 and ICLR2020

Note

A simple survey asking authors of two leading machine-learning conferences a few quantitative questions on their experimental procedures.

How do machine-learning researchers run their empirical validation? In the context of a push for improved reproducibility and benchmarking, this question is important to develop new tools for model comparison. We …

22 January 2020

2019: my scientific year in review

My current research spans a wide range: from brain sciences to core data science. My overall interest is to build methodology drawing insights from data for questions that have often been addressed qualitatively. If I can highlight a few publications from 2019 [1], the common thread would be computational statistics, from dirty …

05 January 2020

Comparing distributions: Kernels estimate good representations, l1 distances give good tests

Note

Given two sets of observations, are they drawn from the same distribution? Our paper, Comparing distributions: l1 geometry improves kernel two-sample testing, presented at the NeurIPS 2019 conference, revisits this classic statistical problem known as “two-sample testing”.

This post explains the context and the paper with a bit of hand …

08 December 2019

Getting a big scientific prize for open-source software

Note

An important acknowledgement for a different view of doing science: open, collaborative, and more than a proof of concept.

A few days ago, Loïc Estève, Alexandre Gramfort, Olivier Grisel, Bertrand Thirion, and I received the “Académie des Sciences Inria prize for transfer”, for our contributions to the scikit-learn project …

01 December 2019

2018: my scientific year in review

From a scientific perspective, 2018 [1] was once again extremely exciting thanks to awesome collaborators (at Inria, with DirtyData, and our local scikit-learn team). Rather than going over everything that we did in 2018, I would like to give a few highlights: We published major work using machine learning to …

03 January 2019

A foundation for scikit-learn at Inria

We have just announced that a foundation will be supporting scikit-learn at Inria [1]: scikit-learn.fondation-inria.fr

Growth and sustainability

This is an exciting turn for us, because it enables us to receive private funding. As a result, we will be able to have secure employment for some existing core …

17 September 2018

Sprint on scikit-learn, in Paris and Austin

Two weeks ago, we held a scikit-learn sprint in Austin and Paris. Here is a brief report on progress and challenges.

Several sprints

We actually held two sprints in Austin: one open sprint, at the SciPy conference sprints, which was open to new contributors, and one core sprint, for more …

01 August 2018

Our research in 2017: personal scientific highlights

In my opinion, the scientific highlights of 2017 for my team were on multivariate predictive analysis for brain imaging: a brain decoder more efficient and faster than alternatives, improved clinical predictions by predicting jointly multiple traits of subjects, decoding based on the raw time-series of brain activity, and a personal …

31 December 2017

Beyond computational reproducibility, let us aim for reusability

Note

Scientific progress calls for reproducing results. Due to limited resources, this is difficult even in computational sciences. Yet, reproducibility is only a means to an end. It is not enough by itself to enable new scientific results. Rather, new discoveries must build on reuse and modification of the state …

19 September 2017

Scikit-learn Paris sprint 2017

Two weeks ago, we held a large international sprint on scikit-learn in Paris. It was incredibly productive and fun, as always. We are still busy merging in the work, but I think that now is a good time to try to summarize the sprint.

A massive workforce

We had a …

23 June 2017

Our research in 2016: personal scientific highlights

Year 2016 has been productive for science in my team. Here are some personal highlights: bridging artificial intelligence tools to human cognition, markers of neuropsychiatric conditions from brain activity at rest, algorithmic speedups for matrix factorization on huge datasets…

Artificial-intelligence convolutional networks map well the human visual system

Eickenberg et …

31 December 2016

Data science instrumenting social media for advertising is responsible for today's politics

To my friends developing data science for the social media, marketing, and advertising industries,

It is time to accept that we have our share of responsibility in the outcome of the US elections and the vote on Brexit. We are not creating the society that we would like. Facebook, Twitter …

11 November 2016

Unison 2.48 binaries for ARM

I have built static binaries of Unison 2.48 for ARM

23 July 2016

Better Python compressed persistence in joblib

New persistence in joblib enables low-overhead storage of big data contained in arbitrary objects

20 May 2016

Of software and Science. Reproducible science: what, why, and how

At MLOSS 15 we brainstormed on reproducible science, discussing why we care about software in computer science. Here is a summary blending notes from the discussions with my opinion.

“Without engineering, science is not more than philosophy” — the community

How do we enable better Science? Why do we do software …

16 December 2015

Nilearn 0.2: more powerful machine learning for neuroimaging

After 6 months of effort, we have just released version 0.2 of nilearn, dedicated to making machine learning in neuroimaging easier and more powerful.

This release integrates the features of the July sprint, and more.

Highlights

Better documentation …

13 December 2015

Job offer: data crunching brain functional connectivity for biomarkers

My research group is looking to fill a post-doc position on learning biomarkers from functional connectivity.

Scientific context

The challenge is to use resting-state fMRI at the level of a population to understand how intrinsic functional connectivity captures pathologies and other cognitive phenotypes. Rest fMRI is a promising tool for …

08 December 2015

MLOSS 2015: wising up to building open-source machine learning

Note

The 2015 edition of the machine learning open source software (MLOSS) workshop was full of very mature discussions that I strive to report here.

I give links to the videos. Some machine-learning researchers have great thoughts about growing communities of coders, about code as a process and a deliverable …

28 November 2015
