Gaël Varoquaux

Sun 01 December 2019


Getting a big scientific prize for open-source software


An important acknowledgement for a different view of doing science: open, collaborative, and more than a proof of concept.

A few days ago, Loïc Estève, Alexandre Gramfort, Olivier Grisel, Bertrand Thirion, and I received the “Académie des Sciences Inria prize for transfer”, for our contributions to the scikit-learn project. To put things simply, it’s quite a big deal to me, because I feel that it illustrates a change of culture in academia.

Recognizing an open view of scientific contributions

It is a great honor, because the selection was made by the members of the Académie des Sciences, very accomplished scientists with impressive contributions to science. The “Académie” is the hallmark of fundamental academic science in France. To me, this prize is also symbolic because it recognizes an open view of academic research and transfer, a view that sometimes felt like it ran against the incentives. We started scikit-learn as a crazy endeavor, a bit of a hippy science thing. People didn’t really take us seriously. We were working on software, and not publications. We were doing open source, while industrial transfer is made by creating startups or filing patents. We were doing Python, while academic machine learning was then done in Matlab, and industrial transfer in C++. We were not pursuing the latest publications, while these are thought to be research’s best assets. We were interested in reaching out to non-experts, while the partners considered interesting are those with qualified staff.

Quality and openness, at the cost of quantity and control

No. We did it differently. We reached out to an open community. We did BSD-licensed code. We worked to achieve quality at the cost of quantity. We cared about installation issues, on-boarding biologists or medical doctors, playing well with the wider scientific Python ecosystem. We gave decision power to people outside of Inria, sometimes people whom we had never met in real life. We made sure that Inria was never the sole actor, the sole stakeholder. We never pushed our own scientific publications in the project. We limited complexity, trading off performance for ease of use, ease of installation, ease of understanding.

As a consequence, we slowly but surely assembled a large community. In such a community, the whole is greater than the sum of its parts. The breadth of interlocutors and cultures slows movement down, but creates better results, because these results are understandable to many and usable on a diversity of problems. The consequence of this quality is that we were progressively used in more and more places: industrial data-science labs, startups, research in applied or fundamental statistical learning, teaching. Ironically, the institutional world did not notice. It got hard, next to impossible, to get funding [1]. A few years ago, I was told by a central governmental agency that we, open-source zealots, were destroying an incredible amount of value by giving away for free the production of research [2]. The French report on AI, led by a Fields medalist, cited tensorflow and theano (a discontinued piece of software), but ignored scikit-learn; maybe because we were doing “boring science”?

But scikit-learn’s amazing community continued plowing forward. We grew so much that we were heard at the top. The prize from the Académie shows that we managed to capture the attention of senior scientists with open-source software, because this software is really having a worldwide impact in many disciplines.

Presenting scikit-learn at the Académie des Sciences

An accomplishment of the community

There were only five of us on stage, as the prize is for Inria permanent staff. But this is of course not a fair account of how the project has grown and what made it successful.

In 2011, at the first international sprint, I felt something was happening: incredible people whom I had never met before were sitting next to me, working very hard on solving problems with me. This experience of being united to solve difficult problems is something amazing. And I deeply thank every single person who has worked on this project: the 1500 contributors, many of whom I have never met, and in particular the core team, which is committed to making sure that every detail of scikit-learn is solid and serves the users. The team that has assembled over the years is of incredible quality.

The promises of data science need open source

The world does not understand how much the promises of data science, for today and tomorrow, need open source projects that are easy for everybody to install and use. These projects are like roads and bridges: they are needed for growth, though no one wants to pay for maintaining them. I hope that I can use the podium that the prize will give us to stress the importance of the battle that we are fighting.

[1] Getting funding from the government involved too much politics and risk. For these reasons, I turned to private donors, in a foundation.
[2] Inria always supported us, and often paid developers in my team out of its own pocket.

PS: As another illustration of the culture change toward openness in science, it was announced during the ceremony that the “Compte Rendu de l’Académie des Sciences” is becoming open access, without publication charges!
