Gaël Varoquaux

Mon 17 September 2018


A foundation for scikit-learn at Inria

We have just announced that a foundation will be supporting scikit-learn at Inria [1]: scikit-learn.fondation-inria.fr

Growth and sustainability

This is an exciting turn for us, because it enables us to receive private funding. As a result, we will be able to offer secure employment to some existing core contributors, and to hire more people on the team. The goal is to help sustain quality (more frequent releases?) and to tackle some ambitious features.

A foundation? What and why?

Open source lives and thrives by its base, the community of developers. And scikit-learn is a fantastic example of these dynamics. Because of its grassroots origins, it has focused on features that matter for the small and the many, such as ease of use and statistical models that work well in data-poor situations. Over the years, decisions have been based on technical merit, rather than on the importance of displaying a list of trendy features. A consequence of the breadth of contributors with different backgrounds is that the library tends to be well-suited for many applications, including some models that are less mainstream.

People with dedicated time to support the community

That said, over time there is an increasing need for a core team of maintainers. As the library gets bigger, it is more and more difficult to have a full view of what is happening. Integration of new features, quality assurance, and releases are best done by developers who can dedicate a large amount of time to the library. Also, ambitious changes to the library, such as improving the parallel computing engine, need long efforts. For many years, we have always had people with dedicated time to support the community. In France, we were jumping through hoops to find public money to fund them. As someone who has made this effort, I can tell you that it is a complicated one [2].

The ability to receive money from sponsors will enable us to scale up our operations. I was initially worried that we would have difficulty finding partners willing to give us money without asking for control over the project. However, I was proven wrong, and we have found a small set of great partners.

What will people work on? How will decisions be made?

It can be a difficult exercise to balance how money is used in a community-driven project. The project should not lose its drive, in which the community of developers plays an important part. The interests of the sponsors should not take precedence over the interests of the user base.

We will make sure that the money that the foundation receives is invested in the interest of the community. We have a technical committee that supervises the activity of the foundation. Its decisions will be informed by the community [3]. For this, we have an advisory board composed of core contributors of scikit-learn. Besides the advisory board, the technical committee also comprises a delegate from each sponsor. I am excited about the input that our partners will provide on their priorities, as they represent various industries. Votes will be distributed so that sponsors and community have equal voting power.

Why not an existing foundation such as NumFOCUS, or the PSF?

There are several reasons why we chose this particular legal vehicle. Our endeavor is slightly different from the prominent foundations in our ecosystem, NumFOCUS and the PSF (Python Software Foundation).

The first important aspect is that we want to employ full-time developers. Different countries have very different legal frameworks, and it is really hard for a non-profit to transfer money overseas. Having a physical presence, such as employing people or owning real estate, is even harder. We needed something in France. And there might be a need for something else in another country at some point.

Another reason to be embedded in the Inria foundation is that it is giving us a really good deal. We basically get legal advice, accounting, office space, and IT support, for an 8% overhead. This is an excellent deal and is part of the sponsoring efforts that Inria will keep doing.

Last, we feel that a foundation targeting scikit-learn specifically can raise money from different people than other foundations. I think that there is value in having multiple foundations seeking money for open-source software. Indeed, a foundation builds a case and an image to convince donors. Different donors require a different case and a different image. For instance, the president of NumFOCUS argues for a name less focused on numerics. Yet too wide a scope can dilute the image.

We have in mind to make it easy for other foundations to support scikit-learn. We have major contributors at leading institutions, such as Andreas Mueller at Columbia or Joel Nothman at the University of Sydney. It is important that these institutions can easily gather donations too, in the legal framework suited to their country. Hence the name reflects that the foundation is embedded at Inria, leaving room for other initiatives.

What’s the scope?

The scope of our work is everything scikit-learn related. It is not the whole pydata or scipy ecosystem: it is focused on scikit-learn. But we will not hesitate to contribute fixes and enhancements to neighboring projects, as in the past, even all the way up to core Python [4].


I am very excited. A strong team of full-time contributors will allow us to do ambitious things with scikit-learn.

Join us

We will be recruiting! See our positions. Come work with us in Paris.

I want to end by thanking the amazing men and women who have been contributing to scikit-learn, and are with us in this fantastic adventure! The energy in this project is incredible. We are launching this effort thanks to you, and to empower you even more.


[1] Technically, it is a tax-deductible scikit-learn consortium inside the Inria foundation, which is a non-profit entity related to Inria.
[2] I am quite proud that over the years, my group has employed Olivier Grisel, Joris van den Bossche (working on pandas in addition to scikit-learn), Guillaume Lemaître (working on imbalanced-learn in addition to scikit-learn), Jérémie du Boisberranger, Tom Moreau, Loic Estève, Fabian Pedregosa, to name only a few. All these people, and the many other students that we have paid part time to work on software, have had a structuring impact on our ecosystem, going beyond the bounds of scikit-learn and touching many aspects of computing in Python. However, because of the constraints of research funding in France, public money forced me to hire them on short-term contracts.
[3] Details on the governance of the foundation can be found at https://scikit-learn.fondation-inria.fr/en/mission-and-governance
[4] For instance, Olivier and Tom have been making parallelism more robust in Python 3.7 (amongst various issues, https://bugs.python.org/issue33056 and https://bugs.python.org/issue31699). Olivier helped define the new pickling protocol, crucial to efficient persistence. This is hard work. Yet it is important, because it benefits all libraries.