Gaël Varoquaux

Sat 20 November 2010


ICA versus PCA in the scikit-learn: the value of code over pictures

When I was trying to get an intuitive feel for the difference between Independent Component Analysis (ICA) and Principal Component Analysis (PCA), I wrote a few Python scripts that produce visualizations explaining the difference. These have had a bit of success.

During the last sprint on scikit-learn, a machine learning toolkit in Python, we cleaned up the ICA code that I had been using and added it to the scikit, along with an example inspired by this earlier toy problem.

While the pictures are not as pretty as the initial ones I had done (because we wanted to keep the example as simple as possible), I am very happy that this discussion is now more than a set of static pictures and comes with runnable code.
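To give a flavor of what such runnable code can look like, here is a minimal sketch. It is not the example shipped with the scikit. It assumes a recent scikit-learn in which `PCA` and `FastICA` live in `sklearn.decomposition`; the Laplace-distributed sources, the mixing matrix `A`, and the `match_score` helper are all illustrative choices made for this sketch.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.RandomState(0)

# Two independent, heavy-tailed (non-Gaussian) sources: hypothetical toy data.
n_samples = 5000
S = rng.laplace(size=(n_samples, 2))

# Mix the sources with a non-orthogonal matrix (an arbitrary choice),
# so that each observed signal is a blend of both sources.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
X = S @ A.T

# PCA looks for orthogonal directions of maximal variance;
# ICA looks for directions along which the projections are independent.
pca = PCA(n_components=2).fit(X)
ica = FastICA(n_components=2, random_state=0).fit(X)

S_pca = pca.transform(X)
S_ica = ica.transform(X)


def match_score(estimated, true):
    """Mean, over the true sources, of the best absolute correlation
    with any estimated component (ignores sign and ordering)."""
    corr = np.corrcoef(estimated.T, true.T)[:2, 2:]
    return np.abs(corr).max(axis=0).mean()


print("PCA axes orthogonal:",
      np.allclose(pca.components_ @ pca.components_.T, np.eye(2)))
print("PCA source recovery score:", match_score(S_pca, S))
print("ICA source recovery score:", match_score(S_ica, S))
```

A score close to 1 means each original source is matched almost perfectly by one recovered component. The reader can now ask questions of the algorithm directly: change the mixing matrix, swap the Laplace sources for Gaussian ones (where ICA has nothing to latch onto), or reduce the number of samples, and watch how the two scores respond.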

This illustrates very well my feelings on the future of scientific code and scientific research: papers, books, and teaching materials on numerical methods or computational science are greatly enhanced when they come with highly readable code that illustrates their purpose, because the reader can start asking questions of the algorithm. Hopefully, the documentation of scientific programming toolkits will become the textbooks of tomorrow. We still have a lot of work to do.

It’s funny, I just realized that my vision of software might have been strongly influenced by the fact that my mother, a high-school math teacher, spent endless nights when I was a teenager working on Geoplan, a program for teaching geometry through interaction with figures.
