10 Jan

Book review: NumPy 1.5 Beginner’s guide

Packt Publishing sent me a copy of NumPy 1.5 Beginner’s Guide by Ivan Idris.

The book actually covers more than just numpy: it is a full introduction to numerical computing with Python. The table of contents is the following:

  • NumPy Quick Start
  • Beginning with NumPy Fundamentals
  • Get into Terms with Commonly Used Functions
  • Convenience Functions for Your Convenience
  • Working with Matrices and ufuncs
  • Move Further with NumPy Modules
  • Peeking Into Special Routines
  • Assure Quality with Testing
  • Plotting with Matplotlib
  • When NumPy is Not Enough: SciPy and Beyond

The book is easy to read, as it requires no specific expertise other than knowing basic Python programming. It is full of examples and exercises, which is really great for learning. I find the style of the author, Ivan Idris, particularly amusing and relaxing, engaging the reader with questions, challenges, or even jokes (“Have a go hero”).

With regard to formatting and print, the book is set in large fonts, with sectioning information, tips and exercises clearly standing out.

It is full of practical information, such as how to install the software, or where to get help. Finally, one thing that I appreciated is that the examples are typed in IPython. Each time I teach, I like to use IPython, because it is full of features that help with plotting, debugging and profiling numerical code. The book even has a little introduction to some useful IPython features.

After an introduction to the workflow, the book explores array manipulation such as creation or reshaping, followed by some simple numerics and the battery of array-based operations on functions and polynomials. Then it presents linear algebra and signal processing basics (FFT). It also covers the financial functions that are present in numpy and mentions testing, which is very important to achieve quality code. The book finishes with matplotlib and scipy, two modules that are important to know to go further.

The examples are mostly drawn from statistics or financial applications, such as computing running averages on stock quotes. Basic math explanations, such as the definition of the Moore-Penrose pseudo-inverse, are given when needed.

To conclude, I enjoyed this book and I think that it is a nice addition to my library. It delivers exactly what its title promises: it is well-suited for beginners wanting to learn numpy. On the other hand, I would not recommend it as reference material, or as a book to learn more general scientific or numerical computing with Python.

07 Jan

Joblib beta release: fast compressed persistence + Python 3

Joblib 0.6: better I/O and Python 3 support

Happy new year, everyone. I have just released Joblib 0.6.0 beta. The highlights of the 0.6 release are a reworked enhanced pickler, and Python 3 support.

Many thanks go to the contributors to the 0.5.X series (Fabian Pedregosa, Yaroslav Halchenko, Kenneth C. Arnold, Alexandre Gramfort, Lars Buitinck, Bala Subrahmanyam Varanasi, Olivier Grisel, Ralf Gommers, Juan Manuel Caicedo Carvajal, and myself). In particular Fabian made sure that Joblib worked under Python 3.

In this blog post, I’d like to discuss a bit more the compressed persistence engine, as it illustrates well key factors in implementing and using compressed serialization.

Fast compressed persistence

One of the key components of joblib is its ability to persist arbitrary Python objects, and read them back very quickly. It is particularly efficient for containers that do their heavy lifting with numpy arrays. The trick to achieving great speed has been to save the numpy arrays in separate files, and load them via memmapping.
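To illustrate why memory mapping makes reading back so cheap, here is a minimal sketch using only the standard library. It is not joblib's implementation: the helper names `save_doubles` and `read_slice_mapped` and the file name are made up for the example, and a raw file of float64 values stands in for a saved numpy array.

```python
import array
import mmap
import os
import tempfile


def save_doubles(path, values):
    """Write float64 values to `path` as a raw binary buffer."""
    with open(path, "wb") as f:
        array.array("d", values).tofile(f)


def read_slice_mapped(path, start, stop):
    """Return values[start:stop], reading only the mapped pages that are touched."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # View the mapped bytes as doubles; no data is copied at this point.
            # Both views are released before the map is closed.
            with memoryview(mapped) as raw, raw.cast("d") as view:
                return view[start:stop].tolist()


# Hypothetical data file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "values.f64")
save_doubles(path, [float(i) for i in range(1000)])
print(read_slice_mapped(path, 10, 13))
```

Opening the map is nearly free whatever the file size; the cost of reading is only paid for the part of the data that is actually accessed.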

However, one drawback of joblib is that the caching mechanism may end up using a lot of disk space. As a result, there is strong interest in having compressed storage, provided it doesn’t slow down the library too much. Another use case that I have in mind for fast compressed persistence is implementing out-of-core computation.

There are some great compressed I/O libraries for Python, for instance Pytables. You may wonder why the need to code yet another one. The answer is that joblib is pure Python, depending only on the standard library (numpy is optional), but also that the goal here is black-box persistence of arbitrary objects.

Comparing I/O speed and compression to other libraries

Implementing efficient compressed storage was a bit of a struggle and I learned a lot. Rather than going into the details straight away, let me first discuss a few benchmarks of the resulting code. Benchmarking such a feature is very hard: first because you are fighting with the disk cache, second because the performance depends very much on the data at hand (some data compress better than others), and last because there are three interesting metrics: disk space used, write speed, and read speed.

Dataset used - I chose to compare the different strategies on some datasets that I work with, namely the probabilistic brain atlases MNI 1mm (62Mb uncompressed) and Juelich 2mm (105Mb uncompressed). Whether the data is represented as a Fortran-ordered array, or a C-ordered array is important for the I/O performance. This data is normally stored to disk compressed using the domain-specific Nifti format (.nii files), accessed in Python with the Nibabel library.

Libraries used - I benched different compression strategies in joblib against Nibabel’s Nifti I/O, compressed or not, and against using Pytables to store the data buffer (without the meta-information). Pytables exposes a variety of compression strategies, with different speed compromises. In addition, I benched numpy’s builtin savez_compressed.

I would like to stress that I am comparing a general purpose persistence engine (joblib) to specific I/O libraries either optimized for the data (Nifti), or requiring some massaging to enable persistence (pytables).

[Figure: Comparing to other libraries]

Actual numbers can be found here.

Take home messages - The graphs are not crystal-clear, but a few trends appear:

  • Pytables with LZO or blosc compression is the king of the hill for read and write speed.
  • I/O of compressed data is often faster than with uncompressed data for a good compression algorithm.
  • Joblib with Zlib compression level 1 performs honorably in terms of speed with only the Python standard library and no compiled code.
  • Read time of memmapping (with nibabel or joblib) is negligible (it is tiny on the graphs); however, the loading time appears when you start accessing the data.
  • Passing in arrays with a memory layout (Fortran versus C order) that the I/O library doesn’t expect can really slow down writing.
  • Compressing with Zlib compression-level 1 gets you most of the disk space gains for a reasonable cost in write/read speed.
  • Compressing with Zlib compression-level 9 (not shown on the figures) doesn’t buy you much in disk space, but costs a lot in writing time.
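The trade-off between compression level, disk space and write time can be explored with a few lines of standard-library code. This is only a sketch: `compression_report` and the synthetic `payload` are illustrative names, the payload is a stand-in for a redundant dataset, and timings measured this way ignore disk I/O entirely.

```python
import time
import zlib


def compression_report(payload, levels=(1, 3, 9)):
    """Return {level: (compressed_size, seconds)} for each zlib level."""
    report = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(payload, level)
        elapsed = time.perf_counter() - start
        # Round-trip check: the compressed bytes must restore the input.
        assert zlib.decompress(compressed) == payload
        report[level] = (len(compressed), elapsed)
    return report


# Hypothetical stand-in for a smooth, redundant dataset: a repeated byte ramp (1 MiB).
payload = bytes(range(256)) * 4096

for level, (size, seconds) in compression_report(payload).items():
    print(f"level {level}: {size} bytes, {seconds * 1e3:.2f} ms")
```

Running this on your own data, rather than on a synthetic buffer, is the only way to know where the sweet spot lies.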

Benching datasets richer than pure arrays

The datasets used so far are pretty much composed of one big array, a 4D smooth spatial map. I wanted to test on more datasets, to see how the performance varied with data type and richness. For this, I used the scikit-learn datasets, real-life data of various kinds, described here:

  • 20 news - 20 usenet news group: this data mainly consists of text, and not numpy arrays.
  • LFW people - Labeled faces in the wild, many pictures of different people’s faces.
  • LFW pairs - Labeled faces in the wild, pairs of pictures for each individual. This is a high entropy dataset, it does not have much redundant information.
  • Olivetti - Olivetti dataset: centered pictures of faces.
  • Juelich(F) - Our previous Juelich atlas
  • Big people - The LFW people dataset, but repeated 4 times, to put a strain on memory resources.
  • MNI(F) - Our previous MNI atlas
  • Species - Occurrence of species measured in Latin America, with a lot of missing data.

[Figure: Testing compression strategies on various datasets]

Actual numbers can be found here.

What this tells us - The main message from these benchmarks is that datasets with redundant information, i.e. that compress well, give fast I/O. This is not surprising. In particular, good compression can give good I/O on text (20 news). Another result, more of a sanity check, is that compressed I/O on big data (Big people) works as well as on smaller data. Earlier code would start to swap. Finally, I conclude from these graphs that compression levels from 1 to 3 buy you most of the gains for reasonable costs, and that going up to 9 is not recommended, unless you know that your data can be compressed a lot (Species).
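The effect of redundancy is easy to reproduce on toy data. The sketch below compares a repeated sentence with pseudo-random bytes; both buffers and the helper `compression_ratio` are made up for the illustration and are not taken from the benchmark datasets.

```python
import random
import zlib


def compression_ratio(payload, level=1):
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(payload, level)) / len(payload)


size = 1 << 16

# Redundant data: a short sentence repeated until the buffer is full.
sentence = b"the quick brown fox jumps over the lazy dog. "
redundant = (sentence * (size // len(sentence) + 1))[:size]

# High-entropy data: pseudo-random bytes from a seeded generator.
rng = random.Random(0)
high_entropy = bytes(rng.getrandbits(8) for _ in range(size))

print(f"redundant:    {compression_ratio(redundant):.3f}")
print(f"high entropy: {compression_ratio(high_entropy):.3f}")
```

The redundant buffer shrinks to a small fraction of its size, while the random one does not shrink at all: there is less to write in the first case, and nothing to gain in the second.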

Lessons learned

I’ll keep this paragraph short, because the information is really in joblib’s code and comments. Don’t hesitate to have a look: it’s BSD-licensed, so you are free to borrow what you please.

  1. Memory copies, of arrays, but also of strings and byte streams can really slow you down with big data.
  2. To avoid copies with numpy arrays, fully embrace numpy’s strided memory model. For instance, you do not need to save arrays in C order, if they are given to you in a different order. Accessing the memory in the wrong striding direction explains the poor write performance of pytables on Fortran-ordered Juelich.
  3. When dealing with the file system, the OS makes so much magic (e.g. prefetching) that clever hacks tend not to work: always benchmark.
  4. Depending on the size of the data, it may be more efficient to store subsets in different files: it introduces ‘chunks’ that avoid filling up the memory too much (parameter cache_size in joblib’s code). In addition, data of the same nature tends to compress better.
  5. The I/O stream or file object interfaces are abstractions that can hide the data movement and the creation of large temporaries. After experiments with GZipFile and StringIO/BytesIO I found it more efficient to fall back to passing around big buffer objects, numpy arrays, or strings.
  6. For reasons 4 and 5, I ended up avoiding the gzip module: raw access to zlib with buffers gives more control. This explains a good part of the differences in read speed for pure arrays with numpy’s savez_compressed.
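Points 4 to 6 can be sketched with the standard library alone. The code below is an illustration, not joblib's code: it feeds zlib directly, chunk by chunk, through a memoryview so that slicing the input does not create copies. The `chunk_size` parameter is a hypothetical name and is unrelated to joblib's `cache_size`.

```python
import zlib


def compress_chunks(buffer, chunk_size=1 << 20, level=1):
    """Compress a bytes-like object chunk by chunk; return the compressed pieces."""
    view = memoryview(buffer).cast("B")  # byte view: slicing it does not copy
    compressor = zlib.compressobj(level)
    pieces = []
    for start in range(0, len(view), chunk_size):
        pieces.append(compressor.compress(view[start:start + chunk_size]))
    pieces.append(compressor.flush())
    return pieces


def decompress_chunks(pieces):
    """Rebuild the original bytes from the pieces made by compress_chunks."""
    decompressor = zlib.decompressobj()
    out = bytearray()
    for piece in pieces:
        out += decompressor.decompress(piece)
    out += decompressor.flush()
    return bytes(out)


data = b"some fairly redundant payload " * 10000
pieces = compress_chunks(data, chunk_size=4096)
print(len(data), "->", sum(len(piece) for piece in pieces), "bytes")
```

Working at this level makes every temporary explicit, which is what the file-object abstractions tend to hide.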

One of my conclusions for joblib is that I’ll probably use Pytables as an optional backend for persistence in a future release.

Details on the benchmarks

These benchmarks were run on a Dell Latitude D630 laptop. That’s a dual-core Intel Core2 Duo box, with 2M of CPU cache.

The code for the benchmarks below can be found on a gist.

Thanks

I’d like to thank Francesc Alted for the very useful feedback he gave on these topics. In particular, the following thread on the pytables mailing-list may be of interest to the reader.

18 Nov

Scikit-learn NIPS 2011 sprint: international thanks to our sponsors

The NIPS conference: time for a sprint. The NIPS conference, one of the major conferences in machine learning, is hosted in Granada this year. I believe that it is the first time that it is hosted in Europe. As many of the scikit-learn developers are part of the wider NIPS community, and many also live in Europe, we jumped on the occasion to organize a truly international sprint: the NIPS 2011 scikit-learn sprint.

Finding money. As often with open source development, a lot of our contributors are young people, investing their free time outside of any request from their hierarchy. In such a situation, it can be hard to find travel money. So we started looking for sponsors. We needed to find a decent sum of money, as we were flying people in from places such as the West coast of the US, or even Japan. The good news is that we found money, and between supervisors pitching in, universities giving travel grants, and our generous sponsors, there will be an impressive list of contributors from all over the world at the sprint.

Thanks to our sponsors. The first people that we need to thank are Google, who gave us a sizable sponsorship, and the PSF, who made Google’s sponsorship possible through their accounting and sprints programs. We also need to thank our other sponsors, namely Tinyclues. Thanks to these sponsors, and additional investment from many universities and research groups, we have been able to gather a total of 12 contributors in Granada, a handful coming from overseas. Also, we are indebted to the University of Granada, and the Gnu/Linux Granada Group (GGG), who are providing hosting for the sprint, as well as Régine Bricquet, from INRIA, who did a lot of the trip planning for the sponsored people.

I am very much looking forward to the sprint. It will be the first time that I meet many of the contributors in real life, and judging by the warmth of the on-line exchanges, it will be a great moment. Besides, Granada is known to be a lively and historical city.

If you are around and want to join us, to work on Python in machine learning, send us a mail on the mailing list.

28 Sep

Cython example of exposing C-computed arrays in Python without data copies

Colleagues who are exposing a numerical C code in Python asked me for some advice on the best way to pass arrays from C to Python avoiding copies. They had Cython in mind, and I must agree with them that I have found the Cython code to be more maintainable than hand-written Python C-API code.

When writing my answer, I found out that there was no self-contained example of creating numpy arrays from existing data in Cython. Thus I created my own. The full code, with readme, build and demo scripts, is available on a gist. Here I only give an executive summary.

The core functionality is implemented by the PyArray_SimpleNewFromData function of the C API of numpy that can create an ndarray from a pointer to the data, a simple data type, and the shape of the data. The Cython file just builds around that function:
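The Cython listing itself lives in the gist. As a rough pure-Python analogue of the same idea (this is neither the gist's code nor a call into numpy's C API), the sketch below wraps memory that already exists, given a pointer, an element type and a shape, without copying it. A ctypes array stands in for the buffer computed by C code, and `wrap_doubles` is a hypothetical helper name.

```python
import ctypes


def wrap_doubles(address, shape):
    """Expose rows * cols C doubles at `address` as a 2-D memoryview, without copying."""
    rows, cols = shape
    c_array = (ctypes.c_double * (rows * cols)).from_address(address)
    # ctypes exports the format '<d', which cannot be cast directly to a shaped
    # native 'd' view, so go through a byte view first.
    return memoryview(c_array).cast("B").cast("d", shape=[rows, cols])


# Stand-in for memory computed and owned by C code; it must outlive the view.
owner = (ctypes.c_double * 6)(0.0, 1.0, 2.0, 3.0, 4.0, 5.0)
view = wrap_doubles(ctypes.addressof(owner), (2, 3))

owner[4] = 40.0        # write through the original C buffer...
print(view.tolist())   # ...and the change is visible through the view
```

As with the numpy C API route, the wrapper does not own the data: whoever allocated the memory remains responsible for keeping it alive and for freeing it.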

11 Sep

Python at scientific conferences

Top notch scientific conferences are starting to add Python tracks to their program. This is good news. Indeed, scientific Python conferences (namely Scipy, EuroSciPy and Scipy India) are doing great at getting together people who have already heard about Python for science, but we need to reach out to specific communities to maximize impact.

ESCO 2012 - European Seminar on Coupled Problems

ESCO 2012 is the 3rd event in a series of interdisciplinary meetings dedicated to computational science challenges in multi-physics and PDEs.

I was invited at ESCO last year. It was an absolute pleasure, because it is a small conference that is very focused on discussions. I learned a lot and could sit down with people who code top notch PDE libraries such as FEniCS and have technical discussions. Besides, it is hosted in the historical brewery where the Pilsner was invented. Plenty of great beer.

Application areas - Theoretical results as well as applications are welcome. Application areas include, but are not limited to: Computational electromagnetics, Civil engineering, Nuclear engineering, Mechanical engineering, Computational fluid dynamics, Computational geophysics, Geomechanics and rock mechanics, Computational hydrology, Subsurface modeling, Biomechanics, Computational chemistry, Climate and weather modeling, Wave propagation, Acoustics, Stochastic differential equations, and Uncertainty quantification.

Minisymposia

  • Multiphysics and Multiscale Problems in Civil Engineering
  • Modern Numerical Methods for ODE
  • Porous Media Hydrodynamics
  • Nuclear Fuel Recycling Simulations
  • Adaptive Methods for Eigenproblems
  • Discontinuous Galerkin Methods for Electromagnetics
  • Undergraduate Projects in Technical Computing

Software afternoon - An important part of each ESCO conference is a software afternoon featuring software projects by participants. Any computational software that has reached a certain level of maturity can be presented, i.e., it is used outside of the author’s institution, and it has a web page and user documentation. If you would like to present your software project, let us know soon.

Proceedings - For each ESCO we strive to reserve a special issue of an international journal with impact factor. Proceedings of ESCO 2008 appeared in Math. Comput. Simul., proceedings of ESCO 2010 in CiCP and Appl. Math. Comput. Proceedings of ESCO 2012 will appear in Computing.

Important Dates

  • December 15, 2011: Abstract submission deadline.
  • December 15, 2011: Minisymposia proposals.
  • January 15, 2012: Notification of acceptance.

PyHPC: Python for High performance computing

If you are doing supercomputing, SC11, the Super Computing conference, is the reference conference. This year there will be a workshop on high performance computing with Python: PyHPC.

At the scipy conference, I was having a discussion with some of the attendees on how people often still do process management and I/O with Fortran in the big computing environment. This is counterproductive. However, as success stories of supercomputing folks using high-level languages are not advertised, this is bound to stay that way. Come and tell us how you use Python for high performance computing!

Topics

  • Python-based scientific applications and libraries
  • High performance computing
  • Parallel Python-based programming languages
  • Scientific visualization
  • Scientific computing education
  • Python performance and language issues
  • Problem solving environments with Python
  • Performance analysis tools for Python applications

Papers - We invite you to submit a paper of up to 10 pages via the submission site. Authors are encouraged to use IEEE two column format.

Important Dates

  • Full paper submission: September 19, 2011
  • Notification of acceptance: October 7, 2011
  • Camera-ready papers: October 31, 2011

05 Sep

Conference posters

At the request of a friend, I am putting up some of the posters that I recently presented at conferences.


Large-scale functional-connectivity graphical models for individual subjects using population prior.
This is a poster for our NIPS work

Multi-subject dictionary learning to segment an atlas of brain spontaneous activity.
This is a poster for our IPMI work

Mayavi for 3D visualization of neuroimaging data: powerful scripting and reusable components in Python.

Machine learning for fMRI in Python: inverse inference with scikit-learn.

03 Sep

Hiring a junior developer on the scikit-learn

Once again, we are looking for a junior developer to work on the scikit-learn. Below is the official job posting. As a personal remark, I would like to stress that this is a unique opportunity to be paid for two years to work on learning and improving the scientific Python toolstack.


Job Description

INRIA is looking to hire a young graduate on a 2-year position to help with the community-driven development of the open source machine learning library in Python, scikit-learn. The scikit-learn is one of the major machine-learning libraries in Python. It aims to be state-of-the-art on mid-size to large datasets by harnessing the power of the scientific Python toolstack.

Speaking French is not a requirement, as it is an international team.

Requirements

  • Programming skills in Python and C/C++
  • Understanding of quality assurance in software development: test-driven programming, version control, technical documentation.
  • Some knowledge of Linux/Unix
  • Software design skills
  • Knowledge of open-source development and community-driven environments
  • Good technical English level
  • An experience in statistical learning or a mathematical-oriented mindset is a plus
  • We can only hire a young graduate who has received a master’s or equivalent degree at most a year ago.

About INRIA

INRIA is the French computer science research institute. It is recognized worldwide as one of the leading research institutions and has strong expertise in machine learning. You will be working in the Parietal team, which makes heavy use of Python for brain imaging analysis.

Parietal is a small research team (around 10 people) with an excellent technical knowledge of scientific and numerical computing in Python as well as a fine understanding of algorithmic issues in machine learning and statistics. Parietal is committed to investing in scikit-learn.

Working at Parietal is a unique opportunity to improve your skills in machine learning and numerical computing in Python. In addition, working full time on the scikit-learn, a very active open-source project, will give you premium experience of open source community management and collaborative project development.

Contact Info:

  • Technical Contact: Bertrand Thirion
  • E-mail contact: bertrand dotnospam thirion atnospam inria dotnospam fr
  • HR Contact: Marie Domingues
  • E-mail Contact: marie dotnospam domingues atnospam inria dotnospam fr
  • No telecommuting

23 Jul

My conference travels: Scipy 2011 and HBM 2011

The Scipy 2011 conference in Austin

Last week, I was at the Scipy conference in Austin. It was really great to see old friends, and Austin is such a nice place.

The Scipy conference was held in UT Austin’s conference center, which is a fantastic venue. This is the first geek conference I have been to where the wireless network worked flawlessly with good bandwidth, even though 200 geeks were pounding on it. As a tutorial presenter, this was incredibly useful.

Conference highlights

Here is a short list of what I felt were the big trends and highlights of the conference. This is obviously biased by my own interests. I am not listing parallel computing: it is clearly an important area of progress and debate, but this has been the case for the last few years.

Eric Jones’s keynote

Of course Eric’s keynote was excellent. Eric is a great speaker and always has good insights on how to run a team and a project. This year he shared some of his tricks for making Enthought deliver on software projects: “What Matters in Scientific Software Projects? 10 Years of Success and Failure Distilled”. The video is not yet online, unfortunately. Grab it when you can.

Hilary Mason’s keynote

Hilary is an applied data geek, just what I like! She gave an interesting keynote on how bitly (a URL-shortening startup, for those living under a rock) mines the requests on the URLs that they serve to do things like ranking or detecting phishing attempts. Of course, I couldn’t resist asking what tools they used, thinking that she would reply R. She said that they did roll some of their own, but she also mentioned mlpy and scikit-learn, adding that the latter was very nice, at which point I believe that I blushed. She stressed that R was hard to use in production and raised the point that most often academic software doesn’t pan out in these settings (I hope that I am not distorting her thoughts too much).

Statistics and learning

I had the feeling that statistics and data mining played a big role at scipy this year. Maybe it is because I am more tuned to these questions nowadays, but some signs do not lie. There was a special session on Python in data sciences, a panel discussion on Python in finance and many many statistics and data related talks, as well as two tutorials and a keynote.

In addition, on a personal basis it was really great to meet part of the team behind scikits.statmodels. We had plenty of very interesting discussions and they really helped me understand the way that some statisticians approach data: very differently from me, because they have fairly little data, and can afford to inspect reports and graphs, whereas I rely more on automated decision rules.

IPython

Min gave an excellent tutorial on how to do parallel computing using IPython. These guys have certainly done an excellent job of making cluster-level programming in Python easier. While they don’t yet play terribly well with the restrictive job-queue policy of the clusters to which I have access, they have all the right low-level tools to address these issues and Min told me that they will be working on this next year.

Fernando gave an impressive talk on the new developments of IPython. In particular, the new Qt-based terminal is really cool and there is a web frontend in the works.

Cluster computing as a facility

While I mention cluster computing, I must confess that I have always stayed away from this beast: I find it a time sink, and I find that I get more science done without it. This is why I really liked the presentation of the PiCloud guys on… cluster computing! The reason I liked it is that they start from the principle that your time is more important than CPU time. I hear so much about bigger, better, faster, more high-performance computing, when researchers forget to address the biggest issue:

… a whole generation of researchers turned into system administrators by the demands of computing - Dan Reed, VP Microsoft

Abstract code manipulation for numerical computation

Finally, a trend that is picking up in Python-based scientific computing is the abstract manipulation of expressions to generate fast code. This ranges from JIT (just in time) compilation generating machine code, to rewriting mathematical expressions. Peter Wang had a talk along these lines, but the topic was also brought up by Aron Ahmadia. Of course this is not new: numexpr has been using these tricks for years, and more recently Theano has been making good use of GPUs thanks to them.

This topic is emerging in more and more places for good reasons: with faster and more numerous CPUs, the number of operations per second is less of a bottleneck, and the order in which they are applied, or the physical location of the data, is becoming critical.

My own agenda

Sprinting on scikit-learn

We had two days of sprints after the conference. A huge number of people voted for a sprint on the scikit-learn but only two people showed up: Minwoo Lee and David Warde-Farley. Thanks heaps to these guys! My priority for the sprint was to review and merge branches. That worked beautifully: we merged in the following features:

In addition, David added a dataset downloader for the Olivetti faces dataset, which is lightweight but rich enough to give very interesting examples.

My presentation

I gave a talk on my research work, and the software stack that underpins it: Python for brain mining: (neuro)science with state of the art machine learning and data visualization. I think that it was well received by the audience. What is really crazy is that I uploaded the slides on slideshare, and they got a ridiculous number of views. I suspect that it is because of the title: brain mining does sound fancy.

Mayavi

Because of technical and political reasons, I cannot get Mayavi installed on the computers at work. This, and the fact that many people ask for help but few contribute, even in the form of answers on the mailing list, had been wearing me down a bit. I got so much great feedback on Mayavi at the conference that I feel much more motivated to invest energy in it.

The Human Brain Mapping conference in Quebec City

This blog post is getting too long. It is well beyond my own attention span. However, scipy is not the only conference that I have been to recently. Two weeks before, I was in Quebec for the Human Brain Mapping conference. As each year, HBM is a fun ride. It has fantastic parties in the evenings. But I didn’t stay up too late, as this year was a busy one for me: I was teaching in an educational course, and chairing a symposium, both on comparing brain functional connectivity across subjects.

But the really big deal at HBM this year came at the end. As I was dozing off, vaguely listening to Russ Poldrack’s closing comments, he brought up on screen a slide entitled the year of Python. This is a big deal: we’ve been working for years to get Python into the neuroimaging world, and it is clearly making progress, despite all the roadblocks.

22 Jul

Euroscipy 2011: early bird deadline soon

Euroscipy 2011: register now for early bird prices

The deadline for early-bird registration at the Euroscipy conference is this Sunday. Beyond this deadline prices will double. Register now to get a great deal.

To register, simply go to www.euroscipy.org, log in using the link on the top right, and follow the ‘Register now for the conference’ link on the top left.

The conference is a great opportunity to learn the intricacies of numerical and scientific computing in Python. You can register for the tutorials in an intro track, which will take you from beginner to fully autonomous user, or for an advanced track, to learn from the experts about topics such as image processing, GPU computing, machine learning or optimization. The tutorials are a fairly unique occasion to improve your skills, as you will seldom get such a concentration of experts.

Some program highlights

After the 2 days of tutorials, the conference itself will host 2 keynotes: one by Marian Petre, of the Open University, well-known for her empirical studies of software development, and another one by Fernando Perez, a pioneer in scientific computing in Python and the original author of IPython.

Glancing at the program, we can see that a wide range of topics is touched upon:

The variety of the topics illustrates what is for me one of the greatest benefits of the scipy conferences: they form a forum to exchange ideas and techniques to find new solutions to scientific, numerical and data analysis problems. Unlike pure computer science conferences, they sit at the frontier of applications and bleeding edge computer developments, because these people really use the tools presented to solve their problems.

In addition to this rich program, we will have 2 days of sprints before the conference as well as 2-day-long satellite conferences on Python in Physics and NeuroScience after the conference. This is how what used to be a small conference can now be a full 8-day event if you order all the extras.

14 May

Hiring a junior engineer on the scikits.learn

The scikits.learn is a Python module for machine learning. The project builds on the scientific and numerical tools of the scipy community to provide state-of-the-art data analysis tools. It is developed by a community of open source developers to which my research team (Parietal, INRIA) contributes a lot, and it is a thriving project. Its mailing list fosters many discussions on code and machine learning topics, it has very detailed documentation, and a tight release cycle.

Although scikits.learn is mostly developed by volunteers, INRIA has funded a two-year position for a junior engineer (currently Fabian Pedregosa) to help with the core management and integration of the project. This funding is coming to an end in fall 2011 [*]. The good news is that we have been allocated new funding to hire an engineer on the scikit.

We are thus looking to hire a junior engineer for a 2-year position to work on the scikits.learn at INRIA in Saclay, near Paris. The position is only available to candidates who have received a master’s or equivalent degree at most a year ago. This is non-negotiable: we cannot hire more senior candidates.

We are looking for a developer with good open-source project management skills: the successful candidate will review and merge patches, ensure the quality of the scikit, make releases, coordinate development on the mailing list and on github. Good knowledge of Python and its scientific ecosystem is expected. A mathematical or computer-science oriented mindset is a plus, as the project involves working with machine learning algorithms.

The candidate should be willing to relocate to work daily in the Neurospin brain research institute in which Parietal is located. Knowledge of French is not required, as the team and the institute are very international. Non-EU candidates are welcome, but the hiring process will take longer.

You will be working in a very stimulating environment. You will be employed by INRIA, the French computer science research institute. As such, you will benefit from the expertise of the institute’s researchers and engineers. Team members contribute to various scientific Python libraries (in addition to scikits.learn, Mayavi, nipy, joblib). In addition, you will be working in a brain research institute, in collaboration with leading methods researchers and neuroscientists that use machine learning to gain new insights on brain processes.

To apply: you need to prepare a CV and a motivation letter. The deadline for applications is mid June, but we will be selecting candidates and conducting interviews before then. Don’t send me CVs. The formal job description, as well as instructions to apply, can be found on this page. The page is mostly in French, sorry; use Google translate if you don’t understand. At the bottom of the page you will find a link to apply.


[*] Fabian will most probably stay with us to do a PhD on analysis of large brain functional imaging datasets.