28 Sep

Cython example of exposing C-computed arrays in Python without data copies

Colleagues who are exposing a numerical C code in Python asked me for advice on the best way to pass arrays from C to Python without copying the data. They had Cython in mind, and I must agree with them: I have found Cython code to be more maintainable than hand-written Python C-API code.

When writing my answer, I found out that there was no self-contained example of creating numpy arrays from existing data in Cython, so I created my own. The full code, with a readme, a build script, and a demo script, is available on a gist. Here I only give an executive summary.

The core functionality is provided by PyArray_SimpleNewFromData, a function of the numpy C API that creates an ndarray from a pointer to the data, a simple data type, the number of dimensions, and the shape of the data. The Cython file just builds around that function.
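
The zero-copy behaviour can be tried out from pure Python, without compiling anything. The following is only a minimal sketch under stated assumptions, not the Cython code from the gist: a ctypes array stands in for memory filled by C code, and np.ctypeslib.as_array plays the role that PyArray_SimpleNewFromData plays at the C level. The names c_buffer, c_pointer, and view are made up for the illustration.

```python
import ctypes

import numpy as np

n = 5

# Stand-in for a buffer filled by C code (assumption: a real C library
# would return a pointer to memory it allocated itself).
c_buffer = (ctypes.c_double * n)(*range(n))
c_pointer = ctypes.cast(c_buffer, ctypes.POINTER(ctypes.c_double))

# Wrap the existing memory in an ndarray. A bare pointer carries no size
# information, so the shape has to be given explicitly.
view = np.ctypeslib.as_array(c_pointer, shape=(n,))

# Writing through the ndarray changes the C buffer: the memory is shared.
view[0] = 42.0
shares_memory = c_buffer[0] == 42.0

# NumPy does not own this memory, so it will never free it. Whoever
# allocated the buffer must keep it alive while the array is in use.
owns_data = bool(view.flags.owndata)
```

Here shares_memory should come out true and owns_data false. The second result is the important one in practice: an array built on top of foreign memory leaves the question of who frees that memory entirely open.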

8 Responses to “Cython example of exposing C-computed arrays in Python without data copies”

  1. Prabhu Says:

    FWIW, PySPH (pysph.googlecode.com) has something very similar to expose our raw C-arrays implemented in Cython as numpy arrays.

    http://code.google.com/p/pysph/source/browse/source/pysph/base/carray.pyx

    cheers,
    Prabhu

  2. gael Says:

    Wow, that’s pretty crazy: you have implemented a good fraction of the array mechanism in Cython!

  3. Mathieu Says:

    The memory is NOT handed off to the Python interpreter: it’s the user’s responsibility to make sure that the array lives long enough for Numpy to use it and it’s also the user’s responsibility to free it.

    As stated in the PyArray_SimpleNewFromData doc, to hand off memory ownership to Numpy, you can set the Numpy array’s OWNDATA flag (in your code: ndarray.flags |= NPY_OWNDATA). However, this assumes that the memory allocator (e.g. malloc) used to allocate the data is the same as the one used internally by Numpy. If you don’t want to make this assumption, you can use Travis Oliphant’s technique.

    I think the most robust way is to always allocate memory with Numpy and hand it to the C function. This can involve changing bits in the existing C code though.

  4. gael Says:

    Yes, you are right. I forgot about the OWNDATA flag, and I created a memory leak. I’ll fix that in the gist ASAP.

    I agree with allocating the memory with numpy in Cython, but I am talking about people who are newcomers to Python. Quite often they have a large existing codebase and unsurprisingly, they ‘cling to their guns and religion’ (http://www.youtube.com/watch?v=cXTetFINqq0&NR=1)

  5. gael Says:

    Actually, it was made a bit challenging by the fact that the flags are not settable in Python, and in Cython, the C structure is masked by the Python API. I had to resort to the PyArray_UpdateFlags function.

  6. Kiyo Says:

    I can’t get the PyArray_UpdateFlags line to work. Looks like the OWNDATA flag just never changes. Are you sure that this worked for you?

  7. gael Says:

    Hi Kiyo,

    Indeed, this wasn’t working for me, and I hadn’t realized (I actually don’t use that code, as I am always very careful to allocate my memory in Python, and pass it to the C code).

    I had a look at the source code of PyArray_UpdateFlags:
    https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/flagsobject.c#L64
    It’s quite obvious that it simply ignores the OWNDATA flag.

    This means that we are going to have to write our own deallocation code, and wrap it in a __dealloc__ method. I’ll update the gist, but it might be a few days before I get there. In the meantime, have a look at how Prabhu does it in PySPH (mentioned in the comments above).

  8. gael Says:

    I fixed the problem. It is a bit late, and I hacked this quickly, so the code isn’t the prettiest ever, but it should work.
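
The alternative recommended in the comments above, allocating the memory with Numpy and handing it to the C code, can be sketched in pure Python as follows. This is an illustration under stated assumptions, not code from the gist: fill_from_c is a hypothetical helper, and ctypes.memmove stands in for a real C routine that writes its results into a caller-supplied buffer.

```python
import ctypes

import numpy as np


def fill_from_c(out):
    """Let C-level code write into a buffer that NumPy allocated.

    `out` is assumed to be a C-contiguous float64 array.
    """
    # Hypothetical stand-in for the values a C computation would produce.
    source = (ctypes.c_double * out.size)(*[0.5 * i for i in range(out.size)])
    # The C side only receives the raw address of the NumPy buffer and
    # writes into it; no second array is created.
    ctypes.memmove(out.ctypes.data, source, out.nbytes)
    return out


result = fill_from_c(np.empty(4, dtype=np.float64))

# The array was allocated by NumPy, so NumPy owns the memory and frees it
# when the array is garbage collected. No custom deallocation is needed.
owns_data = bool(result.flags.owndata)
```

With this layout the ownership question disappears: the C code never allocates anything, so there is nothing to free by hand and no dependence on which allocator was used.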
