Cython example of exposing C-computed arrays in Python without data copies
Colleagues who are exposing a numerical C code in Python asked me for advice on the best way to pass arrays from C to Python while avoiding copies. They had Cython in mind, and I agree with them: I have found Cython code to be more maintainable than hand-written Python C-API code.
When writing my answer, I found out that there was no self-contained example of creating numpy arrays from existing data in Cython, so I created my own. The full code, with a README, build and demo scripts, is available in a gist. Here I only give an executive summary.
The core functionality is provided by the PyArray_SimpleNewFromData function of the numpy C API, which creates an ndarray from a pointer to existing data, a simple data type, and the shape of the data, without copying the data. The Cython file is just built around that function.
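PyArray_SimpleNewFromData is only reachable from C or Cython, but the no-copy behaviour it provides can be sketched from pure Python. The snippet below is an analogue, not the gist's code: it assumes a ctypes array as a stand-in for memory allocated by the C library, and uses numpy.ctypeslib.as_array to wrap that memory as an ndarray without copying it.

```python
import ctypes

import numpy as np

# Assumption: this ctypes array stands in for a buffer malloc'ed by the C code.
n = 5
c_buffer = (ctypes.c_double * n)(*range(n))

# Wrap the existing memory as an ndarray; no data is copied.
arr = np.ctypeslib.as_array(c_buffer)

# Both objects see the same memory.
arr[0] = 42.0       # write through NumPy...
c_buffer[1] = -1.0  # ...and through the "C" side
print(c_buffer[0])  # 42.0
print(arr[1])       # -1.0

# NumPy does not own this memory, so it will never free it.
print(arr.flags["OWNDATA"])  # False
```

As with PyArray_SimpleNewFromData, the resulting array does not own its data: whoever allocated the buffer remains responsible for freeing it.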
Posted on Wednesday, September 28th, 2011 at 11:42 pm

FWIW, PySPH (pysph.googlecode.com) has something very similar to expose our raw C-arrays implemented in Cython as numpy arrays.
http://code.google.com/p/pysph/source/browse/source/pysph/base/carray.pyx
cheers,
Prabhu
September 29th, 2011 at 6:03 am
Wow, that’s pretty crazy: you have implemented a good fraction of the array mechanism in Cython!
September 29th, 2011 at 6:33 am
The memory is NOT handed off to the Python interpreter: it’s the user’s responsibility to make sure that the C buffer lives as long as Numpy uses it, and it’s also the user’s responsibility to free it.
As stated in the PyArray_SimpleNewFromData documentation, to hand off memory ownership to Numpy, you can set the Numpy array’s OWNDATA flag (in your code: ndarray.flags |= NPY_OWNDATA). However, this assumes that the memory allocator (e.g. malloc) used to allocate the data is the same as the one used internally by Numpy. If you don’t want to make this assumption, you can use Travis Oliphant’s technique.
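From the Python side, OWNDATA can at least be inspected. A minimal check, using only arrays allocated by NumPy itself, shows which arrays own (and will free) their buffer and which merely borrow it:

```python
import numpy as np

# Allocated by NumPy: the array owns its buffer and frees it when collected.
owner = np.zeros(4)
print(owner.flags["OWNDATA"])  # True

# A view borrows the same buffer; it does not own it and keeps the owner
# alive through its `base` attribute instead.
view = owner[1:3]
print(view.flags["OWNDATA"])   # False
print(view.base is owner)      # True
```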
I think the most robust way is to always allocate memory with Numpy and hand it to the C function. This can involve changing bits in the existing C code though.
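A minimal sketch of that approach, seen from Python: NumPy allocates the output, and a pointer to it is handed to C code that fills it in. As an assumption made to keep the example runnable, ctypes.memset (a wrapper around the C library’s memset) stands in for the numerical C routine, which in real code would be something like a hypothetical `void compute(double *out, size_t n)` exposed through Cython or ctypes.

```python
import ctypes

import numpy as np

# NumPy allocates (and will later free) the output buffer.
out = np.ones(8, dtype=np.float64)

# C code expects a contiguous block of doubles; ensure that before handing out a pointer.
if not out.flags["C_CONTIGUOUS"]:
    out = np.ascontiguousarray(out)

# Hand the raw address to C code that writes into the buffer. ctypes.memset
# stands in for the real numerical routine in this sketch.
ctypes.memset(out.ctypes.data, 0, out.nbytes)

print(out.sum())             # 0.0: the C call wrote into NumPy's memory
print(out.flags["OWNDATA"])  # True: ownership never left NumPy
```

Because the allocation and the deallocation both happen in NumPy, no question about matching allocators arises.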
September 29th, 2011 at 8:31 am
Yes, you are right. I forgot about the OWNDATA flag, and I created a memory leak. I’ll fix that in the gist ASAP.
I agree with allocating the memory with numpy in Cython, but I am talking about people who are newcomers to Python. Quite often they have a large existing codebase and, unsurprisingly, they ‘cling to their guns and religion’ (http://www.youtube.com/watch?v=cXTetFINqq0&NR=1)
September 29th, 2011 at 9:06 am
Actually, this was made a bit challenging by the fact that the flags are not settable from Python, and in Cython the C structure is hidden behind the Python API. I had to resort to the PyArray_UpdateFlags function.
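The read-only nature of OWNDATA from Python is easy to check. A small sketch, assuming a reasonably recent NumPy: assigning to `flags.owndata` raises AttributeError, whereas WRITEABLE, one of the few flags exposed for writing, can be changed.

```python
import numpy as np

arr = np.arange(3.0)
view = arr[:]

# OWNDATA has no setter on the flags object: assigning to it from Python fails.
owndata_settable = True
try:
    view.flags.owndata = True
except AttributeError:
    owndata_settable = False
print(owndata_settable)      # False
print(view.flags.owndata)    # False: unchanged

# WRITEABLE, in contrast, is one of the few flags that can be set from Python.
view.flags.writeable = False
print(view.flags.writeable)  # False
```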
September 29th, 2011 at 9:37 am
I can’t get the PyArray_UpdateFlags line to work. It looks like the OWNDATA flag just never changes. Are you sure that this worked for you?
December 7th, 2011 at 10:04 pm
Hi Kiyo,
Indeed, this wasn’t working for me, and I hadn’t realized it (I actually don’t use that code, as I am always very careful to allocate my memory in Python and pass it to the C code).
I had a look at the source code of PyArray_UpdateFlags:
https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/flagsobject.c#L64
It’s quite obvious that it simply ignores the OWNDATA flag.
This means that we are going to have to write our own deallocation code, and wrap it in a __dealloc__ method. I’ll update the gist, but it might be a few days before I get there. In the meantime, have a look at how Prabhu does it in PySPH (mentioned in the comments above).
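The same idea, tying the release of the memory to the lifetime of an object that the array references, can be sketched in pure Python. This is an analogue under stated assumptions rather than the gist’s Cython fix: it relies on NumPy’s __array_interface__ protocol so that the ndarray’s base is an owner object, and the owner’s __del__ plays the part that __dealloc__ plays in a Cython extension type. CBufferOwner and free_callback are hypothetical names.

```python
import ctypes

import numpy as np


class CBufferOwner:
    """Owns an externally allocated buffer and releases it when collected.

    Hypothetical helper: the ctypes array stands in for malloc'ed C memory,
    and `free_callback` stands in for the C library's free().
    """

    def __init__(self, n, free_callback):
        self._n = n
        self._buf = (ctypes.c_double * n)()
        self._free_callback = free_callback

    @property
    def __array_interface__(self):
        # Describe the raw memory so NumPy can wrap it without copying.
        return {
            "shape": (self._n,),
            "typestr": np.dtype(np.float64).str,
            "data": (ctypes.addressof(self._buf), False),  # (address, read-only)
            "version": 3,
        }

    def __del__(self):
        # Plays the role of a Cython __dealloc__: release the memory.
        self._free_callback()


freed = []
owner = CBufferOwner(4, lambda: freed.append(True))
arr = np.asarray(owner)  # no copy; the array keeps a reference to the owner

del owner
print(freed)  # []: the array still references its owner
del arr
print(freed)  # [True]: the last array is gone, so the owner freed the memory
```

In a Cython version, one common pattern is a cdef class that holds the raw pointer and calls free() in its __dealloc__, set as the array’s base with PyArray_SetBaseObject.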
December 8th, 2011 at 6:13 am
I fixed the problem. It is a bit late, and I hacked this quickly, so the code isn’t the prettiest ever, but it should work.
December 8th, 2011 at 8:03 pm