This post discusses the methods that can be used to integrate the array data structures implemented in C++ into Python code. At the end we want to be able to initialize C++ array objects from Python number sequences as well as to convert these objects into ndarray -- the array type of NumPy library.
The code bellow shows a simple example of an array data structure in C++.
Header file:
/* * @file: MyArray.hpp */ #ifndef _MY_ARRAY_ #define _MY_ARRAY_ #includeclass MyArray { public: /* * Constructor allocated memory of given size */ MyArray(int size); /* * Constructor creates MyArray object using given data */ MyArray(double* input, int size); //getters double* getData(); int getSize(); double getItem(int idx); private: double* data; //pointer to the memory segment int size; //array size }; #endif
And the source file:
/*
* @file: MyArray.cpp
*/
#include "MyArray.hpp"
MyArray::MyArray(int size){
this->data = new double[size];
this->size = size;
}
MyArray::MyArray(double* input, int size){
this->size = size;
// create new vector
this->data = new double[size];
// copy data
memcpy(this->data, input, size * sizeof(double));
}
double* MyArray::getData(){
return this->data;
}
int MyArray::getSize(){
return this->size;
}
double MyArray::getItem(int idx){
return this->data[idx];
}
Basically, we want to achieve the following in Python:
from myarray import MyArray
x = MyArray([1,2,3]) # we use Python array to initialize a C++ class
y = x.array() # create new view on the same data segment
type(y)
>> <type 'numpy.ndarray'>
del x # the data still exist since y is pointing on it
y[0]
>> 1.0
del y # memory is deallocated
SWIG is a useful framework for wrapping C/C++ code into libraries, which can be used in script languages like Python or Ruby. The general steps for compilation are described in a number of tutorials. If you are completely new to SWIG you probably should read these tutorials first.
In order to use SWIG with NumPy, we need the file numpy.i that contains the description of NumPy interfaces, i.e. numerical arrays, necessary for the work with Python. If you are using Ubuntu, you can get the file from the package python-instant in Ubuntu 11.10 or from python-numpy in some older versions. Alternatively, you can download the interface file from the NumPy github repository and put it into the directory with other interface files.
Now let us consider two parts of the problem separately.
Part 1. Initialization of a C++ class using a Python number sequence
The solution I have found in a stackoverflow post. We create an interface file MyArray.i for wrapping the C++ class.
/*
* @file MyArray.i
*/
%module myarray
// typemap allowing to pass sequence of numbers to constructor
%typemap(in) (double* data, int size)
{
if (!PySequence_Check($input)) {
PyErr_SetString(PyExc_ValueError, "Expected a sequence");
return NULL;
}
$2 = PySequence_Size($input);
$1 = (double *) malloc(sizeof(double)*$2);
for (int i = 0; i < $2; i++) {
PyObject *o = PySequence_GetItem($input,i);
if (PyNumber_Check(o)) {
$1[i] = (double) PyFloat_AsDouble(o);
} else {
PyErr_SetString(PyExc_ValueError,"Sequence elements must be numbers");
return NULL;
}
}
}
%typemap(freearg) (double *input, int size)
{
if ($1) free($1);
}
%typecheck(SWIG_TYPECHECK_FLOAT) (double* data, int size)
{
$1 = PySequence_Check($input) ? 1 : 0;
}
In the code above all work is done manually. We check if the input parameters have a proper type,
initialize an array of doubles with size elements, and then one by one copy
the elements from Python sequence to our new array. Although I will show a simpler way
below, this solution is useful if we need some additional magic during the conversion.
Alternatively, a cleaner solution uses
input arrays available in SWIG. In this case MyArray.i file will contain following lines:
/*
* @file MyArray.i
*/
%module(directors="1") myarray
%include "typemaps.i"
%{
#define SWIG_FILE_WITH_INIT
#include "MyArray.hpp"
%}
%include "numpy.i"
%init %{
import_array();
%}
%apply (double* IN_ARRAY1, int DIM1) {(double* input, int size)}
//...
We need to include typemaps.i and numpy.i and to call function the function import_array that handles all necessary initialization routinges.
Finally, with the %apply statement with tell the preprocessor the required type mapping.
Important: due to the operation of %apply the signatures and
variable names should exactly match those used in the function signatures! Save yourself hours of bug-fixing and
just copy/paste it.
Part 2. Create a persistent ndarray view on data in C++
Now we want to create a representation of our C++ class as an ndarray. Moreover, as in scientific computing or machine
learning we usually deal with very large data, we would like to avoid allocation of additional memory and data copying.
An obvious solution would be to use argoutview arrays from numpy.i.
In this case, we would need to extend the MyArray.i file as follows:
//...
%apply ( DATA_TYPE** ARGOUTVIEW_ARRAY1, DIM_TYPE* DIM1 ) {(double** vec, int n)};
class MyArray {
public:
/*
* Constructor allocated memory of given size
*/
MyArray(int size);
/*
* Constructor creates MyArray object using given data
*/
MyArray(double* data, int size);
%extend {
// Create a ndarray view from the MyArray data
// using ARGOUTVIEW
void __array(double** vec, int n){
//Get the data and number of entries
vec = &$self->getData();
int n = $self->getSize();
}
%pythoncode
{
def array(self):
return self.__array(self)
}
}
}
This is a simple and clean solution. Unfortunately, it has a big drawback, since without explicit memory management, we cannot guarantee the data
integrity for variables containing
import myarray x = myarray.MyArray([1,2,3]) y = x.array() y[0] >> 1.0 x = 'foobar' y[0] >> [1.93759274e-10, 2,3]
Hence, the solution can be used only in cases, when the original MyArray variable (x in our example) lives as long as all other variables pointing to the underlying y in our example).
To overcome the memory management issue, we need to create a NumPy array manually and use the Python pointer counting mechanism to guarantee that the memory will not be freed before the last variable pointing to the array was deleted or reassigned. The solution was inspired by this blog post.
//...
%inline %{
/*
* Memory managment function used in sg::base::DataVector::__array()
* it simply decrements the number of references to the PyObject datavector
* every time a referencing ndarray is deleted.
* After reference counter reaches 0, the memory of DataVector will be deallocated
* automatically.
*/
void free_array(void* ptr, void* arr){
double* vec = (double *) ptr;
PyObject* array = (PyObject*)arr;
Py_DECREF(array);
}
%}
class MyArray {
public:
/*
* Constructor allocated memory of given size
*/
MyArray(int size);
/*
* Constructor creates MyArray object using given data
*/
MyArray(double* data, int size);
%extend {
// Create a ndarray view from the MyArray data
// an alternative approach using ARGOUTVIEW will fail since it does not allow to do a proper memory management
PyObject* __array(PyObject* myarray){
//Get the data and number of entries
double *vec = $self->getPointer();
int n = $self->getSize();
npy_intp dims[1] = {n};
// Create a ndarray with data from vec
PyObject* arr = PyArray_SimpleNewFromData(1,dims, NPY_DOUBLE, vec);
// Let the array base point on the original data, free_array is a additional destructor for our ndarray,
// since we need to DECREF MyArray object
PyObject* base = C((void*)vec, (void*)myarray, free_array);
PyArray_BASE(arr) = base;
// Increase the number of references to PyObject MyArray, after the object the variable is reinitialized or deleted the object
// will still be on the heap, if the reference counter is positive.
Py_INCREF(myarray);
return arr;
}
%pythoncode
{
def array(self):
return self.__array(self)
}
}
}
The function PyArray_SimpleNewFromData in the line 41 creates a 1D Python array of size dims from the memory segment referenced by vec. In the line 46 we set the base object of the MyArray wrapper class as the base object of the ndarray object arr. Now, if the ndarray object will be deleted or reassigned, Python calls free_array method on its base object. In the function free_array we then would decrement the pointer counter and if no pointer remains, the memory will be deallocated automatically.
Finally, we need to increase the number of pointer to the memory in the line 50 (in fact, we increase the number of pointer to the MyArray object and not to the double* data pointer).
Step 3. Compilation and Installation
To make the compilation simple, I am using distutils and setup.py file adapted from this tutorial.
#! /usr/bin/env python
# System imports
from distutils.core import *
from distutils import sysconfig
# Third-party modules - we depend on numpy for everything
import numpy
# Obtain the numpy include directory. This logic works across numpy versions.
try:
numpy_include = numpy.get_include()
except AttributeError:
numpy_include = numpy.get_numpy_include()
# inplace extension module
_myarray = Extension("_myarray",
["MyArray.i","MyArray.cpp"],
include_dirs = [numpy_include],
swig_opts=['-c++']
)
# NumyTypemapTests setup
setup( name = "C++/Numpy Conversion",
description = "Integration of array data structures implemented in C++ into Python code",
author = "Valeriy Khakhutskyy",
version = "1.0",
ext_modules = [_myarray]
)
For detailed explanation of the file, please refer to the tutorial.
To build the code use
python setup.py build
Now copy the library that was compiled into build/lib.XXX directory into the folder with myarray.py file. If everything went right, running example code will give you correct output.
You can find the example files described in this post in github.