Workshop Programming of Heterogeneous Systems in Physics

Jena, 5-7 October 2011

We present a system for speeding up Matlab/DipImage image processing using CUDA-based GPU programming. The toolbox presented here is freely available on request and works with off-the-shelf gaming graphics cards. Our CudaMat toolbox is designed so that previously written Matlab code needs almost no modification. A subset of standard Matlab and DipImage functionality is shadowed by custom-written CUDA functions that operate on arrays stored on the graphics card. This is realized by introducing a new object data type ("cuda") that can be mixed with standard Matlab types, with automatic casts performed where necessary. Thus the only extra code needed to convert a standard Matlab program into a CudaMat program is a single line as simple as "myimage=cuda(myimage)".

Compared with a standard desktop computer, the typical speedup achievable this way is about a factor of 20. However, some problems are severely limited in speed by Matlab's peculiarities in data handling, which sometimes cause many unnecessary array copy operations. For this reason, CudaMat can compile code snippets on the fly; such snippets can, for example, contain per-pixel loops that run individually on each core of the GPU multiprocessor. For some problems the resulting overall speedup is dramatic: the calculation of a Mandelbrot set, for example, speeds up by a factor of more than one hundred thousand with this scheme, because data transfer can be almost entirely avoided.

The speedup measurements quoted above were obtained on an Intel Core i7 CPU 750 @ 2.67 GHz (64 bit, gcc 4.5.0, openSUSE 11.3) with a GeForce GT 200 graphics card.
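The abstract does not show CudaMat's snippet syntax, so the following is only a CPU-side Python sketch, under that assumption, of the structure a per-pixel Mandelbrot snippet would have. The function names mandelbrot_pixel and mandelbrot_image are hypothetical and not part of CudaMat. The point it illustrates is that each pixel's escape-time loop works only on local scalars and can stop early, so no intermediate whole-image arrays are created or copied; only the final iteration counts form the output.

```python
def mandelbrot_pixel(cx: float, cy: float, max_iter: int) -> int:
    """Escape-time iteration for one pixel (hypothetical per-pixel kernel body).

    All state lives in local scalars, so no intermediate arrays are created,
    and the loop exits as soon as the point has escaped.
    """
    zx, zy = 0.0, 0.0
    for n in range(max_iter):
        if zx * zx + zy * zy > 4.0:  # |z| > 2: the point escapes
            return n
        # z <- z^2 + c, written out in real and imaginary parts
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
    return max_iter  # treated as inside the set


def mandelbrot_image(width: int, height: int, max_iter: int = 100,
                     x_range=(-2.0, 1.0), y_range=(-1.5, 1.5)) -> list:
    """Apply the per-pixel function to every pixel of a width x height grid.

    This serial double loop stands in for a GPU launch, where each
    (i, j) pair would be handled by its own thread.
    """
    if width < 2 or height < 2:
        raise ValueError("width and height must both be at least 2")
    x0, x1 = x_range
    y0, y1 = y_range
    image = []
    for j in range(height):
        cy = y0 + (y1 - y0) * j / (height - 1)
        row = []
        for i in range(width):
            cx = x0 + (x1 - x0) * i / (width - 1)
            row.append(mandelbrot_pixel(cx, cy, max_iter))
        image.append(row)
    return image
```

For example, mandelbrot_image(5, 5, max_iter=20) returns a 5 x 5 list of iteration counts in which the centre pixel (c = -0.5) reaches the maximum of 20. In a vectorized Matlab version, by contrast, every iteration of z = z.^2 + c produces a new full-size array, which is the kind of copying overhead the per-pixel approach avoids.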