GPU-based computation
Motivation |
|---|
Today, computers play a significant role in optics. They are used to design and simulate optical systems and phenomena, and in many products the optical signals are processed, after detection, by digital electronics. Thanks to the tremendous advances in computing power, applications that once required supercomputers can now be run on conventional personal computers.
In optics, many applications can never have enough computing power. One prominent example is image and video processing, where sophisticated algorithms are applied to an ever-increasing number of pixels and images. Optical applications are also often good candidates for parallelization, so multicore systems suit them very well. Compared with current CPU designs, which have rather few cores, graphics processing units (GPUs) offer far more parallelism. A modern consumer gaming GPU has a large number of processing cores (e.g., 240 for an NVIDIA GTX 280) and a peak performance in the range of 1 TFLOPS (10^12 floating-point operations per second). Of course, it is tempting to use that processing power for something other than gaming, in our case optics.
In the future, the gap between CPU and GPU might narrow as multi-core CPUs with many processing cores are developed, but at the moment one can typically achieve a speedup of a factor of 10 to 100 by using the GPU for massively parallel problems.
Hologram computation |
|---|
Many different algorithms are used to compute computer-generated holograms (CGH). For real-time applications, the computation (or optimization) of a hologram is a demanding task because of the high resolutions that holographic applications typically need.
One example is the real-time control of holographic optical tweezers. To compute the holograms, different hologram optimization methods were implemented on standard consumer graphics boards using a combination of Cg (“C for graphics”), OpenGL and C++. The most challenging part of writing such applications is mapping the problem at hand efficiently to the hardware of the graphics board. Fig. 1 shows the program flow for an iterative Fourier transform algorithm (IFTA). To fully exploit the available hardware, it turned out that two holograms need to be computed at the same time. With an NVIDIA 8800 GTX based graphics board it is possible to generate optimized holograms (10 IFTA iterations) for the holographic tweezers at a rate of 16 Hz. The main computational cost of this Fourier transform hologram optimization lies in the two-dimensional Fourier transforms. For our application, the 8800 GTX based board delivers more than 360 complex (32-bit float) two-dimensional FFTs per second at a resolution of 512 x 512.
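The iterative Fourier transform loop described above can be sketched on the CPU as follows. This is a minimal Gerchberg-Saxton style illustration in plain C++, not the Cg/OpenGL implementation mentioned in the text. The names `fft1d`, `fft2d`, `ifta` and `reconstruct`, the use of double precision, and the deterministic start phase are all assumptions made for this example. Each iteration costs two 2D FFTs, which is why the FFT dominates the run time.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
using Field = std::vector<Complex>;  // row-major n x n complex field

// Recursive radix-2 FFT; the length must be a power of two.
// sign = -1: forward transform, sign = +1: inverse transform (unnormalized).
void fft1d(std::vector<Complex>& a, int sign) {
    const std::size_t n = a.size();
    if (n <= 1) return;
    assert(n % 2 == 0);
    std::vector<Complex> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft1d(even, sign);
    fft1d(odd, sign);
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const double angle = sign * 2.0 * pi * double(k) / double(n);
        const Complex t = std::polar(1.0, angle) * odd[k];
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}

// 2D FFT of a row-major n x n field: transform all rows, then all columns.
void fft2d(Field& f, std::size_t n, int sign) {
    assert(f.size() == n * n);
    std::vector<Complex> line(n);
    for (std::size_t r = 0; r < n; ++r) {
        for (std::size_t c = 0; c < n; ++c) line[c] = f[r * n + c];
        fft1d(line, sign);
        for (std::size_t c = 0; c < n; ++c) f[r * n + c] = line[c];
    }
    for (std::size_t c = 0; c < n; ++c) {
        for (std::size_t r = 0; r < n; ++r) line[r] = f[r * n + c];
        fft1d(line, sign);
        for (std::size_t r = 0; r < n; ++r) f[r * n + c] = line[r];
    }
}

// Gerchberg-Saxton style IFTA: returns the phase (in radians) of a phase-only
// hologram whose far field approximates the non-negative `targetAmplitude`.
std::vector<double> ifta(const std::vector<double>& targetAmplitude,
                         std::size_t n, int iterations) {
    assert(targetAmplitude.size() == n * n);
    Field field(n * n);
    // Start in the trap plane with the target amplitude and a deterministic
    // pseudo-random phase (an arbitrary choice for this sketch).
    for (std::size_t i = 0; i < n * n; ++i)
        field[i] = std::polar(targetAmplitude[i], 2.399963 * double(i));
    std::vector<double> phase(n * n, 0.0);
    for (int it = 0; it < iterations; ++it) {
        fft2d(field, n, +1);  // trap plane -> hologram plane
        for (std::size_t i = 0; i < n * n; ++i) {
            phase[i] = std::arg(field[i]);         // keep the phase only
            field[i] = std::polar(1.0, phase[i]);  // unit amplitude (phase-only)
        }
        fft2d(field, n, -1);  // hologram plane -> trap plane
        for (std::size_t i = 0; i < n * n; ++i)  // enforce the target amplitude
            field[i] = std::polar(targetAmplitude[i], std::arg(field[i]));
    }
    return phase;
}

// Far-field intensity produced by a phase-only hologram, for checking results.
std::vector<double> reconstruct(const std::vector<double>& phase, std::size_t n) {
    Field field(n * n);
    for (std::size_t i = 0; i < n * n; ++i) field[i] = std::polar(1.0, phase[i]);
    fft2d(field, n, -1);
    std::vector<double> intensity(n * n);
    for (std::size_t i = 0; i < n * n; ++i) intensity[i] = std::norm(field[i]);
    return intensity;
}
```

Calling `ifta(target, n, 10)` with a 16 x 16 target that contains a single bright pixel yields a linear phase ramp, and `reconstruct` then concentrates essentially all light in that pixel. Targets with several traps are where the iterations actually matter.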

Raytracing |
|---|
For many applications in optical design and simulation, the available processing speed still limits optimization. One example is the simulation of the interferometric testing of aspheres; another is the design of illumination systems. We carried out initial experiments using the graphics board for optical raytracing. After implementing a simple raytracing scheme, we benchmarked the tracing of rays through systems with spheres and (polynomial) aspheres.
On a 7800 GTX based card we achieved 200 million rays per surface per second for spherical surfaces and 50 million rays per surface per second for polynomial aspheres. With an NVIDIA 8800 GT, 145 million rays per aspherical surface per second are achieved. Computations currently use 32-bit floating-point accuracy.
With newer cards (GTX 280) it is also possible to switch to 64-bit accuracy. This is, however, expected to reduce speed by a factor of 4 (not tested for this application).
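The per-surface work for a spherical surface can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmarked GPU code: the types `Vec3` and `Ray`, the function `traceSphere`, and the convention that the surface vertex sits at the origin with rays travelling towards +z are all hypothetical choices for this example. It uses 32-bit floats, matching the accuracy mentioned above. Polynomial aspheres would additionally need an iterative intersection search, which is not shown.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
inline Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray { Vec3 origin; Vec3 dir; };  // dir is assumed to be normalized

// Intersects `ray` with a spherical surface whose vertex is at the origin and
// whose center of curvature is at (0, 0, radius), then refracts the ray from
// refractive index n1 into n2 using the vector form of Snell's law.
// Returns false if the ray misses the sphere, the surface lies behind the
// ray, or total internal reflection occurs; `ray` is then left unchanged.
bool traceSphere(Ray& ray, float radius, float n1, float n2) {
    assert(radius != 0.0f);
    const Vec3 center{0.0f, 0.0f, radius};
    const Vec3 oc = add(ray.origin, scale(center, -1.0f));
    // Solve |origin + t * dir - center|^2 = radius^2 for t.
    const float b = dot(oc, ray.dir);
    const float c = dot(oc, oc) - radius * radius;
    const float disc = b * b - c;
    if (disc < 0.0f) return false;  // ray misses the sphere
    const float root = std::sqrt(disc);
    // Pick the intersection on the vertex side of the sphere.
    const float t = (radius > 0.0f) ? (-b - root) : (-b + root);
    if (t < 0.0f) return false;  // surface lies behind the ray
    const Vec3 p = add(ray.origin, scale(ray.dir, t));
    // Unit normal, oriented against rays that travel towards +z.
    const Vec3 normal = scale(add(p, scale(center, -1.0f)), 1.0f / radius);
    const float eta = n1 / n2;
    const float cosI = -dot(normal, ray.dir);
    const float k = 1.0f - eta * eta * (1.0f - cosI * cosI);
    if (k < 0.0f) return false;  // total internal reflection
    ray.origin = p;
    ray.dir = add(scale(ray.dir, eta), scale(normal, eta * cosI - std::sqrt(k)));
    return true;
}
```

As a plausibility check, a ray parallel to the axis at height 1 that enters glass (n = 1.5) through a surface of radius 50 should cross the axis close to the paraxial focus at n2 * R / (n2 - n1) = 150 behind the vertex. Switching `float` to `double` throughout gives the 64-bit variant.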
For the implementation of low-level image processing we used CUDA. Our library was developed under Linux with the aim of accelerating standard core image processing functions. It currently addresses:
- Fourier transforms
- Correlations
- Convolutions (in the spatial domain as well as the Fourier domain)
- Matrix operations
It is essentially a C++ interface that simplifies the use of the CUDA functions.
The CUDA library is extremely fast for FFTs. For two-dimensional complex 512 x 512 FFTs we achieve more than 1000 FFTs per second with an NVIDIA 8800 GTX. For the correlation-based comparison of many reference patterns with one input image, we achieve more than 500 correlations per second.
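The link between spatial-domain and Fourier-domain correlation can be illustrated with a small CPU sketch. It is not the library described above: the function names `dft2d`, `correlateSpatial`, `correlateFourier` and `peakIndex` are hypothetical, and a naive O(n^4) DFT stands in for the GPU FFT purely to keep the example short. It computes a circular cross-correlation both ways and locates the correlation peak, whose position gives the shift between the input image and the reference pattern.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Naive 2D DFT of a row-major n x n array, O(n^4); for tiny examples only.
// sign = -1: forward transform, sign = +1: inverse transform (unnormalized).
std::vector<Complex> dft2d(const std::vector<Complex>& f, std::size_t n, int sign) {
    assert(f.size() == n * n);
    const double pi = std::acos(-1.0);
    std::vector<Complex> out(n * n);
    for (std::size_t u = 0; u < n; ++u)
        for (std::size_t v = 0; v < n; ++v) {
            Complex sum(0.0, 0.0);
            for (std::size_t r = 0; r < n; ++r)
                for (std::size_t c = 0; c < n; ++c) {
                    const double angle =
                        sign * 2.0 * pi * double(u * r + v * c) / double(n);
                    sum += f[r * n + c] * std::polar(1.0, angle);
                }
            out[u * n + v] = sum;
        }
    return out;
}

// Circular cross-correlation in the spatial domain:
// corr[s, t] = sum over (r, c) of image[r + s, c + t] * ref[r, c].
std::vector<double> correlateSpatial(const std::vector<double>& image,
                                     const std::vector<double>& ref,
                                     std::size_t n) {
    std::vector<double> corr(n * n, 0.0);
    for (std::size_t s = 0; s < n; ++s)
        for (std::size_t t = 0; t < n; ++t)
            for (std::size_t r = 0; r < n; ++r)
                for (std::size_t c = 0; c < n; ++c)
                    corr[s * n + t] +=
                        image[((r + s) % n) * n + (c + t) % n] * ref[r * n + c];
    return corr;
}

// The same correlation via the correlation theorem:
// corr = IDFT( DFT(image) * conj(DFT(ref)) ) / n^2.
std::vector<double> correlateFourier(const std::vector<double>& image,
                                     const std::vector<double>& ref,
                                     std::size_t n) {
    std::vector<Complex> fi(image.begin(), image.end());
    std::vector<Complex> fr(ref.begin(), ref.end());
    fi = dft2d(fi, n, -1);
    fr = dft2d(fr, n, -1);
    for (std::size_t i = 0; i < n * n; ++i) fi[i] *= std::conj(fr[i]);
    fi = dft2d(fi, n, +1);
    std::vector<double> corr(n * n);
    for (std::size_t i = 0; i < n * n; ++i) corr[i] = fi[i].real() / double(n * n);
    return corr;
}

// Index (row * n + column) of the largest correlation value.
std::size_t peakIndex(const std::vector<double>& corr) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < corr.size(); ++i)
        if (corr[i] > corr[best]) best = i;
    return best;
}
```

Both functions return the same values up to rounding. The Fourier route replaces the four nested loops with three transforms and one pointwise product, which is what makes FFT-based matching against many reference patterns attractive.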
| Fig. 2: One of the input fingerprints for correlation | Fig. 3: Correlation peak of a correctly identified fingerprint |
References of our group concerning GPU programming |
|---|
[1] Hermerschmidt, A., Krüger, S., Haist, T., Zwick, S., Warber, M., Osten, W., "Holographic optical tweezers with real-time hologram calculation using a phase-only modulating LCOS-based SLM at 1064 nm", Proc. SPIE 6905 (2008).
[2] Haist, T., Schmid, U., Osten, W., "Fast computation of Fourier transforms using graphics processing units", VDI Berichte 1981, pp. 217-224 (2007).
[3] Reicherter, M., Haist, T., Zwick, S., Burla, A., Seifert, L., "Fast hologram computation and aberration control for holographic tweezers", Proc. SPIE 5930, pp. 501-509 (2005).
[4] Haist, T., Reicherter, M., Burla, A., Seifert, L., Hollis, M., Osten, W., "Fast hologram computation for holographic tweezers", Proc. Fringe 2005, pp. 126-133 (2005).
[5] Reicherter, M., Zwick, S., Haist, T., Kohler, C., Osten, W., "Fast digital hologram generation and adaptive force measurement in LCD based holographic tweezers", Applied Optics 45(5), pp. 888-896 (2006).
[6] Haist, T., Reicherter, M., Wu, M., Seifert, L., "Using Graphics Boards to compute holograms", Computing in Science & Engineering, January 2006, pp. 8-14 (2006).



