Today, the computer plays a significant role in optics. Computers are
used to design and simulate optical systems and phenomena, and in many
products the optical signals are processed, after detection, by digital
electronics. Thanks to the tremendous advances in computing power,
applications that once needed supercomputers can now be realized on
conventional personal computers.

In optics, many applications can never have enough computing power. One
prominent example is image and video processing, where sophisticated
algorithms are applied to an ever-increasing number of pixels and
images. Optical applications are also often good candidates for
parallelization, so multicore systems are well suited to them. Compared
with current CPU designs, which have rather few cores, graphics
processing units (GPUs) offer far more parallelism. A modern GPU for
consumer gaming has a large number of processing cores (e.g., 240 for an
NVidia GTX 280) and a peak performance in the range of 1 TFlops (10^12
floating point operations per second). It is of course tempting to use
that processing power for something other than gaming, in our case
optics. In the future, the gap between CPU and GPU might narrow as
multi-core CPUs with many processing cores are developed, but at the
moment one can typically achieve a performance improvement by a factor
of 10 to 100 by using the GPU for massively parallel problems.

Many different algorithms are used to compute computer-generated
holograms (CGHs). For real-time applications, the computation (or
optimization) of a hologram is quite a demanding task because of the
high resolutions that holographic applications typically need.

One example is the real-time control of holographic optical tweezers.
To compute the holograms, different hologram optimization methods were
implemented on standard consumer graphics boards using a combination of
Cg (“C for graphics”), OpenGL and C++. The most challenging part of
writing such applications is mapping the problem at hand efficiently to
the hardware of the graphics board. Fig. 1 shows the program flow for an
iterative Fourier transform algorithm (IFTA). To fully exploit the
available hardware, it turned out that two holograms need to be computed
at the same time. With an NVidia 8800 GTX based graphics board it is
possible to generate optimized holograms (10 IFTA iterations) for the
holographic tweezers at a rate of 16 Hz. The main computational cost of
this Fourier transform hologram optimization lies in the two-dimensional
Fourier transforms. For our application, the 8800 GTX based board
delivers more than 360 complex (32 bit float) two-dimensional FFTs per
second at a resolution of 512 x 512.
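
The iterative Fourier transform algorithm mentioned above can be sketched on the CPU as follows. This is only a minimal illustration under stated assumptions, not the Cg/OpenGL implementation described here: it assumes a phase-only hologram, a square power-of-two grid and a Gerchberg-Saxton style update, and all names (`fft1d`, `fft2d`, `ifta`, `efficiency`) are hypothetical. Each iteration needs one forward and one inverse 2D FFT, which is why the FFT rate dominates the cost.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using cplx = std::complex<double>;
using Field = std::vector<cplx>;  // row-major n x n complex field

const double kPi = std::acos(-1.0);

// In-place radix-2 FFT over the n elements f[offset + k * stride].
// n must be a power of two; inverse = true gives the unnormalized inverse.
void fft1d(Field& f, std::size_t offset, std::size_t stride, std::size_t n,
           bool inverse) {
    assert(n > 0 && (n & (n - 1)) == 0);
    for (std::size_t i = 1, j = 0; i < n; ++i) {  // bit-reversal permutation
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(f[offset + i * stride], f[offset + j * stride]);
    }
    for (std::size_t len = 2; len <= n; len <<= 1) {  // butterfly stages
        const double ang = (inverse ? 2.0 : -2.0) * kPi / len;
        const cplx wlen(std::cos(ang), std::sin(ang));
        for (std::size_t i = 0; i < n; i += len) {
            cplx w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k, w *= wlen) {
                cplx& a = f[offset + (i + k) * stride];
                cplx& b = f[offset + (i + k + len / 2) * stride];
                const cplx u = a, v = b * w;
                a = u + v;
                b = u - v;
            }
        }
    }
}

// 2D FFT: transform all rows, then all columns; the inverse is normalized.
void fft2d(Field& f, std::size_t n, bool inverse) {
    for (std::size_t r = 0; r < n; ++r) fft1d(f, r * n, 1, n, inverse);
    for (std::size_t c = 0; c < n; ++c) fft1d(f, c, n, n, inverse);
    if (inverse)
        for (cplx& v : f) v /= static_cast<double>(n * n);
}

// IFTA sketch: returns the hologram phase (radians) for a target amplitude.
std::vector<double> ifta(const std::vector<double>& target, std::size_t n,
                         int iterations) {
    assert(target.size() == n * n);
    Field field(n * n);
    std::uint32_t seed = 12345u;  // small LCG: repeatable random start phase
    for (cplx& v : field) {
        seed = seed * 1664525u + 1013904223u;
        v = std::polar(1.0, 2.0 * kPi * (seed >> 8) / 16777216.0);
    }
    for (int it = 0; it < iterations; ++it) {
        fft2d(field, n, false);  // hologram plane -> trap plane
        for (std::size_t i = 0; i < field.size(); ++i)  // impose target amplitude
            field[i] = std::polar(target[i], std::arg(field[i]));
        fft2d(field, n, true);   // trap plane -> hologram plane
        for (cplx& v : field)    // assumed phase-only modulator constraint
            v = std::polar(1.0, std::arg(v));
    }
    std::vector<double> phase(n * n);
    for (std::size_t i = 0; i < phase.size(); ++i) phase[i] = std::arg(field[i]);
    return phase;
}

// Fraction of the reconstructed energy that lands on nonzero target pixels.
double efficiency(const std::vector<double>& phase,
                  const std::vector<double>& target, std::size_t n) {
    Field field(n * n);
    for (std::size_t i = 0; i < field.size(); ++i)
        field[i] = std::polar(1.0, phase[i]);
    fft2d(field, n, false);
    double onTarget = 0.0, total = 0.0;
    for (std::size_t i = 0; i < field.size(); ++i) {
        total += std::norm(field[i]);
        if (target[i] > 0.0) onTarget += std::norm(field[i]);
    }
    return onTarget / total;
}
```

To use the sketch, set the `target` pixels of the desired traps to 1.0, call `ifta(target, n, 10)` and pass the resulting phase to `efficiency` to see how much light reaches the traps. On a GPU the per-pixel constraint steps map naturally to one thread or fragment per pixel.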
Fig. 1: Benchmarks for 32 bit float 2D FFT using CUDA

For many applications in optical design and simulation, the available
processing speed still limits optimization. One example is the
simulation of the interferometric testing of aspheres; another is the
design of illumination systems. We performed first experiments using the
graphics board for optical raytracing. After implementing a simple
raytracing scheme, we benchmarked the tracing of rays through systems
with spheres and (polynomial) aspheres.

On a 7800 GTX based card we achieved 200 million rays per surface per
second for spherical surfaces and 50 million rays per surface per second
for polynomial aspheres. With an NVidia 8800 GT, 145 million rays per
aspherical surface per second are achieved. The accuracy at the moment
is 32 bit (floating point). With newer cards (GTX 280) it is also
possible to switch to 64 bit accuracy. This should, however, reduce the
speed by a factor of 4 (not tested for this application).
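
The difference between spherical and aspherical surfaces can be illustrated with a small CPU sketch of the ray-surface intersection step. The scheme actually used on the graphics board is not described here, so everything below is an assumption: the surface vertex sits at the origin with the optical axis along z, the asphere uses the common conic-plus-even-polynomial sag formula, and the names (`Asphere`, `intersectSphere`, `intersectAsphere`) are hypothetical. A sphere has a closed-form intersection, whereas the asphere is solved iteratively with Newton's method, which is one plausible reason for lower asphere throughput.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Ray { Vec3 o, d; };  // origin and unit direction

// Assumed surface model: conic section plus even polynomial terms.
struct Asphere {
    double c;       // vertex curvature 1/R
    double k;       // conic constant (0 = sphere)
    double a4, a6;  // polynomial coefficients for r^4 and r^6
};

Vec3 normalize(Vec3 v) {
    const double len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

// Surface sag z as a function of r2 = x^2 + y^2.
double sag(const Asphere& s, double r2) {
    const double q = std::sqrt(1.0 - (1.0 + s.k) * s.c * s.c * r2);
    return s.c * r2 / (1.0 + q) + s.a4 * r2 * r2 + s.a6 * r2 * r2 * r2;
}

// Derivative of the sag with respect to r2.
double sagDerivative(const Asphere& s, double r2) {
    const double q = std::sqrt(1.0 - (1.0 + s.k) * s.c * s.c * r2);
    return s.c / (2.0 * q) + 2.0 * s.a4 * r2 + 3.0 * s.a6 * r2 * r2;
}

// Closed-form intersection with a sphere of radius R centered at (0, 0, R).
// Writes the ray parameter of the hit closest to the vertex into t.
bool intersectSphere(const Ray& ray, double R, double& t) {
    const Vec3 oc = {ray.o.x, ray.o.y, ray.o.z - R};
    const double b = oc.x * ray.d.x + oc.y * ray.d.y + oc.z * ray.d.z;
    const double c0 = oc.x * oc.x + oc.y * oc.y + oc.z * oc.z - R * R;
    const double disc = b * b - c0;
    if (disc < 0.0) return false;  // the ray misses the sphere
    t = -b - (R > 0.0 ? 1.0 : -1.0) * std::sqrt(disc);
    return true;
}

// Newton iteration on F(t) = z(t) - sag(x(t)^2 + y(t)^2) = 0,
// started at the intersection with the vertex plane z = 0.
bool intersectAsphere(const Ray& ray, const Asphere& s, double& t,
                      int maxIter = 20, double tol = 1e-12) {
    if (ray.d.z == 0.0) return false;
    t = -ray.o.z / ray.d.z;
    for (int i = 0; i < maxIter; ++i) {
        const double x = ray.o.x + t * ray.d.x;
        const double y = ray.o.y + t * ray.d.y;
        const double z = ray.o.z + t * ray.d.z;
        const double r2 = x * x + y * y;
        if ((1.0 + s.k) * s.c * s.c * r2 >= 1.0) return false;  // outside surface
        const double f = z - sag(s, r2);
        if (std::fabs(f) < tol) return true;
        const double df = ray.d.z -
            sagDerivative(s, r2) * 2.0 * (x * ray.d.x + y * ray.d.y);
        if (df == 0.0) return false;
        t -= f / df;
    }
    return false;  // no convergence within maxIter steps
}
```

For a surface with `k = 0` and `a4 = a6 = 0` both functions should return the same ray parameter, which gives a simple consistency check. The sketch uses `double`; replacing it with `float` would mimic the 32 bit accuracy mentioned above.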

For the implementation of low-level image processing we used CUDA. Our
library was programmed under Linux with the aim of accelerating standard
image processing core functions. The library currently addresses
- Fourier transforms
- Correlations
- Convolutions (spatial domain as well as Fourier domain)
- Matrix operations
It is essentially a C++ interface that helps to use the CUDA functions.
The CUDA library is extremely fast for FFTs. For two-dimensional complex
512 x 512 FFTs we achieve more than 1000 FFTs per second with an NVidia
8800 GTX. For the correlation-based comparison of many reference
patterns with one input image we achieve more than 500 correlations per
second.
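
The correlation-based comparison of many reference patterns with one input can be sketched with the correlation theorem: the correlation is the inverse FFT of the input spectrum multiplied by the complex conjugate of the reference spectrum. The following CPU example is a simplified illustration, not the library's interface. It is one-dimensional for brevity (images would use 2D FFTs), assumes power-of-two lengths and nonzero signal energy, and the names `fft`, `spectrum`, `Match` and `bestMatch` are hypothetical. The input spectrum is computed only once and reused for every reference.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Recursive radix-2 FFT. sign = -1: forward, sign = +1: unnormalized inverse.
std::vector<cplx> fft(const std::vector<cplx>& a, int sign) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // length must be a power of two
    if (n == 1) return a;
    std::vector<cplx> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    even = fft(even, sign);
    odd = fft(odd, sign);
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const cplx w = std::polar(1.0, sign * 2.0 * pi * k / n);
        out[k] = even[k] + w * odd[k];
        out[k + n / 2] = even[k] - w * odd[k];
    }
    return out;
}

// Forward spectrum of a real signal.
std::vector<cplx> spectrum(const std::vector<double>& signal) {
    return fft(std::vector<cplx>(signal.begin(), signal.end()), -1);
}

double energy(const std::vector<double>& signal) {
    double e = 0.0;
    for (double v : signal) e += v * v;
    return e;
}

struct Match {
    std::size_t reference;  // index of the best matching reference
    std::size_t shift;      // circular shift of the input relative to it
    double peak;            // normalized correlation peak, at most 1
};

// Circularly correlates the input with every reference in the Fourier
// domain and returns the reference and shift with the highest peak.
Match bestMatch(const std::vector<double>& input,
                const std::vector<std::vector<double>>& references) {
    const std::size_t n = input.size();
    const std::vector<cplx> inputSpec = spectrum(input);  // computed once
    Match best = {0, 0, -1.0};
    for (std::size_t r = 0; r < references.size(); ++r) {
        assert(references[r].size() == n);
        std::vector<cplx> product = spectrum(references[r]);
        for (std::size_t k = 0; k < n; ++k)
            product[k] = inputSpec[k] * std::conj(product[k]);
        const std::vector<cplx> corr = fft(product, +1);
        // n undoes the unnormalized inverse FFT; the square root normalizes
        // the peak by the signal energies (assumed to be nonzero).
        const double scale = n * std::sqrt(energy(input) * energy(references[r]));
        for (std::size_t s = 0; s < n; ++s) {
            const double value = corr[s].real() / scale;
            if (value > best.peak) best = Match{r, s, value};
        }
    }
    return best;
}
```

Calling `bestMatch(input, references)` returns the index of the best reference, the shift at which it matches and a normalized peak height. Since each comparison costs one forward FFT of the reference (which could be precomputed) and one inverse FFT, the achievable correlation rate is closely tied to the FFT rate.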
Fig. 2: One of the input fingerprints for correlation
Fig. 3: Correlation peak of a correctly identified fingerprint

References of our group concerning GPU programming
[1] Haist, T., Schmid, U., Osten, W., "Fast computation of Fourier transforms using graphics processing units", VDI Berichte 1981, pp. 217-224 (2007).
[2] Reicherter, M., Haist, T., Zwick, S., Burla, A., Seifert, L., "Fast hologram computation and aberration control for holographic tweezers", Proc. SPIE 5930, pp. 501-509 (2005).
[3] Haist, T., Reicherter, M., Burla, A., Seifert, L., Hollis, M., Osten, W., "Fast hologram computation for holographic tweezers", Proc. Fringe 2005, pp. 126-133 (2005).
[4] Reicherter, M., Zwick, S., Haist, T., Kohler, C., Osten, W., "Fast digital hologram generation and adaptive force measurement in LCD based holographic tweezers", Applied Optics 45(5), pp. 888-896 (2006).
[5] Haist, T., Reicherter, M., Wu, M., Seifert, L., "Using Graphics Boards to compute holograms", Computing in Science & Engineering, January 2006, pp. 8-14 (2006).
[6] Hermerschmidt, A., Krüger, S., Haist, T., Zwick, S., Warber, M., Osten, W., "Holographic optical tweezers with real-time hologram calculation using a phase-only modulating LCOS-based SLM at 1064 nm", Proc. SPIE 6905 (2008).
© Institut für Technische Optik