GPUs for Scientific Computing

Introduction
GPUs changing geometry
GPUs improving quality and effects on the pixel level
GPUs doing scientific calculation
BrookGPU
Important Webpages


Introduction

In the past, when we talked about video cards (GPUs, Graphics Processing Units), they were like a "black box" to which we sent graphics primitives (triangles) defined by points, together with certain transformations. At the end of this complicated process, and with some luck, we would see an image on the screen. In fact, it is still done this way. Nowadays, however, the developer has more control over this process, which we call the "rendering pipeline" (see figure below). GPUs later became programmable, even in high-level languages (Cg, the OpenGL Shading Language, HLSL).



- GPUs are programmable processors
- They have two types of programs (per-vertex and per-pixel)
- They use a high-level programming language

I do not intend to tell the whole history of how GPUs developed, but I want to point out some of their main features. The Cg Tutorial (Randima Fernando and Mark J. Kilgard, April 2003) divides computer graphics hardware into four generations. It only makes sense to talk about programmable GPUs from the second generation (1999-2000), which includes NVIDIA's GeForce 256 and GeForce2, ATI's Radeon 7500, and others. In this generation, both OpenGL and DirectX 7 support hardware vertex transformation. The third generation (2001) includes NVIDIA's GeForce3 and GeForce4 Ti and ATI's Radeon 8500. Although its pixel-level programmability is still not powerful enough, this generation offers more powerful vertex programming and more pixel-level configurability. With the fourth generation (2002 onward), which includes NVIDIA's GeForce FX family and ATI's Radeon 9700, programmability opens up at both the vertex and pixel levels. This is the generation of GPUs where people start thinking about doing not only complicated shading but scientific computation as well.


GPUs changing geometry (displacement function)


Since the second generation, it has been possible to send parameters to the GPU's vertex processor in order to apply a function to each vertex. The following figure shows this feature. In the first sequence (right-top), the CPU evaluates the function f(x) for each element of the vector x[n]. In the second sequence (right-bottom), the function is evaluated on the GPU's vertex processor. With this change alone, most of the examples presented so far render at least ten times faster. The figure on the right is a wave simulator application developed on Open Inventor, where the GPU computes a wave displacement function (see "more ..." for details and code).
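
To make "applying a function to each vertex" concrete, here is a minimal CPU-side sketch of a wave displacement. It corresponds to the first sequence described above: the CPU loops over the vertices, whereas a vertex program would run the body of displace() on the GPU once per vertex. The wave formula and every name here (displace, displace_grid, amplitude, wavelength, speed) are assumptions for illustration and are not taken from the Open Inventor application.

```python
import math

def displace(x, z, t, amplitude=0.5, wavelength=2.0, speed=1.0):
    """Height of a radial sine wave at grid point (x, z) and time t."""
    r = math.hypot(x, z)                # distance from the wave source at the origin
    k = 2.0 * math.pi / wavelength      # wave number
    return amplitude * math.sin(k * (r - speed * t))

def displace_grid(n, spacing, t):
    """CPU path: visit an n x n grid and displace every vertex in turn."""
    vertices = []
    for i in range(n):
        for j in range(n):
            x, z = i * spacing, j * spacing
            vertices.append((x, displace(x, z, t), z))
    return vertices

vertices = displace_grid(4, 0.5, 0.0)
```

Each call to displace() depends only on its own vertex and a few shared parameters, which is why the work can be handed to the vertex processor.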

more ...


GPUs improving quality and effects on the pixel level


Although users had acquired reasonable control over the vertex processor since the second generation, it was not until the fourth that they gained comparable control over the fragment shading process. This generation provides:
- Support for long shading programs.
- High-precision color and pixel operations (32 bits).
- High-level shading languages get really interesting.

Examples:
- Cg_sking - NVIDIA
- proctex3d - NVIDIA
- Dynamic shadows - Frustum
- Displacement Bump Mapping - Frustum

GPUs doing scientific calculation


With the new GPU features mentioned above, people started to look at the GPU as a powerful vector coprocessor to the CPU. When float buffers are used, intermediate values during a computation are no longer clamped. Another good reason to use GPUs as coprocessors is their parallel nature at the rasterization stage (pixel level). In this way, texture images become matrices of values on which to compute. Nowadays, GPUs are being used for linear algebra, signal and image processing, physical simulation, and so on.
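
The idea of textures as matrices can be sketched on the CPU as follows. Nested lists stand in for float textures, and run_kernel plays the role of the rasterizer invoking a fragment program once per texel. All names (run_kernel, saxpy, X, Y) are hypothetical and chosen only for this illustration.

```python
def run_kernel(kernel, *textures):
    """Apply `kernel` independently at every texel, as a fragment program would."""
    rows, cols = len(textures[0]), len(textures[0][0])
    return [[kernel(*(tex[i][j] for tex in textures)) for j in range(cols)]
            for i in range(rows)]

def saxpy(alpha):
    """Build a per-texel kernel computing alpha * x + y."""
    return lambda x, y: alpha * x + y

# Two 2x2 "textures" holding matrix values.
X = [[1.0, 2.0], [3.0, 4.0]]
Y = [[10.0, 20.0], [30.0, 40.0]]
result = run_kernel(saxpy(2.0), X, Y)
```

No texel reads another texel's output, so every kernel call could run in parallel. That independence is what the pixel pipeline exploits.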

Limitations
- Limited instructions and register space
- No branches, but you can use conditionals or multipass
- Not yet good enough at sending values back to the CPU
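
One way to read "no branches, but you can use conditionals" is that both sides of a condition are computed and one result is then selected arithmetically. The sketch below emulates that pattern on the CPU. The helpers are modeled on the step and lerp functions found in shading languages such as Cg, and the example function is an assumption made up for this illustration.

```python
def step(edge, x):
    """1.0 where x >= edge, otherwise 0.0."""
    return float(x >= edge)

def lerp(a, b, t):
    """Linear blend: returns a when t == 0.0 and b when t == 1.0."""
    return a + t * (b - a)

def square_or_negate(x):
    """Equivalent to `x * x if x >= 0 else -x`, written without a branch."""
    when_negative = -x          # both candidate results are always computed
    when_positive = x * x
    return lerp(when_negative, when_positive, step(0.0, x))
```

The cost is that both sides are always evaluated. When that is too expensive, the alternative listed above is multipass rendering.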

Examples:
Jeff Bolz, Ian Farmer, Eitan Grinspun, Peter Schröder (California Institute of Technology) - SIGGRAPH 2003: "Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid"
- A sparse matrix conjugate gradient solver
- A regular-grid multigrid solver
- The simulation runs on NVIDIA's GeForce FX
J. Krüger & R. Westermann - SIGGRAPH 2003: "Linear Algebra Operators for GPU Implementation of Numerical Algorithms" more ...
- Implements linear algebra operators on the GPU for use in numerical simulations
- This framework implements direct solvers for sparse matrices
- Applies these solvers to multi-dimensional finite difference equations (the 2D wave equation and the Navier-Stokes equations)
- The simulation runs on an ATI Radeon 9800, and the performance was about 12-15 times faster than an optimized software implementation on the same target architecture
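
For readers unfamiliar with the conjugate gradient method named in the first example, here is a small CPU sketch of the textbook algorithm. It is not the code from either paper: it uses dense Python lists instead of sparse matrices stored in textures, and the names and tolerance are assumptions. On a GPU, the dot products and vector updates would each become one or more rendering passes.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def conjugate_gradient(A, b, tol=1e-20, max_iter=100):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x, with x starting at zero
    p = list(r)                 # first search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr < tol:            # tol is compared against the squared residual norm
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
```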


more examples and applications ...

BrookGPU

Prerequisites: C++ object-oriented programming and intermediate OpenGL knowledge (working with textures)

Now it is time to get a better idea of how people are using GPUs for scientific computation, which is the main goal of this webpage. To that end, we chose the Brook for GPUs library (BrookGPU), which was developed at Stanford University.
As you have seen, there are already many applications and papers that use GPUs for computing, but it is difficult to find source code and details of how the whole CPU->GPU and GPU->CPU interaction works. This is one of the reasons we chose BrookGPU. Although this library works with both OpenGL and DirectX backends, I will briefly explain only the OpenGL part.

Introductory questions:
What kind of buffer can we use on the GPU for computation?
They are additional non-visible rendering buffers called float buffers or pixel buffers (pbuffers).

Do I need any special configuration or support to use them?
In the OpenGL case, you will need to check whether your hardware supports certain OpenGL extensions (see registry). On Linux machines, the glxinfo application can tell you which extensions your hardware supports. Usually you can find it in /usr/X11R6/bin/. BrookGPU requires an NVIDIA video card, so you have to look for the GL_NV_float_buffer extension. You can also check for other extensions that do not require NVIDIA cards: WGL_ARB_pbuffer, GLX_SGIX_pbuffer
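
The extension check described above can be scripted. The sketch below scans glxinfo text output for extension names. The SAMPLE text is a hypothetical excerpt shaped like typical glxinfo output, not a capture from real hardware, and parse_extensions and query_glxinfo are illustrative names. query_glxinfo assumes glxinfo is on your PATH and is not called here.

```python
import re
import subprocess

def parse_extensions(glxinfo_output):
    """Collect every token that looks like a GL, GLX, or WGL extension name."""
    tokens = re.split(r"[,\s]+", glxinfo_output)
    return {t for t in tokens if t.startswith(("GL_", "GLX_", "WGL_"))}

def query_glxinfo():
    """Run glxinfo and return its text output (requires glxinfo on PATH)."""
    completed = subprocess.run(["glxinfo"], capture_output=True,
                               text=True, check=True)
    return completed.stdout

# Hypothetical excerpt used instead of a live query.
SAMPLE = """
GLX extensions:
    GLX_ARB_get_proc_address, GLX_SGIX_pbuffer, GLX_SGIX_fbconfig
OpenGL extensions:
    GL_ARB_multitexture, GL_NV_float_buffer, GL_NV_fragment_program
"""

extensions = parse_extensions(SAMPLE)
has_float_buffer = "GL_NV_float_buffer" in extensions
```

On a real machine you would pass query_glxinfo() to parse_extensions instead of SAMPLE.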

Note: Before viewing the next presentation, you should read the BrookGPU webpage. You can also see GH03-Brook.ppt

How does BrookGPU work?

In this presentation (ppt file, pdf file) I try to explain:
- Stream creation
- Kernel creation and rendering
- Reduction operation
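
As a rough picture of the reduction step, the sketch below sums a square "stream" stored as a 2D texture by repeated halving: each pass writes a texture half as wide and half as tall, where every output texel adds up a 2x2 block of the input. This is one common multipass scheme on fragment hardware and is only assumed here to resemble what BrookGPU does internally. The names reduce_pass and reduce_sum are hypothetical.

```python
def reduce_pass(tex):
    """One pass: each output texel sums a 2x2 block of the input texture."""
    half = len(tex) // 2
    return [[tex[2 * i][2 * j] + tex[2 * i][2 * j + 1]
             + tex[2 * i + 1][2 * j] + tex[2 * i + 1][2 * j + 1]
             for j in range(half)]
            for i in range(half)]

def reduce_sum(tex):
    """Repeat passes on a square, power-of-two texture until one texel is left."""
    passes = 0
    while len(tex) > 1:
        tex = reduce_pass(tex)
        passes += 1
    return tex[0][0], passes

# A 4x4 stream holding the values 0.0 .. 15.0.
stream = [[float(4 * i + j) for j in range(4)] for i in range(4)]
total, passes = reduce_sum(stream)
```

An n x n texture needs log2(n) passes, and only the final 1x1 result has to be read back to the CPU.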

Code examples:

- Mark J. Harris (RenderTexture class)
- NVSDK 6.0 (pbuffer.h pBuffer.cpp)
- Patrick Crawley (Unsteady Flow Advection Convolution)

Notes:

- Check out the Request For Comment: EXT_render_target proposal. Thanks to Patrick Crawley.

Important Webpages


Brook for GPU
OpenGL
HLSL
Cg and HLSL FAQ

Date: 01/14/2004 Last Update: 04/09/2004