## T

Object i

i Project

Object

Screen

Shear and Scale

Object i

Object i Project

Screen

FIGURE 10 Shear-warp in orthographic and perspective projections.

given isosurface value has 0(n2) polygons generated using marching cubes. Given intelligent preprocessing, the rendering time will be 0(n2). Since it is hard to improve performance using multiple graphics engines, this seems a hard limit when using commercially available graphics accelerators unless a large fraction of the polygons are not visible [25]. If a ray tracing algorithm is used to traverse the volume until a surface is reached, we would expect each ray to do 0(n) work. If the rays are traced on p processors, then we expect the runtime for an isosurface image to be 0(n/p), albeit with a very large time constant and a limit that p is significantly lower than the number of pixels. For sufficiently large n, ray tracing will be faster than a z-buffer algorithm for generating and rendering isosurfaces. The question is whether it can occur on an n that occurs in practice (e.g., n = 500 to 1000) with a p that exists on a real machine (e.g., p = 8 to 128). The following demonstrates that with a few optimizations, ray tracing is already attractive for at least

FIGURE 12 A ray is intersected directly with the isosurface. No explicit surface is computed.

In the following subsections, we describe the details of our technique. We first address the ray-isosurface intersection followed by a description of various optimizations we have performed to achieve the interactive rates.

### 4.1 Ray-Isosurface Intersection

If we assume a regular volume with even grid point spacing arranged in a rectilinear array, then the ray-isosurface intersection is straightforward. Analagous simple schemes exist for intersection of tetrahedral cells, but the traversal of such grids is left for future work. This work will focus on rectilinear data.

To find an intersection (Fig. 13), the ray a + tb traverses cells in the volume, checking each cell to see if its data range bounds an isovalue. If it does, an analytic computation is performed to solve for the ray parameter t at the intersection with the isosurface:

When approximating p with a trilinear interpolation between discrete grid points, this equation will expand to a cubic polynomial in t. This cubic can then be solved in closed form to find the intersections of the ray with the isosurface in that cell. Only the roots of the polynomial that are contained in the cell are examined. There may be multiple roots, corresponding to multiple intersection points. In this case, the smallest t (closest to the eye) is used. There may also be no roots ofthe polynomial, in which case the ray misses the isosurface in the cell. The details of this intersection computation are given in [29].

FIGURE 12 A ray is intersected directly with the isosurface. No explicit surface is computed.

FIGURE 11 Warped space.

some isosurface applications, including high-resolution medical imaging applications.

Ray tracing has been used for volume visualization in many works (e.g., [26-28]). Typically, the ray tracing of a pixel is a kernel operation that could take place within any conventional ray tracing system. In this section we review how ray tracers are used in visualization, and how they are implemented efficiently at a systems level.

The algorithm has three phases: traversing a ray through cells that do not contain an isosurface, analytically computing the isosurface when intersecting a voxel containing the isosurface, and shading the resulting intersection point. This process is repeated for each pixel on the screen. Since each ray is independent, parallelization is straightforward. An additional benefit is that adding incremental features to the rendering has only incremental cost. For example, if one is visualizing multiple isosurfaces with some of them rendered transparently, the correct compositing order is guaranteed since we traverse the volume in a front-to-back order along the rays. Additional shading techniques, such as shadows and specular reflection, can easily be incorporated for enhanced visual cues. Another benefit is the ability to exploit texture maps that are much larger than texture memory (typically up to 64 Mbytes).

### 4.2 Optimizations

For the traversal of rays through the data, we use the incremental method described by Amanatides and Woo [30]. We found that traversing the cells is the computational bottleneck for large data sets, so we include optimizations to accelerate performance.

The first optimization is to improve data cache locality by organizing the volume into "bricks" that are analogous to the use of image tiles in image-processing software and other

volume rendering programs [31] (Fig. 14). The details of our method for efficiently indexing cells is discussed in [29].

The second is to use a multi-level spatial hierarchy to accelerate the traversal of empty cells as is shown in Figure 14. Cells are grouped divided into equal portions, and then a "macrocell'' is created that contains the minimum and maximum data value for its child cells. This is a common variant of standard ray-grid techniques [32] and the use of minimum/maximum caching has been shown to be useful [3,4,33]. The ray-isosurface traversal algorithm examines the min and max at each macrocell before deciding whether to recursively examine a deeper level or to proceed to the next cell. The average complexity of this search will be 0(tfn) for a three-level hierarchy. Although the worst-case complexity is still 0(n), it is difficult to imagine an isosurface occuring in practice approaching this worst case. Using a deeper hierarchy can theoretically reduce the average case complexity slightly, but also dramatically increases the storage cost of intermediate levels. We have experimented with modifying the number of levels in the hierarchy and empirically determined that a tri-level hierarchy (one top-level cell, two intermediate macrocell levels, and the data cells) is highly efficient. This optimum may be data dependent and is modifiable at program startup. Using a trilevel hierarchy, the storage overhead is negligible (< 0.5% of the data size). The cell sizes used in the hierarchy are independent of the brick sizes used for cache locality in the first optimization.

Since one cannot predict a priori the complexity of extracting an isosurface from a particular screen pixel, we employ a dynamic load balancing scheme to ensure high processor utilization over a wide range of views. The screen space is first split into tiles in the image space. In our implementation, tiles are 32 pixels wide by 4 pixels high. The width of the tile (128 bytes) ensures that tiles will not share a cache line with neighboring tiles. At the beginning of a frame, each tile becomes an assignment in a queue. Each processor pulls a range of assignments from the queue, performs the assigned work, and then returns to the queue for more work. The assignments, which are initially doled out in large chunks, get smaller and smaller as the frame nears completion. The large granularity in the beginning reduces contention for a large portion of the image, and the smaller granularity near the end helps to balance the load efficiently [34].

### 4.3 Real-Time Ray Tracing Results

Table 1 shows the scalability of the algorithm from 1 to 128 processors. View 2 uses a zoomed-out viewpoint with approximately 75% pixel coverage, whereas view 1 has nearly 100% pixel coverage. We chose to examine both cases since view 2 achieves higher frame rates. The higher frame rates cause less parallel efficiency due to synchronization and load balancing. Of course, maximum interaction is obtained with 128 processors, but reasonable interaction can be achieved with fewer processors. If a smaller number of processors were available, one could reduce the imagesize in order to restore the interactive rates. Efficiencies are 91% and 80% for views 1 and 2, respectively, on 128 processors. The reduced efficiency with

FIGURE 14 The ray-tracing algorithm uses two different hierarchies simultaneously. On the left, cells can be organized into "tiles" or "bricks" in memory to improve locality. The numbers in the first brick represent layout in memory. Neither the number of atomic voxels nor the number of bricks need be a power of 2. On the right is the hierarchy used to efficiently skip over empty cells. With a two-level hierarchy, rays can skip empty space by traversing larger cells. A three-level hierarchy is used for the Visible Woman example.

FIGURE 14 The ray-tracing algorithm uses two different hierarchies simultaneously. On the left, cells can be organized into "tiles" or "bricks" in memory to improve locality. The numbers in the first brick represent layout in memory. Neither the number of atomic voxels nor the number of bricks need be a power of 2. On the right is the hierarchy used to efficiently skip over empty cells. With a two-level hierarchy, rays can skip empty space by traversing larger cells. A three-level hierarchy is used for the Visible Woman example.

TABLE 1 Scalability results for ray tracing the bone isosurface in the visible human"

View 1 View 2

TABLE 1 Scalability results for ray tracing the bone isosurface in the visible human"

View 1 View 2

# cpus |

## Post a comment