## Volume Rendering Optimizations

Volume rendering can produce informative images that are useful in data analysis, but a major drawback of the techniques described earlier is the time required to generate a high-quality image. This section describes several volume rendering optimizations that decrease rendering times and therefore increase interactivity and productivity. Other optimizations have been discussed briefly earlier in the paper, along with the original algorithms. Another way to speed up volume rendering is to employ special-purpose hardware accelerators, as described in Section 7.

Object-order volume rendering typically loops through the data, calculating the contribution of each volume sample to pixels on the image plane. This is a costly operation for moderate- to large-sized data sets (e.g., 128 Mbytes for a 512³ sample data set, with one byte per sample), leading to rendering times that are non-interactive. Viewing the intermediate results in the image plane may be useful, but these partial image results are not always representative of the final image. For the purpose of interaction, it is useful to be able to generate a lower quality image in a shorter amount of time. For data sets with binary sample values, bits could be packed into bytes such that each byte represents a 2 x 2 x 2 portion of the data [72]. Data would be processed bit by bit to generate the full resolution image, but a lower resolution image could be generated by processing data byte by byte. If more than four bits of the byte are set, the byte is considered to represent an element of the object; otherwise it represents the background. This would produce an image with one-half the linear resolution in approximately one-eighth the time.
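
The bit-packing idea can be illustrated with a minimal Python sketch. The function names, the nested-list layout `vol[z][y][x]`, and the bit ordering within each byte are assumptions made for illustration; they are not taken from [72].

```python
def pack_binary_volume(vol):
    """Pack a binary volume into one byte per 2 x 2 x 2 block.

    vol is a nested list vol[z][y][x] of 0/1 values with even dimensions
    (an assumed layout). The bit order inside a byte is arbitrary here.
    """
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    packed = []
    for z in range(0, nz, 2):
        plane = []
        for y in range(0, ny, 2):
            row = []
            for x in range(0, nx, 2):
                byte, bit = 0, 0
                for dz in (0, 1):
                    for dy in (0, 1):
                        for dx in (0, 1):
                            byte |= vol[z + dz][y + dy][x + dx] << bit
                            bit += 1
                row.append(byte)
            plane.append(row)
        packed.append(plane)
    return packed


def low_res_volume(packed):
    """Classify each byte: object if more than four of its eight bits are set."""
    return [[[1 if bin(b).count("1") > 4 else 0 for b in row]
             for row in plane]
            for plane in packed]
```

Processing the packed bytes touches one-eighth as many elements as processing individual bits, which is where the approximate eightfold speedup of the low-resolution pass comes from.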

A more general method for decreasing data resolution is to build a pyramid data structure, which for an original data set of N³ data samples consists of a sequence of log N volumes. The first volume is the original data set, while the second volume is created by averaging each 2 x 2 x 2 group of samples of the original data set to create a volume with one-eighth as many samples. The third volume is created from the second volume in a similar fashion, with this process continuing until all log N volumes have been created. An efficient implementation of the splatting algorithm, called hierarchical splatting [38], uses such a pyramid data structure. Depending on the desired image quality, this algorithm scans the appropriate level of the pyramid in back-to-front order. Each element is splatted onto the image plane using an appropriately sized splat. The splats themselves are approximated by polygons that can be efficiently rendered by graphics hardware.
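
A minimal sketch of the pyramid construction follows. It assumes a cubic volume whose edge length N is a power of two, stored as nested lists `vol[z][y][x]`, and it stops at a 2 x 2 x 2 volume so that the pyramid holds log2 N levels; the function names and the stopping rule are illustrative assumptions.

```python
def downsample(vol):
    """Average each 2 x 2 x 2 group of samples into one sample."""
    n = len(vol) // 2
    return [[[sum(vol[2 * z + dz][2 * y + dy][2 * x + dx]
                  for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)) / 8.0
              for x in range(n)]
             for y in range(n)]
            for z in range(n)]


def build_pyramid(vol):
    """Return [original, half resolution, quarter resolution, ...].

    Assumes a cubic volume with a power-of-two edge length N and stops
    at a 2 x 2 x 2 volume, giving log2(N) levels in total.
    """
    levels = [vol]
    while len(levels[-1]) > 2:
        levels.append(downsample(levels[-1]))
    return levels
```

A renderer would then pick `levels[k]` according to the desired image quality, with larger `k` trading detail for speed.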

Image-order volume rendering involves casting rays from the image plane into the data, and sampling along each ray in order to determine pixel values. The idea of a pyramid can also be used here; Wang and Kaufman [78] have proposed the use of a multi-resolution hierarchy at arbitrary resolutions. In discrete ray casting, the ray is discretized, and the contribution from each voxel along the path is considered when producing the final pixel value. It would be quite computationally expensive to discretize every ray cast from the image plane. Fortunately, this is unnecessary for parallel projections. Since all the rays are parallel, one ray can be discretized into a 26-connected line and used as a "template" for all other rays. This technique, developed by Yagel and Kaufman [85], is called template-based volume viewing. If this template were used to cast a ray from each pixel in the image plane, some voxels in the data may contribute to the image twice while others may not be considered at all. To solve this problem, the rays are cast instead from a base plane, that is, the plane of the volume buffer most parallel to the image plane. This ensures that each data sample can contribute at most once to the final image, and all data samples could potentially contribute. Once all the rays have been cast from the base plane, a simple final step of resampling is needed, which uses bilinear interpolation to determine the pixel values on the image plane from the ray values that have been calculated on the base plane.
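
The template idea can be sketched as follows, under several simplifying assumptions that are not part of [85]: the base plane is the z = 0 face of the volume, the view direction has z as its dominant axis, rays originate only inside the volume footprint, and a maximum-value projection stands in for real compositing. The final bilinear resampling to the image plane is omitted.

```python
import math


def make_template(direction, depth):
    """Discretize one ray into a 26-connected voxel line.

    direction = (dx, dy, dz) with dz > 0 as the dominant axis, so each
    unit step in z changes x and y by at most one voxel.
    """
    dx, dy, dz = direction
    assert dz > 0 and abs(dx) <= dz and abs(dy) <= dz
    return [(math.floor(k * dx / dz + 0.5),
             math.floor(k * dy / dz + 0.5),
             k)
            for k in range(depth)]


def cast_from_base_plane(vol, template):
    """Reuse the same template for every base-plane pixel.

    vol is vol[z][y][x]. Within one slice k, the template applies the
    same (x, y) offset to every ray, so no voxel is visited twice.
    """
    ny, nx = len(vol[0]), len(vol[0][0])
    image = [[0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            best = 0
            for ox, oy, k in template:
                x, y = i + ox, j + oy
                if 0 <= x < nx and 0 <= y < ny:
                    best = max(best, vol[k][y][x])  # stand-in for compositing
            image[j][i] = best
    return image
```

A fuller implementation would presumably extend the base plane beyond the volume footprint so that rays entering through the side faces are also covered.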

An extension can be made to this template-based ray casting to allow higher-order interpolation [86]. The template for higher-order interpolation consists of connected cells, as opposed to the connected voxel template used for zero-order interpolation. Since the value varies within a cell, it is desirable to take multiple samples along the continuous ray inside of each cell. Since these samples are taken at regular intervals, and the same template is used for every ray, there is only a finite number of 3D locations (relative to a cell) at which sampling occurs. This fact allows us to precompute part of the interpolation function and store it in a table, allowing for faster rendering times.
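
As an illustration of the precomputation, the sketch below assumes trilinear interpolation as the higher-order function and tabulates the eight corner weights for each of the finitely many sample positions within a cell. The names and the corner ordering are assumptions, not details from [86].

```python
def trilinear_weights(fx, fy, fz):
    """Weights of the eight cell corners for a sample at fractional
    offset (fx, fy, fz) inside the cell; corners ordered z, then y, then x."""
    return [(fx if dx else 1.0 - fx) *
            (fy if dy else 1.0 - fy) *
            (fz if dz else 1.0 - fz)
            for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)]


def build_weight_table(sample_offsets):
    """Precompute the weights once for every sample position in the template."""
    return [trilinear_weights(fx, fy, fz) for fx, fy, fz in sample_offsets]


def sample_cell(corner_values, weights):
    """At render time a sample costs eight multiply-adds and a table lookup."""
    return sum(c * w for c, w in zip(corner_values, weights))
```

Because every ray shares the same template, the table is built once per view and reused for all rays and all cells.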

Another extension to template-based ray casting allows for screen space supersampling to improve image quality [82]. This is accomplished by allowing rays to originate at subpixel locations. A finite number of subpixel locations from which a ray can originate are selected, and a template is created for each. When a ray is cast, its subpixel location determines which template is used. For example, to accomplish a 2 x 2 uniform supersampling, four rays would be cast per pixel, and therefore four subpixel locations are possible. Stochastic supersampling can also be supported by limiting the possible ray origins to a finite number of subpixel locations, and precomputing a template for each.

Lacroute and Levoy [36] extended the previous ideas in an algorithm called shear-warp factorization. It is based on an algorithm that factors the viewing transformation into a 3D shear parallel to the data slices, a projection to form an intermediate but distorted image, and a 2D warp to form an undistorted final image. The algorithm is extended in three ways. First, a fast object-order rendering algorithm, based on the factorization algorithms with preprocessing and some loss of image quality, has been developed. Shear-warp factorization has the property that rows of voxels in the volume are aligned with rows of pixels in the intermediate image. Consequently, a scanline-based algorithm has been constructed that traverses the volume and the intermediate image in synchrony, taking advantage of the spatial coherence present in both. Spatial data structures based on run-length encoding for both the volume and the intermediate image are used. An implementation running on an SGI Indigo workstation renders a 256³ voxel data set in 1 second. The second extension is shear-warp factorization for perspective viewing transformations. Third, a data structure for encoding spatial coherence in unclassified volumes (i.e., scalar fields with no precomputed opacity) has been introduced. When combined with the shear-warp rendering algorithm, this data structure supports classification and rendering of a 256³ voxel volume in 3 seconds. The method extends to support mixed volumes and geometry and is parallelizable [37].
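
The role of run-length encoding can be illustrated with a small sketch that encodes one voxel scanline into transparent and non-transparent runs and then lists the voxels a renderer would actually need to touch. The run layout `(flag, start, length)` and the function names are assumptions for illustration and do not reproduce the data structures of [36].

```python
def encode_scanline(opacities):
    """Run-length encode a scanline of voxel opacities.

    Returns a list of (non_transparent, start, length) runs, where a voxel
    is treated as transparent when its opacity is exactly zero.
    """
    runs = []
    for i, alpha in enumerate(opacities):
        flag = alpha > 0.0
        if runs and runs[-1][0] == flag:
            runs[-1][2] += 1          # extend the current run
        else:
            runs.append([flag, i, 1])  # start a new run
    return [tuple(r) for r in runs]


def visited_voxels(runs):
    """Indices that remain after skipping every transparent run."""
    return [i
            for flag, start, length in runs if flag
            for i in range(start, start + length)]
```

Because voxel rows align with intermediate-image rows after the shear, the same kind of encoding can be applied to the image scanline (for example, runs of already opaque pixels) and the two run lists traversed together.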

One obvious optimization for both discrete and continuous ray casting that has already been discussed is to limit the sampling to the segment of the ray that intersects the data, since samples outside of the data evaluate to 0 and do not contribute to the pixel value. If the data itself contains many zero-valued data samples, or a segmentation function is applied to the data that evaluates to 0 for many samples, the efficiency of ray casting can be greatly enhanced by further limiting the segment of the ray in which samples are taken. One algorithm of this sort is known as polygon assisted ray casting, or PARC [1]. This algorithm approximates objects contained within a volume using a crude polyhedral representation. The polyhedral representation is created so that it completely contains the objects. Using conventional graphics hardware, the polygons are projected twice to create two Z-buffers. The first Z-buffer is the standard closest-distance Z-buffer, while the second is a farthest-distance Z-buffer. Since the object is completely contained within the representation, the two Z-buffer values for a given image plane pixel can be used as the starting and ending points of a ray segment on which samples are taken.
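
The use of the two Z-buffers can be sketched in software. In this illustration the crude enclosing representation is a list of axis-aligned boxes, the view is an axis-aligned parallel projection along z, and the buffers are filled by a simple loop rather than by graphics hardware; all of these are simplifying assumptions and not the method of [1].

```python
def depth_buffers(boxes, width, height):
    """Build closest-distance and farthest-distance buffers.

    boxes: list of (x0, y0, z0, x1, y1, z1) half-open axis-aligned boxes
    that together enclose the object. Pixels covered by no box stay None.
    """
    near = [[None] * width for _ in range(height)]
    far = [[None] * width for _ in range(height)]
    for x0, y0, z0, x1, y1, z1 in boxes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                if near[y][x] is None or z0 < near[y][x]:
                    near[y][x] = z0
                if far[y][x] is None or z1 > far[y][x]:
                    far[y][x] = z1
    return near, far


def cast_ray(vol, x, y, near, far):
    """Sample vol[z][y][x] only between the two depth values.

    Returns (accumulated value, number of samples taken); summation is a
    stand-in for real compositing.
    """
    if near[y][x] is None:
        return 0, 0  # the ray misses the enclosing representation
    total = 0
    for z in range(near[y][x], far[y][x]):
        total += vol[z][y][x]
    return total, far[y][x] - near[y][x]
```

Rays that miss the enclosing boxes take no samples at all, and rays that hit them skip the empty space in front of and behind the object.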

The PARC algorithm is part of the VolVis volume visualization system [1,2], which provides a multialgorithm progressive refinement approach for interactivity. By using available graphics hardware, the user is given the ability to interactively manipulate a polyhedral representation of the data. When the user is satisfied with the placement of the data, light sources, and view, the Z-buffer information is passed to the PARC algorithm, which produces a ray-cast image. In a final step, this image is further refined by continuing to follow the PARC rays that intersected the data according to a volumetric ray tracing algorithm [64] in order to generate shadows, reflections, and transparency (see Section 6.1). The ray tracing algorithm uses various optimization techniques, including uniform space subdivision and bounding boxes, to increase the efficiency of the secondary rays. Surface rendering, as well as transparency with color and opacity transfer functions, are incorporated within a global illumination model.

Another higher-performance presence-acceleration ray casting algorithm has been developed by Wan et al. [77]. A highly accurate estimation for object presence is obtained by projecting all grid cells associated with the object boundary on the image plane. Memory space and access time are reduced by run-length encoding of the boundary cells, while boundary cell projection time is reduced by exploiting projection templates and multiresolution volumes. It further uses task partitioning schemes for effective parallelization of both boundary cell projection and ray traversal procedures.
