## Medical Image Analysis with Deformable Models

Although deformable models were originally developed for application to problems in computer vision and computer graphics, their potential for use in medical image analysis has been quickly realized. They have been applied to images generated by imaging modalities as varied as X-ray, computed tomography (CT), angiography, magnetic resonance (MR), and ultrasound. Two-dimensional and three-dimensional deformable models have been used to segment, visualize, track, and quantify a variety of anatomic structures ranging in scale from the macroscopic to the microscopic. These include the brain, heart, face, cerebral, coronary and retinal arteries, kidney, lungs, stomach, liver, skull, vertebra, objects such as brain tumors, a fetus, and even cellular structures such as neurons and chromosomes. Deformable models have been used to track the nonrigid motion of the heart, the growing tip of a neurite, and the motion of erythrocytes. They have been used to locate structures in the brain, and to register images of the retina, vertebra, and neuronal tissue.

In the following sections, we review and discuss the application of deformable models to medical image interpretation tasks, including segmentation, matching, and motion analysis.

### 3.1 Image Segmentation with Deformable Curves

The segmentation of anatomic structures—the partitioning of the original set of image points into subsets corresponding to the structures—is an essential first stage of most medical image analysis tasks, such as registration, labeling, and motion tracking. These tasks require anatomic structures in the original image to be reduced to a compact, analytic representation of their shapes. Performing this segmentation manually is extremely labor intensive and time-consuming. A primary example is the segmentation of the heart, especially the left ventricle (LV), from cardiac imagery. Segmentation of the left ventricle is a prerequisite for computing diagnostic information such as ejection-fraction ratio, ventricular volume ratio, and heart output, and for wall motion analysis that provides information on wall thickening, etc. [123].

Most clinical segmentation is currently performed using manual slice editing. In this scenario, a skilled operator, using a computer mouse or trackball, manually traces the region of interest on each slice of an image volume. Manual slice editing suffers from several drawbacks. These include the difficulty in achieving reproducible results, operator bias, forcing the operator to view each 2-D slice separately to deduce and measure the shape and volume of 3-D structures, and operator fatigue.

Segmentation using traditional low-level image processing techniques, such as thresholding, region growing, edge detection, and mathematical morphology operations, also requires considerable amounts of expert interactive guidance. Furthermore, automating these model-free approaches is difficult because of the shape complexity and variability within and across individuals. In general, the underconstrained nature of the segmentation problem limits the efficacy of approaches that consider local information only. Noise and other image artifacts can cause incorrect regions or boundary discontinuities in objects recovered by these methods.

A deformable model-based segmentation scheme, used in concert with image preprocessing, can overcome many of the limitations of manual slice editing and traditional image processing techniques. These connected and continuous geometric models consider an object boundary as a whole and can make use of a priori knowledge of object shape to constrain the segmentation problem. The inherent continuity and smoothness of the models can compensate for noise, gaps, and other irregularities in object boundaries. Furthermore, the parametric representations of the models provide a compact, analytical description of object shape. These properties lead to an efficient, robust, accurate, and reproducible technique for linking sparse or noisy local image features into a complete and consistent model of the object.

Among the first and primary uses of deformable models in medical image analysis was the application of deformable contour models, such as snakes [73], to segment structures in 2-D images [12,18,26,29,33,61,80,88,111,120,145]. Typically users initialized a deformable model near the object of interest (Fig. 3) and allowed it to deform into place. Users could then exploit the interactive capabilities of these models and manually fine-tune them. Furthermore, once the user is satisfied with the result on an initial image slice, the fitted contour model may then be used as the initial boundary approximation for neighboring slices. These models are then deformed into place and again propagated until all slices have been processed. The resulting sequence of 2D contours can then be connected to form a continuous 3D surface model [23,26,29,86].

The application of snakes and other similar deformable contour models to extract regions of interest is, however, not without limitations. For example, snakes were designed as interactive models. In noninteractive applications, they must be initialized close to the structure of interest to guarantee good performance. The internal energy constraints of snakes can limit their geometric flexibility and prevent a snake from representing long tubelike shapes and shapes with significant protrusions or bifurcations. Furthermore, the topology of the

FIGURE 3 (a) Intensity CT image slice of canine LV. (b) Edge detected image, (c) Initial snake, (d)-(f ) Snake deforming toward LV boundary, driven by "inflation" force [95],

FIGURE 3 (a) Intensity CT image slice of canine LV. (b) Edge detected image, (c) Initial snake, (d)-(f ) Snake deforming toward LV boundary, driven by "inflation" force [95], structure of interest must be known in advance, since classical deformable contour models are parametric and are incapable of topological transformations without additional machinery.

Various methods have been proposed to improve and further automate the deformable contour segmentation process. Cohen and Cohen [26] used an internal "inflation" force to expand a snakes model past spurious edges toward the real edges of the structure, making the snake less sensitive to initial conditions (inflation forces were also employed in [138]). Amini et al. [2] use dynamic programming to carry out a more extensive search for global minima. Poon et al. [115] and Grzeszczuk and Levin [58] minimize the energy of active contour models using simulated annealing, which is known to give global solutions and allows the incorporation of non-differentiable constraints.

Other researchers [21,51,60,65,69,71,92,119] have integrated region-based information into deformable contour models or used other techniques in an attempt to decrease sensitivity to insignificant edges and initial model placement. For example, Poon et al. [115] use a discriminant function to incorporate region-based image features into the image forces of their active contour model. The discriminant function allows the inclusion of additional image features in the segmentation and serves as a constraint for global segmentation consistency (i.e., every image pixel contributes to the discriminant function).

Several researchers [19,20,77,81,90,91,99,108,122,153, 157] have been developing topology-independent shape modeling schemes that are not only less sensitive to initial conditions, but also allow a deformable contour or surface model to represent long tubelike shapes or shapes with bifurcations (Fig. 4), and to dynamically sense and change its topology (Fig. 5).

Finally, another development is a snakelike technique known as "live-wire" [11,46]. This semiautomatic boundary tracing technique computes and selects optimal boundaries at interactive rates as the user moves a mouse, starting from a user-specified seed point. When the mouse is moved close to an object edge, a live-wire boundary snaps to and wraps around the object of interest. The live-wire method has also been combined with snakes, yielding a segmentation tool that exploits the best properties of both techniques [84,85].

3.2 Volume Image Segmentation with Deformable Surfaces

Segmenting 3-D image volumes slice by slice using manual slice editing (or image processing techniques) is a laborious process and requires a postprocessing step to connect the sequence of 2-D contours into a continuous surface. Furthermore, the resulting surface reconstruction can contain inconsistencies or show rings or bands. As described in the previous section, the

FIGURE 4 Image sequence of clipped angiogram of retina showing an automatically subdividing snake flowing and branching along a vessel [96].

FIGURE 4 Image sequence of clipped angiogram of retina showing an automatically subdividing snake flowing and branching along a vessel [96].

FIGURE 5 Segmentation of a cross sectional image of a human vertebra phantom with a topologically adaptable snake [96]. The snake begins as a single closed curve and becomes three closed curves.

application of a 2-D active contour model to an initial slice and the propagation of the model to neighboring slices can significantly improve the volume segmentation process. However, the use of a true 3-D deformable surface model can potentially result in even greater improvements in efficiency and robustness, and it also ensures that a globally smooth and coherent surface is produced between image slices.

Deformable surface models in 3-D were first used in computer vision [138]. Many researchers have since explored the use of deformable surface models for segmenting structures in medical image volumes. Miller [101] constructs a polygonal approximation to a sphere and geometrically deforms this "balloon" model until the balloon surface conforms to the object surface in 3-D CT data. The segmentation process is formulated as the minimization of a cost function where the desired behavior of the balloon model is determined by a local cost function associated with each model vertex. The cost function is a weighted sum of three terms: a deformation potential that "expands" the model vertices toward the object boundary, an image term that identifies features such as edges and opposes the balloon expansion, and a term that maintains the topology of the model by constraining each vertex to remain close to the centroid of its neighbors.

Cohen et al. [26,28] and Mclnerney and Terzopoulos [95] use finite element and physics-based techniques to implement an elastically deformable cylinder and sphere, respectively. The models are used to segment the inner wall ofthe left ventricle of the heart from MR or CT image volumes (Fig. 6). These deformable surfaces are based on a thin-plate-under-tension surface spline, the higher dimensional generalization of Eq. (2), which controls and constrains the stretching and bending of the surface. The models are fitted to data dynamically by integrating Lagrangian equations of motion through time in order to adjust the deformational degrees of freedom. Furthermore, the finite element method is used to represent the models as a continuous surface in the form of weighted sums of local polynomial basis functions. Unlike Miller's [101]

polygonal model, the finite element method provides an analytic surface representation, and the use of high-order polynomials means that fewer elements are required to accurately represent an object. Pentland and Sclaroff [113] and Nastar and Ayache [106] also develop physics-based models but use a reduced modal basis for the finite elements (see Section 3.5).

Staib and Duncan [127] describe a 3D surface model used for geometric surface matching to 3D medical image data. The model uses a Fourier parameterization that decomposes the surface into a weighted sum of sinusoidal basis functions. Several different surface types are developed, such as tori, open surfaces, closed surfaces, and tubes. Surface finding is formulated as an optimization problem using gradient ascent that attracts the surface to strong image gradients in the vicinity of the model. An advantage ofthe Fourier parameterization is that it allows a wide variety of smooth surfaces to be described with a small number ofparameters. That is, a Fourier representation expresses a function in terms of an orthonormal basis, and higher indexed basis functions in the sum represent higher spatial variation. Therefore, the series can be truncated and still represent relatively smooth objects accurately.

In a different approach, Szeliski et al. [132] use a dynamic, self-organizing oriented particle system to model surfaces of objects. The oriented particles, which can be visualized as small, flat disks, evolve according to Newtonian mechanics and interact through external and interparticle forces. The external forces attract the particles to the data while interparticle forces attempt to group the particles into a coherent surface. The particles can reconstruct objects with complex shapes and topologies by "flowing" over the data, extracting and conforming to meaningful surfaces. A triangulation is then performed, which connects the particles into a continuous global model that is consistent with the inferred object surface.

Other notable work involving 3D deformable surface models and medical image applications can be found in [20,33,37,38,98, 107,134,153,156,162], as well as several models described in the following sections.

FIGURE 6 (a) Deformable "balloon" model embedded in volume image deforming toward LV edges of canine heart. (b) Reconstruction of LV [95].

### 3.3 Incorporating A Priori Knowledge

In medical images, the general shape, location, and orientation of an anatomical structure is known, and this knowledge may be incorporated into the deformable model in the form of initial conditions, data constraints, and constraints on the model shape parameters, or into the model fitting procedure. The use of implicit or explicit anatomical knowledge to guide shape recovery is especially important for robust automatic interpretation of medical images. For automatic interpretation, it is essential to have a model that not only describes the size, shape, location, and orientation of the target object, but that also permits expected variations in these characteristics. Automatic interpretation of medical images can relieve clinicians from the labor-intensive aspects of their work

FIGURE 6 (a) Deformable "balloon" model embedded in volume image deforming toward LV edges of canine heart. (b) Reconstruction of LV [95].

while increasing the accuracy, consistency, and reproducibility of the interpretations. In this section, and in the following sections on matching and motion tracking, we describe several deformable model techniques that incorporate prior anatomical knowledge in different ways.

A number of researchers have incorporated knowledge of object shape into deformable models by using deformable shape templates. These models usually use "hand-crafted" global shape parameters to embody a priori knowledge of expected shape and shape variation of the structures and have been used successfully for many applications of automatic image interpretation. The idea of deformable templates can be traced back to the early work on spring-loaded templates by Fischler and Elshlager [49]. An excellent example in computer vision is the work of Yuille et al. [161], who construct deformable templates for detecting and describing features of faces, such as the eye. In an early example from medical image analysis, Lipson etal. [87] note that axial cross-sectional images of the spine yield approximately elliptical vertebral contours and consequently extract the contours using a deformable ellipsoidal template. More recently, Montagnat and Delingette [103] used a deformable surface template of the liver to segment it from abdominal CT scans, and a ventricle template to segment the ventricles of the brain from MR images.

Deformable models based on superquadrics are another example of deformable shape templates that are gaining in popularity in medical image research. Superquadrics contain a small number of intuitive global shape parameters that can be tailored to the average shape of a target anatomic structure. Furthermore, the global parameters can often be coupled with local shape parameters such as splines, resulting in a powerful shape representation scheme. For example, Metaxas and Terzopoulos [100] employ a dynamic deformable superquadric model [136] to reconstruct and track human limbs from 3D biokinetic data. Their models can deform both locally and globally by incorporating the global shape parameters of a superellipsoid with the local degrees of freedom of a membrane spline in a Lagrangian dynamics formulation. The global parameters efficiently capture the gross shape features of the data, while the local deformation parameters reconstruct the fine details of complex shapes. Using Kalman filtering theory, they develop and demonstrate a biokinetic motion tracker based on their deformable superquadric model.

Vemuri et al. [148,149] construct a deformable superquadric model in an orthonormal wavelet basis. This multiresolution basis provides the model with the ability to continuously transform from local to global shape deformations, thereby allowing a continuum of shape models to be created and to be represented with relatively few parameters. They apply the model to segment and reconstruct anatomical structures in the human brain from MRI data.

As a final example, Bardinet et al. [10] fit a deformable superquadric to segmented 3D cardiac images and then refine the superquadric fit using a volumetric deformation technique known as free-form deformations (FFDs). FFDs are defined by tensor product trivariate splines and can be visualized as a rubber-like box in which the object to be deformed (in this case the superquadric) is embedded. Deformations of the box are automatically transmitted to embedded objects. This volumetric aspect of FFDs allows two superquadric surface models to be simultaneously deformed in order to reconstruct the inner and outer surfaces of the left ventricle of the heart and the volume in between these surfaces. Further examples of deformable superquadrics can be found in [24,112] (see Section 3.5). Further examples of FFD-based (or FFD-like) deformable models for medical image segmentation can be found in [89,94].

Several researchers cast the deformable model fitting process in a probabilistic framework (see Section 2.4) and include prior knowledge of object shape by incorporating prior probability distributions on the shape variables to be estimated [52,126,148,155]. For example, Staib and Duncan [126] use a deformable contour model on 2D echocardiograms and MR images to extract the LV of the heart and the corpus callosum of the brain, respectively. This closed contour model is parameterized using an elliptic Fourier decomposition, and a priori shape information is included as a spatial probability expressed through the likelihood of each model parameter. The model parameter probability distributions are derived from a set of example object boundaries and serve to bias the contour model toward expected or more likely shapes.

Szekely et al. [131] have also developed Fourier parameterized models. Furthermore, they have added elasticity to their models to create "Fourier snakes" in two dimensions and elastically deformable Fourier surface models in three dimensions. By using the Fourier parameterization followed by a statistical analysis of a training set, they define mean organ models and their eigen-deformations. An elastic fit of the mean model in the subspace of eigenmodes restricts possible deformations and finds an optimal match between the model surface and boundary candidates.

Cootes et al. [30] and Hill et al. [66] present a statistically based technique for building deformable shape templates and use these models to segment various organs from 2D and 3D medical images. The statistical parameterization provides global shape constraints and allows the model to deform only in ways implied by the training set. The shape models represent objects by sets of landmark points that are placed in the same way on an object boundary in each input image. For example, to extract the LV from echocardiograms, they choose points around the ventricle boundary, the nearby edge of the right ventricle, and the top of the left atrium. The points can be connected to form a deformable contour. By examining the statistics of training sets of hand-labeled medical images, and using principal component analysis, a shape model is derived that describes the average positions and the major modes of variation of the object points. New shapes are generated using the mean shape and a weighted sum of the major modes of variation. Object boundaries are then segmented using this "point distribution model" by examining a region around each model point to calculate the displacement required to move it toward the boundary. These displacements are then used to update the shape parameter weights. An example of the use of this technique for segmenting MR brain images can be found in [43].

### 3.4 Matching

Matching of regions in images can be performed between the representation of a region and a model (labeling) or between the representation of two distinct regions (registration). Registration of 2D and 3D medical images is necessary in order to study the evolution of a pathology in an individual, or to take full advantage of the complementary information coming from multimodality imagery. Examples of the use of deformable models to perform medical image registration are found in [14,48,59,63,78,104, 105,141]. These techniques primarily consist of constructing highly structured descriptions for matching. This operation is usually carried out by extracting regions ofinterest with an edge detection algorithm, followed by the extraction of landmark points or characteristic contours (or curves on extracted boundary surfaces in the case of 3D data). In three dimensions, these curves usually describe differential structures such as ridges, or topological singularities. An elastic matching algorithm can then be applied between corresponding pairs of curves or contours where the "start" contour is iteratively deformed to the "goal" contour using forces derived from local pattern matches with the goal contour [105].

An example of matching where the use of explicit a priori knowledge has been embedded into deformable models is the automatic extraction and labeling of anatomic structures in the brain from MR images, or the registration of multimodality brain images. The anatomical knowledge is made explicit in the form of a 3D brain atlas. The atlas is modeled as a physical object and is given elastic properties. After an initial global alignment, the atlas deforms and matches itself onto corresponding regions in the brain image volume in response to forces derived from image features. The assumption underlying this approach is that at some representational level, normal brains have the same topological structure and differ only in shape details. The idea of modeling the atlas as an elastic object was originated by Broit [17], who formulated the matching process as a minimization of a cost function. Subsequently, Bajcsy and Kovacic [8] implemented a multiresolution version of Broit's system where the deformation ofthe atlas proceeds step-by-step in a coarse to fine strategy, increasing the local similarity and global coherence. The elastically deformable atlas technique is very promising and consequently has become a very active area of research that is being explored by several groups [15,16,25,34-36,45, 52, 93,121,125,130, 142,146,150].

The automatic brain image matching problem is extremely challenging and there are many hurdles that must be overcome

before the deformable atlas technique can be adopted for clinical use. For example, the technique is sensitive to the initial positioning of the atlas—if the initial rigid alignment is off by too much, then the elastic match may perform poorly. The presence of neighboring features may also cause matching problems—the atlas may warp to an incorrect boundary. Finally, without user interaction, the atlas can have difficulty converging to complicated object boundaries. A proposed solution to these problems is to use image preprocessing in conjunction with the deformable atlas. Sandor and Leahy [121] use this approach to automatically label regions of the cortical surface that appear in 3D MR images of human brains (Fig. 7). They automatically match a labeled deformable atlas model to preprocessed brain images, where preprocessing consists of 3D edge detection and morphological operations. These filtering operations automatically extract the brain and sulci (deep grooves in the cortical surface) from an MR image and provide a smoothed representation of the brain surface to which their 3D B-spline deformable surface model can rapidly converge.

### 3.5 Motion Tracking and Analysis

The idea of tracking objects in time-varying images using deformable models was originally proposed in the context of computer vision [73,138]. Deformable models have been used to track nonrigid microscopic and macroscopic structures in motion, such as blood cells [83] and neurite growth cones [62] in cinemicroscopy, as well as coronary arteries in cineangio-

graphy [82]. However, the primary use of deformable models for tracking in medical image analysis is to measure the dynamic behavior of the human heart, especially the left ventricle. Regional characterization of the heart wall motion is necessary to isolate the severity and extent of diseases such as ischemia. Magnetic resonance and other imaging technologies can now provide time-varying three-dimensional images of the heart with excellent spatial resolution and reasonable temporal resolutions. Deformable models are well suited for this image analysis task.

In the simplest approach, a 2D deformable contour model is used to segment the LV boundary in each slice of an initial image volume. These contours are then used as the initial approximation of the LV boundaries in corresponding slices of the image volume at the next time instant and are then deformed to extract the new set of LV boundaries [5,53,64,123,145]. This temporal propagation of the deform-able contours dramatically decreases the time taken to segment the LV from a sequence of image volumes over a cardiac cycle. Singh et al. [123] report a time of 15 minutes to perform the segmentation, considerably less than the 1.5-2 hours that a human expert takes for manual segmentation. Deformable contour models have also been succesfully used to track the LV boundary in noisy echocardiographic image sequences [22,70].

McInerney and Terzopoulos [95] have applied the temporal propagation approach in three dimensions using a 3D dynamic deformable "balloon" model to track the contractile motion of the LV (Figs 8 and 9).

In a more involved approach, Amini and Duncan [1] use bending energy and surface curvature to track and analyze LV motion. For each time instant, two sparse subsets of surface points are created by choosing geometrically significant landmark points, one for the endocardial surface, and the other for the epicardial surface of the LV. Surface patches surrounding these points are then modeled as thin, flexible plates. Making the assumption that each surface patch deforms only slightly and locally within a small time interval, for each sampled point on the first surface they construct a search area on the LV surface in the image volume at the next time instant. The best matched (i.e., minimum bending energy) point within the search window on the second surface is taken to correspond to the point on the first surface. This matching process yields a set of initial motion vectors for pairs of LV surfaces derived from a 3D image sequence. A smoothing procedure is then performed using the initial motion vectors to generate a dense motion vector field over the LV surfaces.

Cohen et al. [27] also employ a bending energy technique in two dimensions and attempt to improve on this method by adding a term to the bending energy function that tends to preserve the matching of high curvature points. Goldgof et al. [46,68,72,102] have also been pursuing surface shape matching ideas primarily based on changes in Gaussian curvature and assume a conformal motion model (i.e.,

FIGURE 9 Tracking of the LV motion of canine heart during one cardiac cycle (1-6) using deformable balloon model [95].

motion that preserves angles between curves on a surface but not distances).

An alternative approach is that of Chen et al. [24], who use a hierarchical motion model of the LV constructed by combining a globally deformable superquadric with a locally deformable surface using spherical harmonic shape modeling primitives. Using this model, they estimate the LV motion from angio-graphic data and produce a hierarchical decomposition that characterizes the LV motion in a coarse-to-fine fashion.

Pentland and Horowitz [112] and Nastar and Ayache [106] are also able to produce a coarse-to-fine characterization of the LV motion. They use dynamic deformable models to track and recover the LV motion and make use of modal analysis, a well-known mechanical engineering technique, to parameterize their models. This parameterization is obtained from the eigenvectors of a finite element formulation of the models. These eigenvectors are often referred to as the "free vibration'' modes and variable detail of LV motion representation results from varying the number of modes used.

The heart is a relatively smooth organ and consequently there are few reliable landmark points. The heart also undergoes complex nonrigid motion that includes a twisting (tangential) component as well as the normal component of motion. The motion recovery methods described previously are, in general, not able to capture this tangential motion without additional information. Magnetic resonance techniques, based on magnetic tagging [4] have been developed to track material points on the myocardium in a noninvasive way. The temporal correspondence of material points that these techniques provide allow for quantitative measurement of tissue motion and deformation, including the twisting component of the LV motion. Several researchers have applied deformable models to image sequences of MR tagged data [3,42,75,76,110,159]. For example, Amini et al. [3] and Kumar and Goldgof [76] use a 2D deformable grid to localize and track SPAMM (Spatial Modulation of Magnetization) tag points on the LV tissue. Park et al. [109,110] fit a volumetric physics-based deformable model to MRI-SPAMM data of the LV. The parameters of the model are functions that can capture regional shape variations of the LV such as bending, twisting, and contraction. Based on this model, the authors quantitatively compare normal hearts and hearts with hypertrophic cardiomyopathy.

Another problem with most of the methods described previously is that they model the endocardial and epicardial surfaces of the LV separately. In reality the heart is a thick-walled structure. Duncan et al. [42] and Park et al. [109,110] develop models that consider the volumetric nature of the heart wall. These models use the shape properties of the endocardial and epicardial surfaces and incorporate mid-wall displacement information of tagged MR images. By constructing 3D finite element models of the LV with nodes in the mid-wall region as well as nodes on the endocardial and epicardial surfaces, more accurate measurements of the LV

motion can be obtained. Young etal. [158,160] and Creswell et al. [31] have also constructed 3D finite element models from the boundary representations of the endocardial and epicardial surfaces.

## Post a comment