British Journal of Radiology (2004) 77, S133-S139
© 2004 British Institute of Radiology
doi: 10.1259/bjr/20343922
Anatomical statistical models and their role in feature extraction
T F Cootes, PhD and C J Taylor, PhD
Imaging Science and Biomedical Engineering, University of Manchester, UK
Abstract
A detailed model of the shape of anatomical structures can significantly improve the ability to segment such structures from medical images. Statistical models representing the variation of shape and appearance can be constructed from suitably annotated training sets. Such models can be used to synthesize images of anatomy, and to search new images to accurately locate the structures of interest, even in the presence of noise and clutter. In this paper we summarize recent work on constructing and using such models, and demonstrate their application to several domains.
Introduction
In order to understand a complex medical image, it is necessary to have some model of what structures are expected to be present. Over recent years there have been considerable advances in computer vision, including the development of methods of representing complex structures and, importantly, of representing the ways in which such structures can vary from one image to another. Such "model-based" approaches allow the encapsulation of prior knowledge of anatomy. They can, in principle, be used to resolve the potential confusion caused by structural complexity, provide tolerance to noisy or missing data, and provide a means of labelling the recovered structures.
Of particular interest are generative models, that is, models sufficiently complete that they are able to generate realistic images of target objects. An example would be a face model capable of generating convincing images of any individual, with changes of expression and so on. Using such a model, image interpretation can be formulated as a matching problem: given an image to interpret, structures can be located and labelled by adjusting the model's parameters in such a way that it generates a synthetic image which is as similar as possible to the real thing.
A powerful approach is to learn the structures and their variation from a suitably annotated training set of typical images. We describe below how statistical models can be constructed to represent both the shape and the "texture" (the pattern of pixel intensities) of examples of structures of interest. These models can generalize from the training set and be used to match to new images, locating the structure in the images. They have been shown to be powerful tools for interpreting medical images, and have been applied to a wide variety of problems.
In the following we will review the approach and describe a number of practical applications.
Background
In recent years there has been considerable interest in methods that use deformable models of expected structure to interpret images. Such models can achieve robust segmentation by constraining the possible shapes. For comprehensive reviews of work in this field, see the surveys of deformable models in medical image analysis [1] and of medical image registration methods [2]. We give here a brief review covering some of the more important points.
Model matching algorithms can be crudely classified as "shape based", in which a deformable model represents, and matches to, boundaries or other sparse features, and "appearance based", in which the model represents the whole image region covered by the structure.
Various approaches to modelling variability have been described previously. The most common general approach is to allow a prototype to vary according to some physical model. Kass et al [3] describe "snakes" which deform elastically to fit shape contours. Park et al [4] and Pentland and Sclaroff [5] both represent prototype objects using finite element methods and describe variability in terms of vibrational modes. Alternative approaches include representation of shapes using sums of trigonometric functions with variable coefficients [6, 7] and parameterized models, hand-crafted for particular applications [8, 9].
Image registration [2] can be used to match a single image to a new image either rigidly or allowing non-rigid deformations. Any annotations on the original image can then be projected onto the new image. In this case typically the texture is fixed but the shape is allowed to vary.
An extension is to match a model image (or anatomical atlas) to a target image, in order to interpret the latter. For instance Bajcsy and Kovacic [10] describe a volume model (of the brain) that also deforms elastically to generate new examples.
Christensen et al use a viscous flow model of deformation, and incorporate statistical information about local deformations [11, 12].
Kirby and Sirovich [13] have described statistical modelling of grey-level appearance (particularly for face images) but did not address shape variability.
Wang and Staib [14] have incorporated statistical shape information into an image-based elastic matching approach.
Statistical models of appearance
An appearance model can represent both the shape and texture variability seen in a training set. The training set consists of labelled images, where key landmark points are marked on each example object. For instance, to build a model of the subcortical structures in two dimensional (2D) MR images of the brain we need a number of images marked with points at key positions to outline the main features (Figure 1).

Figure 1. Example of an MR brain slice labelled with 123 landmark points around the ventricles, the caudate nucleus and the lentiform nucleus.
Given such a set we can generate a statistical model of shape variation by applying principal component analysis (PCA) to the set of vectors describing the shapes in the training set (see [15] for details). The labelled points on a single object, concatenated into a vector x, describe the shape of that object. Any example can then be approximated using:
$$x = \bar{x} + P_s b_s \tag{1}$$
where $\bar{x}$ is the mean shape vector, $P_s$ is a set of orthogonal modes of shape variation and $b_s$ is a vector of shape parameters.
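As an illustration only, the PCA step can be sketched in Python with NumPy. The sketch assumes the training shapes have already been aligned into a common frame (alignment is covered in [15]) and are stored one per row in the layout (x1, ..., xn, y1, ..., yn). The helper names and the rule of keeping enough modes to explain 98% of the variance are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

def build_shape_model(shapes, var_fraction=0.98):
    """Fit x ~ x_bar + P_s b_s (Equation 1) by PCA.

    shapes: (N, 2n) array of aligned shape vectors, one per row.
    var_fraction: fraction of total variance to retain (assumed rule).
    Returns the mean shape, the modes P_s (columns) and their variances.
    """
    shapes = np.asarray(shapes, dtype=float)
    x_bar = shapes.mean(axis=0)
    cov = np.cov(shapes - x_bar, rowvar=False)
    # eigh handles symmetric matrices; it returns eigenvalues in ascending order.
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    evals = np.clip(evals, 0.0, None)  # remove tiny negative round-off
    cumulative = np.cumsum(evals) / evals.sum()
    t = int(np.searchsorted(cumulative, var_fraction) + 1)
    return x_bar, evecs[:, :t], evals[:t]

def shape_params(x, x_bar, P_s):
    # P_s has orthonormal columns, so b_s = P_s^T (x - x_bar).
    return P_s.T @ (x - x_bar)

def reconstruct_shape(b_s, x_bar, P_s):
    # Equation 1.
    return x_bar + P_s @ b_s
```

Any shape that lies in the span of the retained modes is reproduced exactly by `reconstruct_shape(shape_params(x, ...), ...)`; other shapes are projected onto the closest model shape.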
To build a statistical model of the grey-level appearance we warp each example image so that its control points match the mean shape (using a triangulation algorithm). We then sample the intensity information from the shape-normalized image over the region covered by the mean shape. To minimize the effect of global lighting variation, we normalize the resulting samples.
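A minimal sketch of the normalization step follows. The text does not give the exact form, so this assumes the common choice of removing the mean intensity and scaling to unit variance; `normalize_texture` is a hypothetical helper.

```python
import numpy as np

def normalize_texture(g_im):
    """Remove a global intensity offset and scale from a sampled texture vector.

    Assumed normalization: subtract the mean and divide by the standard
    deviation, giving zero mean and unit variance. Also returns the scale
    and offset that were removed so the mapping can be inverted.
    """
    g_im = np.asarray(g_im, dtype=float)
    offset = g_im.mean()
    scale = g_im.std()
    if scale == 0.0:
        raise ValueError("a constant texture cannot be normalized")
    return (g_im - offset) / scale, scale, offset
```

With this choice, two samples that differ only by a global brightness and contrast change map to the same normalized vector.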
By applying PCA to the normalized data we obtain a linear model:
$$g = \bar{g} + P_g b_g \tag{2}$$
where $\bar{g}$ is the mean normalized grey-level vector, $P_g$ is a set of orthogonal modes of intensity variation and $b_g$ is a set of grey-level parameters.
The shape and appearance of any example can thus be summarized by the vectors $b_s$ and $b_g$. In some cases we can treat the shape and the texture as independent; in others, however, there may be correlations between the shape and grey-level variations. In such cases we concatenate the vectors, apply a further PCA and obtain a model of the form
$$b = \begin{pmatrix} W_s b_s \\ b_g \end{pmatrix} = Q c \tag{3}$$
where $W_s$ is a diagonal matrix of weights for each shape parameter, allowing for the difference in units between the shape and grey models, $Q$ is a set of orthogonal modes and $c$ is a vector of appearance parameters controlling both the shape and grey-levels of the model. Since the shape and grey-model parameters have zero mean, so does $c$.
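Note that the linear nature of the model allows us to express the shape and grey-levels directly as functions of $c$:
$$x = \bar{x} + P_s W_s^{-1} Q_s c, \qquad g = \bar{g} + P_g Q_g c \tag{4}$$
where $Q = \begin{pmatrix} Q_s \\ Q_g \end{pmatrix}$ is partitioned into the rows corresponding to the shape and grey-level parameters.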
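The combined model of Equations 3 and 4 can be sketched as follows. The weighting $W_s = rI$, with $r^2$ the ratio of the total grey-level variance to the total shape variance, is one common choice and is assumed here; the text only says that $W_s$ is diagonal. The helper names are hypothetical.

```python
import numpy as np

def pca(data, var_fraction=0.98):
    """Mean, principal modes (as columns) and variances of the rows of data."""
    mean = data.mean(axis=0)
    # SVD of the centred data avoids forming the covariance matrix explicitly.
    _, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    variances = s**2 / (len(data) - 1)
    cumulative = np.cumsum(variances) / variances.sum()
    t = int(np.searchsorted(cumulative, var_fraction) + 1)
    return mean, vt[:t].T, variances[:t]

def build_combined_model(b_s, b_g, lambda_s, lambda_g, var_fraction=0.98):
    """Equation 3: PCA on the concatenated (W_s b_s ; b_g) vectors.

    b_s: (N, ts) shape parameters; b_g: (N, tg) grey-level parameters.
    lambda_s, lambda_g: variances of the shape and grey-level modes.
    """
    r = np.sqrt(np.sum(lambda_g) / np.sum(lambda_s))  # assumed weighting
    W_s = r * np.eye(b_s.shape[1])
    b = np.hstack([b_s @ W_s, b_g])  # row i is (W_s b_s_i ; b_g_i)
    _, Q, _ = pca(b, var_fraction)
    # b has zero mean when b_s and b_g come from the training set, so b ~= Q c.
    c = b @ Q
    ts = b_s.shape[1]
    return W_s, Q[:ts], Q[ts:], c

def synthesize(c, x_bar, P_s, W_s, Q_s, g_bar, P_g, Q_g):
    """Equation 4: shape and grey-levels as direct functions of c."""
    x = x_bar + P_s @ np.linalg.solve(W_s, Q_s @ c)
    g = g_bar + P_g @ (Q_g @ c)
    return x, g
```

When all combined modes are kept, $Q$ is square and orthogonal, so each training example's $b_s$ and $b_g$ are recovered exactly from its $c$.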
A shape in the image frame, $X$, can be generated by applying a suitable transformation to the points, $x$: $X = S_t(x)$. Typically $S_t$ will be a Euclidean, similarity or affine transformation.
The texture in the image frame is generated by applying a scaling and offset to the intensities, $g_{im} = T_u(g) = (u_1 + 1) g + u_2 \mathbf{1}$, where $u$ is a vector of transformation parameters and $\mathbf{1}$ is a vector of ones.
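A sketch of the two transformations, assuming $S_t$ is a 2D similarity transform parameterized by scale, rotation angle and translation (the text equally allows Euclidean or affine transforms) and that shape vectors are laid out as (x1, ..., xn, y1, ..., yn). The function names are hypothetical.

```python
import numpy as np

def similarity_transform(x, s, theta, tx, ty):
    """X = S_t(x): scale by s, rotate by theta, then translate by (tx, ty).

    x uses the layout (x1..xn, y1..yn); the result uses the same layout.
    """
    n = len(x) // 2
    pts = np.stack([x[:n], x[n:]])  # 2 x n array of points
    c, sn = np.cos(theta), np.sin(theta)
    A = s * np.array([[c, -sn], [sn, c]])
    out = A @ pts + np.array([[tx], [ty]])
    return np.concatenate([out[0], out[1]])

def texture_transform(g, u):
    """g_im = T_u(g) = (u1 + 1) g + u2 * 1."""
    return (u[0] + 1.0) * np.asarray(g, dtype=float) + u[1]

def inverse_texture_transform(g_im, u):
    """g = T_u^{-1}(g_im) = (g_im - u2 * 1) / (u1 + 1)."""
    return (np.asarray(g_im, dtype=float) - u[1]) / (u[0] + 1.0)
```

Writing the texture scale as $u_1 + 1$ means that $u = 0$ is the identity, which is convenient when the parameters are later updated additively during search.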
A full reconstruction is given by generating the texture in a mean shaped patch, then warping it so that the model points lie on the image points, X.
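The warp onto the image points is often realized as a piecewise-affine map over a triangulation of the landmarks, consistent with the triangulation mentioned earlier. The sketch below is one assumed way of doing this: it maps a single point between corresponding triangles by reusing its barycentric coordinates. Applying it to every pixel of every triangle gives the full warp; the function names are hypothetical.

```python
import numpy as np

def barycentric(p, tri):
    """Barycentric coordinates (alpha, beta, gamma) of point p in a 3 x 2 triangle."""
    a, b, c = np.asarray(tri, dtype=float)
    # Solve p = a + beta (b - a) + gamma (c - a) for beta and gamma.
    M = np.column_stack([b - a, c - a])
    beta, gamma = np.linalg.solve(M, np.asarray(p, dtype=float) - a)
    return 1.0 - beta - gamma, beta, gamma

def warp_point(p, src_tri, dst_tri):
    """Map p from a mean-shape triangle to the matching image triangle.

    Within one triangle the mapping is affine, so vertices map to vertices
    and straight lines stay straight.
    """
    alpha, beta, gamma = barycentric(p, src_tri)
    a, b, c = np.asarray(dst_tri, dtype=float)
    return alpha * a + beta * b + gamma * c
```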
For instance, Figure 2 shows the effects of varying the first two shape model parameters, $b_{s1}$, $b_{s2}$, of a model trained on a set of 72 2D MR images of the brain, labelled as shown in Figure 1. Figure 3 shows the effects of varying the first two appearance model parameters, $c_1$, $c_2$, which change both the shape and the texture component of the synthesized image.
Sparse representations
Often we wish to locate the boundaries of structures, and are less interested in the details of their internal appearance. In this case, rather than modelling the intensities across the whole region, we can model them in the proximity of the boundary. A simple method for doing this is to sample short profiles through the model points, normal to the boundary. Concatenating these samples together leads to a texture vector, g, which can be manipulated as described above [16]. Figure 4 gives an example of the mean of a model constructed in this way. When this is matched to the image, it concentrates on matching boundary structure. This can lead to more accurate and efficient search.
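A minimal sketch of building such a profile-based texture vector for a 2D contour. It assumes an open contour, normals estimated from neighbouring points, and nearest-neighbour sampling (bilinear interpolation would be a natural refinement); the function names and parameters are hypothetical.

```python
import numpy as np

def boundary_normals(points):
    """Unit normals at each point of an open 2D contour given as an (n, 2) array of (x, y)."""
    pts = np.asarray(points, dtype=float)
    # Central differences inside the contour, one-sided differences at its ends.
    tangents = np.gradient(pts, axis=0)
    tangents /= np.linalg.norm(tangents, axis=1, keepdims=True)
    return np.column_stack([-tangents[:, 1], tangents[:, 0]])  # rotate by 90 degrees

def sample_profiles(image, points, half_length=3, step=1.0):
    """Concatenate intensity profiles sampled normal to the boundary.

    x is the image column and y the row; samples are clipped to the image.
    Returns g with n * (2 * half_length + 1) elements.
    """
    image = np.asarray(image, dtype=float)
    normals = boundary_normals(points)
    offsets = np.arange(-half_length, half_length + 1) * step
    samples = []
    for (x, y), (nx, ny) in zip(np.asarray(points, dtype=float), normals):
        cols = np.clip(np.rint(x + offsets * nx).astype(int), 0, image.shape[1] - 1)
        rows = np.clip(np.rint(y + offsets * ny).astype(int), 0, image.shape[0] - 1)
        samples.append(image[rows, cols])
    return np.concatenate(samples)
```

The resulting vector can be normalized and modelled with PCA exactly like a full-region texture sample.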
Image interpretation
The models of shape and texture outlined above can be used to segment structures in new images. The approach is to match the model to the image by optimizing some objective function which measures the quality of fit between the current model example and the target image. Once the optimal model parameters are obtained, the resulting model points define the outlines of the structures of interest, or the positions of desired landmark points.
A widely used technique for matching the shape models to the image is the "active shape model" (ASM) [15, 17]. This attempts to match model points by searching around the current estimate of each point for a good fit, then constraining the set of matches with a shape model. It has been found to be a powerful technique, but has generally been superseded by the "active appearance model" (AAM) algorithm (described below), which tends to be more reliable.
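The shape-constraint step of an ASM iteration can be sketched as below. The sketch assumes the suggested points have already been brought into the model frame (pose removed), and clips each shape parameter to three standard deviations of its mode, a limit commonly used with these models but not stated in this paper. The function name is hypothetical.

```python
import numpy as np

def constrain_to_model(x_suggested, x_bar, P_s, eigenvalues, limit=3.0):
    """Replace a suggested shape with the closest plausible model shape.

    Projects onto the shape modes (b_s = P_s^T (x - x_bar)), then clips each
    parameter to +/- limit * sqrt(eigenvalue) before reconstructing.
    """
    b_s = P_s.T @ (np.asarray(x_suggested, dtype=float) - x_bar)
    bound = limit * np.sqrt(eigenvalues)
    b_s = np.clip(b_s, -bound, bound)
    return x_bar + P_s @ b_s, b_s
```

Alternating a local search for each point with this projection is what keeps the ASM result within the range of shapes seen in training.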
In the following we give a brief overview of an algorithm for rapidly matching such appearance models to images (the AAM algorithm). A more comprehensive description is given by Cootes et al [18]. An AAM contains two main components: a parameterized model of object appearance, and an estimate of the relationship between parameter errors and induced image residuals.
Overview of AAM search
The appearance model parameters, $c$, and shape transformation parameters, $t$, define the position of the model points in the image frame, $X$, which gives the shape of the image patch to be represented by the model. During matching we sample the pixels in this region of the image, $g_{im}$, and project into the texture model frame, $g_s = T_u^{-1}(g_{im})$. The model texture is given by $g_m = \bar{g} + P_g Q_g c$. The difference between model and image (measured in the normalized texture frame) is thus
$$r(p) = g_s - g_m \tag{5}$$
where $p$ are the parameters of the model, $p^T = (c^T \,|\, t^T \,|\, u^T)$.
A simple scalar measure of difference is the sum of squares of elements of $r$, $E(p) = r^T r$.
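A first order Taylor expansion of Equation 5 gives
$$r(p + \delta p) = r(p) + \frac{\partial r}{\partial p} \delta p \tag{6}$$
where the $ij$th element of matrix $\frac{\partial r}{\partial p}$ is $\frac{d r_i}{d p_j}$.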
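Suppose during matching our current residual is $r$. We wish to choose $\delta p$ so as to minimize $|r(p + \delta p)|^2$. By equating Equation 6 to zero we obtain the least-squares solution,
$$\delta p = -R\, r(p), \qquad R = \left(\frac{\partial r}{\partial p}^T \frac{\partial r}{\partial p}\right)^{-1} \frac{\partial r}{\partial p}^T \tag{7}$$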
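Equation 7 can be written directly in NumPy. In this sketch the Jacobian $\partial r / \partial p$ is passed in as a matrix `J`; the function names are hypothetical. For a residual that is exactly linear in $p$, a single update from any starting point lands on the least-squares optimum.

```python
import numpy as np

def update_matrix(J):
    """R = (J^T J)^{-1} J^T for a residual Jacobian J = dr/dp (Equation 7)."""
    # Solving the normal equations avoids forming the explicit inverse.
    return np.linalg.solve(J.T @ J, J.T)

def parameter_update(R, r):
    """delta p = -R r(p)."""
    return -R @ r
```

Because $R$ depends only on the Jacobian, it can be computed once and reused for every iteration, which is what makes the fixed-Jacobian assumption described next so effective.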
In a standard optimization scheme it would be necessary to recalculate $\frac{\partial r}{\partial p}$ at every step, an expensive operation. However, since it is computed in a normalized reference frame, we assume that it can be considered approximately fixed, and can thus estimate it once from the training set. We estimate $\frac{\partial r}{\partial p}$ by numeric differentiation, systematically displacing each parameter from the known optimal value on typical images and averaging over the training set.
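A sketch of this numeric differentiation, assuming a hypothetical `residual_fn(example, p)` that samples a training image at parameters `p` and returns $r(p)$. Central differences for several displacement sizes are averaged over all training examples; the displacement values themselves are left to the caller.

```python
import numpy as np

def estimate_jacobian(residual_fn, examples, p_opt_list, deltas):
    """Estimate dr/dp by central differences averaged over training examples.

    residual_fn(example, p) -> residual vector (hypothetical interface).
    p_opt_list: known optimal parameters for each example.
    deltas[j]: displacements tried for parameter j.
    """
    n_params = len(p_opt_list[0])
    columns = []
    for j in range(n_params):
        estimates = []
        for example, p_opt in zip(examples, p_opt_list):
            p_opt = np.asarray(p_opt, dtype=float)
            for d in deltas[j]:
                step = np.zeros(n_params)
                step[j] = d
                diff = residual_fn(example, p_opt + step) - residual_fn(example, p_opt - step)
                estimates.append(diff / (2.0 * d))
        # Column j of the Jacobian is the average derivative with respect to p_j.
        columns.append(np.mean(estimates, axis=0))
    return np.column_stack(columns)
```

Averaging over several displacement sizes smooths out the nonlinearity of the real residual, which matters because the matrix is then reused far from the exact optimum.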
The AAM algorithm simply involves sampling the image to estimate the current residual, $r(p)$, then using Equation 7 to estimate the update to the current parameters. The steps are iterated and placed in a multi-resolution framework for improved robustness (see [18]). The approach has been found to be fast and effective.
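The iteration at a single resolution can be sketched as follows, with `residual_fn(p)` a hypothetical function that samples the image and returns $r(p)$. Trying smaller fractions of the predicted step when the error $E = r^T r$ does not decrease is an assumption of this sketch; the multi-resolution framework is omitted.

```python
import numpy as np

def aam_search(residual_fn, R, p0, max_iters=30, step_scales=(1.0, 0.5, 0.25), tol=1e-10):
    """Iterate p <- p - k R r(p), accepting a step only if E = r^T r decreases.

    Returns the final parameters and error.
    """
    p = np.asarray(p0, dtype=float)
    r = residual_fn(p)
    error = r @ r
    for _ in range(max_iters):
        dp = -R @ r
        for k in step_scales:
            p_new = p + k * dp
            r_new = residual_fn(p_new)
            e_new = r_new @ r_new
            if e_new < error:
                break
        else:
            return p, error  # no trial step improved the fit
        converged = error - e_new < tol
        p, r, error = p_new, r_new, e_new
        if converged:
            break
    return p, error
```

Since `R` is fixed, each iteration costs one image sample per trial step plus a matrix-vector product, which is why the search is fast.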
Where appropriate, additional statistical constraints (such as a prior on the model parameters or incorporating positions of known/hand annotated points) can be applied to improve the search [19].
Examples of AAM search
For example, Figure 5 shows an AAM of the central structures of the brain slice converging from a displaced position on a previously unseen image. The model represents about 10 000 pixels and has 30 parameters. The search took about a second on a modern PC. Figure 6 shows examples of the results of the search, with the found model points superimposed on the target images.
One of the key advantages of using a shape model of multiple structures is that the positions of low-contrast, hard-to-locate structures can be estimated from the relative locations of the better-defined structures. The model constraints thus lead to much more reliable feature location than one would obtain by searching for each structure individually.
Although we have only demonstrated the approach on the central part of the brain, models can be built of the whole cross-section. Figure 7 shows the first two modes of such a model. This was trained from the same 72 example slices as above, but with additional points marked around the outside of the skull. The first modes are dominated by relative size changes between the structures.
The appearance model relies on the existence of correspondence between structures in different images, and thus on a consistent topology across examples. For some structures (for example, the sulci), this does not hold true. An alternative approach for sulci is described by Caunce and Taylor [20, 21].
When the AAM converges it will usually be close to the optimal result, but may not achieve the exact position. Stegmann and Fisker [16, 22, 23] have shown that applying a general purpose optimizer can improve the final match.
Locating vertebrae
Figure 8 shows the first mode of variation of a shape model trained on 10 vertebrae in dual energy X-ray absorptiometry (DXA) images of the spine [24]. This mode captures the global bending of the spine; other modes capture more detail about the way in which the spine and vertebral shape can vary. Figure 9 shows a detail of the result of matching such a model to a new image using a profile-based AAM. Overall, accuracies of less than 1 mm (1 pixel), comparable with human error, are achieved.
3D models
The approach can easily be extended to 3D. The underlying equations are almost unchanged, other than that 3D points are used and that the image sampling becomes more complicated. An effective approach to representing the texture in 3D is simply to sample profiles normal to the surface; this leads to much more compact models than using all the voxels in a volume.
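A sketch of where such 3D profiles might be sampled, assuming the surface is a triangle mesh and that vertex normals are area-weighted sums of the adjacent face normals (one common choice). Interpolating the volume at the returned positions, for example with `scipy.ndimage.map_coordinates`, would give the texture vector; that step is not shown, and the function names are hypothetical.

```python
import numpy as np

def vertex_normals(vertices, faces):
    """Unit vertex normals from an (m, 3) integer array of triangle indices."""
    v = np.asarray(vertices, dtype=float)
    f = np.asarray(faces, dtype=int)
    # The cross product's length is twice the triangle area, so larger faces weigh more.
    face_n = np.cross(v[f[:, 1]] - v[f[:, 0]], v[f[:, 2]] - v[f[:, 0]])
    normals = np.zeros_like(v)
    for k in range(3):
        np.add.at(normals, f[:, k], face_n)  # accumulate onto each corner vertex
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)

def profile_positions(vertices, faces, half_length=3, step=1.0):
    """Sample positions along each vertex normal, shape (n, 2 * half_length + 1, 3)."""
    v = np.asarray(vertices, dtype=float)
    normals = vertex_normals(v, faces)
    offsets = np.arange(-half_length, half_length + 1) * step
    return v[:, None, :] + offsets[None, :, None] * normals[:, None, :]
```

With n landmarks and profiles of length L, the texture vector has only n times L elements, compared with one element per voxel for a full-volume model.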
The main difficulty with 3D data is that of annotating the training set so as to define sufficient numbers of corresponding landmarks to represent the shape and its variability. This is almost impossible to do manually, and there has been considerable effort devoted to developing methods to automate the process [25-31].
Femur head
An example of a 3D model is shown in Figure 10. This statistical shape model is constructed from a set of 3D surfaces (obtained from manual segmentation of MR images) using the algorithm of Davies et al [32]. The model captures the variation observed in the training set. By matching it to 3D volume images using a 3D AAM, it can be used to locate the surface of the femur (see Figure 11).
Other applications
The statistical models of shape and appearance described above are being used to solve many medical image interpretation problems. They have been used to locate vertebrae in DXA images of the spine [24, 33], bones and prostheses in radiographs of total hip replacements [34], structures in MR images of the brain [35, 36], the prostate in MR images [37], the outlines of ventricles of the heart in echocardiograms [35, 38] and in cardiac MR sequences [39]. They have been used to locate the outlines of bones in densitometry [40]. The methods have been augmented to segment the lungs in chest radiographs [36]. The AAM algorithm has been extended to register cardiac perfusion MRI sequences [41].
Conclusions
We have demonstrated that image structures can be represented using statistical models of shape and appearance. Both the shape and the appearance of the structures can vary in ways observed in the training set. Such models can be matched to new images rapidly and reliably using efficient algorithms. The methods are applicable to a wide variety of problems and give a powerful framework for automatic image interpretation.
Acknowledgments
The brain images were generated by Dr Hutchinson and colleagues in the Department of Diagnostic Radiology. They were annotated by Dr Hutchinson, Dr Hill, K Davies and Prof. A Jackson (from the Medical School, University of Manchester) and Dr G Cameron (from the Department of Biomedical Physics, University of Aberdeen). The knee MR images were provided by AstraZeneca. The femur model was constructed by T Williams (ISBE), C Wolstenholme and G Vincent (imorphics Ltd). The vertebra model was generated by Martin Roberts and Judith Adams.
References
1. McInerney T, Terzopoulos D. Deformable models in medical image analysis: a survey. Medical Image Analysis 1996;1:91-108.
2. Maintz JBA, Viergever MA. A survey of medical image registration. Medical Image Analysis 1998;2:1-36.
3. Kass M, Witkin A, Terzopoulos D. Active contour models. Int J Computer Vision 1987;1:321-31.
4. Park J, Metaxas D, Young A, Axel L. Deformable models with parameter functions for cardiac motion analysis from tagged MRI data. IEEE Transactions on Medical Imaging 1996;15:278-89.
5. Pentland AP, Sclaroff S. Closed-form solutions for physically based modelling and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 1991;13:715-29.
6. Scott GL. The alternative snake and other animals. In: 3rd Alvey Vision Conference, Cambridge, England, 1987:341-7.
7. Staib LH, Duncan JS. Boundary finding with parametrically deformable models. IEEE Transactions on Pattern Analysis and Machine Intelligence 1992;14:1061-75.
8. Yuille AL, Cohen DS, Hallinan P. Feature extraction from faces using deformable templates. Int J Computer Vision 1992;8:99-112.
9. Lipson P, Yuille AL, O'Keeffe D, Cavanaugh J, Taaffe J, Rosenthal D. Deformable templates for feature extraction from medical images. In: Faugeras O, editor. 1st European Conference on Computer Vision, Springer-Verlag, Berlin/New York, 1990:413-7.
10. Bajcsy R, Kovacic S. Multiresolution elastic matching. Computer Graphics and Image Processing 1989;46:1-21.
11. Christensen GE, Rabbitt RD, Miller MI, Joshi SC, Grenander U, Coogan TA, et al. Topological properties of smooth anatomic maps. In: 14th Conference on Information Processing in Medical Imaging, France, Kluwer Academic Publishers, 1995:101-12.
12. Christensen GE, Joshi SC, Miller M. Volumetric transformation of brain anatomy. IEEE Transactions on Medical Imaging 1997;16:864-77.
13. Kirby M, Sirovich L. Application of the Karhunen-Loeve procedure for the characterization of human faces. IEEE Transactions on Pattern Analysis and Machine Intelligence 1990;12:103-8.
14. Wang Y, Staib LH. Elastic model based non-rigid registration incorporating statistical shape information. In: MICCAI, 1998:1162-73.
15. Cootes TF, Taylor CJ, Cooper D, Graham J. Active shape models - their training and application. Computer Vision and Image Understanding 1995;61:38-59.
16. Stegmann MB, Fisker R, Ersbøll BK. Extending and applying active appearance models for automated, high precision segmentation in different image modalities. In: Scandinavian Conference on Image Analysis, 2001:90-7.
17. Cootes TF, Taylor CJ. Combining point distribution models with shape models based on finite element analysis. Image Vision Computing 1995;13:403-9.
18. Cootes TF, Edwards GJ, Taylor CJ. Active appearance models. In: Burkhardt H, Neumann B, editors. 5th European Conference on Computer Vision, Springer, Berlin, 1998;2:484-98.
19. Cootes TF, Taylor CJ. Constrained active appearance models. In: 8th International Conference on Computer Vision, IEEE Computer Society Press, 2001;1:748-54.
20. Caunce A, Taylor CJ. 3D point distribution models of the cortical sulci. In: Clark AF, editor. 8th British Machine Vision Conference, University of Essex, UK, BMVA Press, 1997:550-9.
21. Caunce A, Taylor CJ. Using local geometry to build 3D sulcal models. In: 16th Conference on Information Processing in Medical Imaging, 1999:196-209.
22. Fisker R. Making deformable template models operational. PhD thesis, Informatics and Mathematical Modelling, Technical University of Denmark, 2000.
23. Stegmann MB. Active appearance models: theory, extensions and cases. Master's thesis, Informatics and Mathematical Modelling, Technical University of Denmark, 2000.
24. Roberts M, Cootes T, Adams J. Linking sequences of active appearance sub-models via constraints: an application in automated vertebral morphometry. In: 14th British Machine Vision Conference, 2003;1:349-58.
25. Brett A, Taylor C. Construction of 3D shape models of femoral articular cartilage using harmonic maps. In: MICCAI, 2000:1205-14.
26. Brett AD, Taylor CJ. A method of automatic landmark generation for automated 3D PDM construction. In: Lewis P, Nixon M, editors. 9th British Machine Vision Conference, Southampton, UK: BMVA Press, 1998;2:914-23.
27. Fleute M, Lavallee S. Building a complete surface model from sparse data using statistical shape models: application to computer assisted knee surgery. In: MICCAI, 1998:878-87.
28. Brett AD, Taylor CJ. A framework for automated landmark generation for automated 3D statistical model construction. In: 16th Conference on Information Processing in Medical Imaging, Visegrad, Hungary, 1999:376-81.
29. Davies R, Twining C, Cootes T, Taylor C. A minimum description length approach to statistical shape modelling. IEEE Transactions on Medical Imaging 2002;21:525-37.
30. Rueckert D, Frangi A, Schnabel J. Automatic construction of 3D statistical deformation models using non-rigid registration. In: MICCAI, 2001:77-84.
31. Frangi A, Rueckert D, Schnabel J, Niessen W. Automatic construction of multiple-object three-dimensional statistical shape models: application to cardiac modeling. IEEE Transactions on Medical Imaging 2002;21:1151-66.
32. Davies RH, Twining CJ, Allen PD, Cootes TF, Taylor CJ. Shape discrimination in the hippocampus using an MDL model. In: 18th Conference on Information Processing in Medical Imaging, Springer-Verlag, 2003:38-50.
33. Smyth PP, Taylor CJ, Adams JE. Automatic measurement of vertebral shape using active shape models. In: 7th British Machine Vision Conference, Edinburgh, Scotland, BMVA Press, 1996:705-14.
34. Kotcheff A, Redhead A, Taylor C, Hukins D. Shape model analysis of THR radiographs. In: 13th International Conference on Pattern Recognition, IEEE Computer Society Press, 1996;4:391-5.
35. Hill A, Cootes TF, Taylor CJ, Lindley K. Medical image interpretation: a generic approach using deformable templates. J Med Informatics 1994;19:47-59.
36. van Ginneken B, Frangi AF, Staal JJ, ter Haar Romeny B. Active shape model segmentation with optimal features. IEEE Transactions on Medical Imaging 2002;21:924-33.
37. Haslam J, Taylor CJ, Cootes TF. A probabilistic fitness measure for deformable template models. In: Hancock E, editor. 5th British Machine Vision Conference, York, England, BMVA Press, Sheffield, 1994:33-42.
38. Mitchell S, Lelieveldt B, van der Geest R, Schaap J, Reiber J, Sonka M. Segmentation of cardiac MR images: an active appearance model approach. In: SPIE Medical Imaging, Feb. 2000.
39. Mitchell S, Lelieveldt BPF, van der Geest R, Bosch H, Reiber J, Sonka M. Time continuous segmentation of cardiac MR image sequences using active appearance motion models. In: SPIE Medical Imaging, Feb. 2001.
40. Thodberg H, Rosholm A. Application of the active shape model in a commercial medical device for bone densitometry. In: Cootes T, Taylor C, editors. 12th British Machine Vision Conference, 2001:43-52.
41. Stegmann MB, Larsson HBW. Fast registration of cardiac perfusion MRI. In: Proc. International Society of Magnetic Resonance In Medicine - ISMRM 2003, Toronto, Ontario, Canada, Berkeley, CA, USA, 2003:702.