Full Paper |
Imaging Science and Biomedical Engineering, Stopford Building, Oxford Road, Manchester M13 9PT, UK
| Abstract |
|---|
| Introduction |
|---|
Mammographic film reading is a particularly demanding visual task. In screening programmes, the film reader must search for extremely infrequent and often very subtle signs of cancer superimposed on complex and variable backgrounds. Early breast cancer may appear in a variety of forms: a few particles of microcalcification; a small ill-defined or spiculated mass; abnormal asymmetry between right and left breast images; or subtle distortion of the underlying structure of the breast. These abnormalities vary in size, shape, structure, brightness and location and may share a great deal of similarity with normal mammographic appearances.
False negative cases, in which signs of cancer are missed by a reader, sometimes occur. Retrospective evaluation of the previous screening films of cancers detected between screening rounds (interval cancers) and of screen-detected cancers shows evidence of abnormality in between 16% and 27% of cases. Some of these signs are very subtle, and may have been seen by the readers but dismissed as insignificant, but others are clear signs of malignancy [2–4]. However, different readers miss different cancers, as is evidenced by the success of double reading, in which two readers independently read the films [5]. The most accurate method of interpretation is double reading with arbitration, where a third reader reviews cases about which the two readers disagree [5, 6].
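The decision rule behind double reading with arbitration can be written down in a few lines. The following is a minimal sketch: the function name and the boolean "recall" encoding are illustrative assumptions, not part of any screening protocol described here.

```python
from typing import Callable


def double_read_with_arbitration(reader_a: bool, reader_b: bool,
                                 arbiter: Callable[[], bool]) -> bool:
    """Return True if the case is recalled for further assessment.

    reader_a and reader_b are the two independent opinions (True = recall).
    The arbiter (hypothetical third reader) is consulted only on disagreement.
    """
    if reader_a == reader_b:
        return reader_a      # agreement: no arbitration needed
    return arbiter()         # disagreement: the third reader decides


# Both readers agree, so the arbiter is never called.
agreed = double_read_with_arbitration(True, True, lambda: False)
# The readers disagree, so the arbiter's opinion is used.
arbitrated = double_read_with_arbitration(True, False, lambda: True)
```

Passing the arbiter as a callable reflects the point that the third reading is only carried out for the disputed cases.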
In the UK, recent extensions to the National Health Service Breast Screening Programme (NHSBSP) have included increasing the age range of women invited for screening, and taking a second radiographic projection of each breast at all visits. These, coupled with a natural expansion of the eligible population, have significantly increased the manpower requirements of the programme. Different methods of coping with this, including the use of computer-based aids, are currently being explored.
Researchers have been developing algorithms to detect mammographic abnormalities for more than 30 years with the aim of either automating mammographic interpretation or, more realistically, providing a tool which will enhance human film-reading performance. A benchmark against which the efficacy of such tools can be measured is whether or not the performance of an individual reader could be improved to the extent that double reading was no longer necessary, as this would alleviate any shortage of film readers.
| Computer-based detection of mammographic abnormalities |
|---|
Microcalcifications are sometimes difficult for the human film reader to detect because of their small size and low contrast, particularly if they are superimposed on dense glandular tissue. However, of all the signs of abnormality found on mammograms, microcalcifications are the easiest to detect automatically. Unlike small ill-defined masses, which may superficially resemble normal glandular tissue, microcalcifications have properties (namely their very small size and high attenuation) which differ significantly from those of normal background structures. Computers can be trained relatively easily to detect small, bright regions with well-defined edges (Figure 1). Most microcalcification detection algorithms make an initial attempt at detection based on these properties, followed by a more detailed analysis to reduce the number of false positives due to benign calcification, overlapping narrow linear structures, or artefacts such as screen-film "shot" noise [10]. A wide variety of detection methods have been investigated, including mathematical morphology [11, 12], the use of matched filters [13] and neural networks [14, 15]. One successful approach involved a preliminary phase of noise estimation and equalization followed by a method based on Bayesian techniques and a Markov random field model. Iterative updating maximized the probability of a correct labelling of pixels [16]. It achieved more than 90% sensitivity with slightly less than one false detection per image on a set of 40 mammograms; derivatives of this method are frequently used as a basis against which to assess the performance of more recent techniques. The false positive removal (or analysis) phase of detection algorithms not only takes into account the shape, size and brightness of individual candidate particles, but also involves assessment of number, location and clustering of candidates; for example, isolated candidates can generally be ignored.
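The clustering step of false positive removal can be illustrated with a small sketch. It assumes candidate particles have already been located as (x, y) positions by some earlier detection stage. The function name, the radius and the neighbour threshold are hypothetical choices for illustration, not values taken from any of the cited algorithms.

```python
from math import hypot


def keep_clustered(candidates, radius, min_neighbours):
    """Keep candidates with at least `min_neighbours` other candidates
    lying within `radius` of them; isolated candidates are dropped."""
    kept = []
    for i, (xi, yi) in enumerate(candidates):
        neighbours = sum(
            1
            for j, (xj, yj) in enumerate(candidates)
            if j != i and hypot(xi - xj, yi - yj) <= radius
        )
        if neighbours >= min_neighbours:
            kept.append((xi, yi))
    return kept


# Three candidates close together and one isolated candidate (assumed pixel units).
candidates = [(10, 10), (12, 11), (11, 13), (80, 40)]
clustered = keep_clustered(candidates, radius=5.0, min_neighbours=2)
```

Here the isolated candidate at (80, 40) is discarded while the three neighbouring candidates survive. A practical system would combine this with the shape, size and brightness criteria mentioned above.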
Soft tissue masses are more difficult to detect automatically than microcalcifications because they may superficially resemble normal background structures such as overlapping areas of glandular tissue. Edge detection, which is one of the most widely used computer-based feature detection techniques, is of little use since the lesion edges are often not as strong as those of normal structures within the breast. In fatty breasts, methods directed at local increases in density of a particular size are reasonably effective, but in breasts with a greater glandular component many false positives result, and the problem again becomes one of analysis to determine which detected regions correspond to genuine abnormalities. An alternative approach is to look directly for the foci of radiating patterns of lines characteristic of spiculated lesions [22, 23]. The resulting locations can be compared with local increases in density to provide additional confirmation that a malignant lesion may be present. Linear structure analysis can also be used to detect architectural distortion, along with methods which first detect the outline of the glandular disc and then search around it for regions with an uncharacteristic shape [24]. The segmentation of the glandular disc has been tackled by various methods including texture measures and calibration of densities [25–27]. Mass detection algorithms are generally less sensitive and specific than microcalcification detection algorithms; commercial computer-aided detection (CAD) systems claim sensitivities of up to 98.5% for microcalcifications, but only 89.2% for masses and distortions [28–30].
Both masses and diffuse asymmetries can be detected by comparing right and left breast images. Normal mammograms usually show some degree of asymmetry (Figure 2), but although this is usually easily identified and ignored by human readers, it can be problematic for computer-based approaches. The automated detection of asymmetry is a particularly interesting problem from a technical perspective: anatomically similar regions of the right and left breasts must be compared, but there are few reliable landmarks on which to base registration. The nipple is always present, and is usually visualized in profile, and the outline of the breast (skinline) may also be matched. However, the positioning of the breast at the time of mammography determines first the amount of tissue visualized at the chest wall boundary and second the extent of the breast outline. Features on the outline such as the inframammary fold are not always visualized [31]. Compression also has an impact on the appearance of the mammogram; this is an added complication when comparing images taken at different times. In the mediolateral oblique view, the line of the pectoral muscle and the position of the nipple can be used to define a frame of reference for alignment. In the craniocaudal view it is more difficult since the orientation of the chest wall boundary of the film depends entirely on patient positioning. Various methods have been investigated as means of comparison; bilateral subtraction, non-rigid registration and transportation have all been used in breast imaging [32–35]. An example of a multiresolution transportation-based approach to the detection of asymmetry is shown in Figure 3. Here, pixel values (which represent X-ray density) are moved from right to left breast images to make them similar. A cost is associated with distance and quantity, and an efficient solution is found. The output images represent cost associated with transportation to each destination pixel. Few asymmetry detection algorithms have been evaluated on data sets of significant size to enable comparison, and only one of the current CAD systems claims to detect asymmetry [29].
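The idea of a transportation cost can be shown on a one-dimensional toy problem. The sketch below is not the multiresolution two-dimensional method referred to above, whose cost function is not specified here. It assumes that moving a quantity q over d positions costs q × d and that both profiles hold the same total, in which case the minimum cost along a line is the sum of absolute running surpluses. The function name and the example profiles are illustrative.

```python
from itertools import accumulate


def transport_cost_1d(source, target):
    """Minimum cost of rearranging `source` into `target` along a line,
    where moving quantity q over d positions costs q * d."""
    if len(source) != len(target):
        raise ValueError("profiles must have the same length")
    if sum(source) != sum(target):
        raise ValueError("profiles must hold the same total quantity")
    # The running surplus after each position is the quantity that must be
    # carried across the boundary to the next position.
    surplus = accumulate(s - t for s, t in zip(source, target))
    return sum(abs(carried) for carried in surplus)


# Integer "density" profiles: the same structure displaced by one position.
shifted = transport_cost_1d([0, 4, 0, 0], [0, 0, 4, 0])
# Identical profiles need no transport at all.
identical = transport_cost_1d([1, 2, 3], [1, 2, 3])
```

A small displacement of the same structure is cheap, whereas density that has no counterpart nearby must travel further and costs more, which is the property that makes transportation attractive for comparing imperfectly aligned images.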
| Computer-based aids for film readers |
|---|
A step back from fully automated interpretation is automated pre-screening, in which the computer is used to sort cases into two categories: "abnormal or equivocal" and "normal". In the pre-screening model, the radiologist would view only those cases in which the computer detected something suspicious along with a small subset of the normal cases for quality control purposes [36]. The overall sensitivity with pre-screening is limited by the sensitivity of the computer system, as any cancer cases erroneously classified as normal by the computer would be unlikely to be detected. The gain in terms of reduced time spent interpreting the cases depends on the specificity of the computer system, but it is not enough to calculate the proportion of cases viewed: those deemed normal by the computer are likely to include the majority of mammograms of very fatty breasts which the human film reader can also dismiss very rapidly.
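The trade-off in pre-screening can be made concrete with a small calculation. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the numbers in the example are illustrative rather than measured, and the reader's sensitivity is assumed to be the same for every cancer case they actually view.

```python
def prescreening_outcome(prevalence, cad_sensitivity, cad_specificity,
                         reader_sensitivity, qc_fraction):
    """Return (fraction of cases viewed, overall sensitivity) for pre-screening.

    qc_fraction is the share of computer-normal cases still shown to the
    reader for quality control.
    """
    # Cases flagged "abnormal or equivocal": true positives plus false positives.
    flagged = (prevalence * cad_sensitivity
               + (1 - prevalence) * (1 - cad_specificity))
    viewed = flagged + qc_fraction * (1 - flagged)
    # A cancer reaches the reader if it is flagged, or if it is missed by the
    # computer but happens to fall in the quality control sample.
    cancer_viewed = cad_sensitivity + (1 - cad_sensitivity) * qc_fraction
    return viewed, cancer_viewed * reader_sensitivity


# Illustrative values only: 1% prevalence, no quality control sample.
viewed, sensitivity = prescreening_outcome(
    prevalence=0.01, cad_sensitivity=0.9, cad_specificity=0.5,
    reader_sensitivity=0.8, qc_fraction=0.0)
```

With these assumed values about half of the cases are viewed, and the overall sensitivity (0.72) cannot exceed the computer's sensitivity (0.9). The proportion viewed overstates the time saved if the excluded cases are the ones that are quickest to read.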
Over the last few years, detection algorithms have been used clinically as components of CAD systems, which aim to aid the film reader by drawing attention to suspicious regions of the original mammogram in a process known as prompting [37]. The aim of CAD is to ensure that all potentially significant regions of the mammogram are examined (to avoid errors caused by failure of the reader to adequately scan the whole mammogram), and given due consideration (to avoid errors in which abnormalities are detected but dismissed as being normal). Prompting systems require digital images, either acquired directly or obtained by digitizing film images. Algorithms are then applied to detect specific types of candidate abnormalities such as microcalcification clusters and masses. The most suspicious locations are marked in a prompt image, which is usually a low-resolution version of the mammogram. Before consulting the prompt images for a given case, the reader should make a thorough initial unprompted search of the original mammograms. This would ensure that the sensitivity of a reader with CAD at least equals their unprompted sensitivity. The reader then accesses the prompt images and re-evaluates the case, checking marked locations and noting any new findings. If the system is used as intended, and the algorithms are sufficiently sensitive and specific, the process should lead to an improvement in the reader's detection performance. The sensitivity of a reader using CAD should be limited neither by that reader's unprompted performance, nor by the sensitivity of the individual algorithms in the CAD system.
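A simple probability model shows why prompted sensitivity is bounded below by the unprompted search but is not capped by either component. The model is an illustrative assumption, not one taken from the cited studies: the reader keeps every unprompted finding, the system prompts a cancer independently of whether the reader missed it, and the reader acts on a correct prompt with a fixed probability.

```python
def prompted_sensitivity(reader_sensitivity, cad_sensitivity, prompt_acceptance):
    """Sensitivity of a reader who searches unprompted first, then reviews prompts.

    prompt_acceptance is the (assumed) probability that the reader acts on a
    correct prompt marking a cancer missed in the unprompted search.
    """
    missed = 1 - reader_sensitivity
    recovered = missed * cad_sensitivity * prompt_acceptance
    return reader_sensitivity + recovered


# Illustrative values only.
typical = prompted_sensitivity(0.8, 0.9, 0.5)
# A strong reader with a fully trusted system exceeds both components.
strong = prompted_sensitivity(0.95, 0.9, 1.0)
```

Under these assumptions the result never falls below the unprompted sensitivity, and in the second example it exceeds both the reader's and the system's individual sensitivities. The benefit shrinks if readers shorten their unprompted search or discount correct prompts.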
There are now a number of commercially available CAD systems, the first of which to be marketed extensively is the R2 ImageChecker (R2, CA, USA) [28] which detects and prompts potential masses and microcalcification clusters. This system incorporates a digitizer to convert film mammograms to digital format. Bar-coded case separators are used to relate each case to the corresponding prompt image. The prompts are presented on a small monitor positioned near the viewer on which the mammograms are displayed, with paper printouts of the prompt images available for back-up. The software provides information about the strength of evidence that caused each region to be prompted, and enables detailed examination of the prompted regions by means of magnified images accessible via a touch screen.
To date, three systems have been approved by the Food and Drug Administration (FDA), and insurance companies in the USA now reimburse an extra $17 per case when CAD is used. The MammoReader from iCAD (Instrumentarium Imaging Inc., Milwaukee, USA) [29] and CADx's Second Look [30] both operate on a similar principle to the ImageChecker, although the systems are based on different detection algorithms, and thus respond differently to potential abnormalities. There are also some practical differences in how the prompting information is displayed. These two systems are soon to be merged. The development of CAD is a fast growing field, and a number of other systems are at a stage where they will soon become contenders in the CAD market.
| The potential of CAD to aid screening |
|---|
None of the commercial systems claims perfect sensitivity and specificity for its algorithms, nor do they claim to detect all manifestations of cancer. The initial unprompted search is thus vital to the success of CAD. In addition to errors caused by failure to search the image thoroughly, cancers may also be missed if the signs are detected by the reader but wrongly dismissed as being normal. These cancers are more likely to be subtle in appearance, with similar features to normal background structures and benign abnormalities. In this case, a correctly placed prompt should add weight to the conviction of the reader that there is actually an abnormality present, thus reducing the possibility of misclassification. Many of the very early cancers which can be seen during retrospective analysis of screening films taken prior to cancer being detected show only subtle changes, but there is evidence that CAD systems are sensitive enough to prompt in such cases [2, 40].
The majority of four-film cases presented to current CAD systems will be prompted, regardless of whether or not they contain a cancer. In breast screening programmes, over 99% of women screened will have normal mammograms. With a false prompt rate of 0.5 per image, this would mean that only 1 in 100 prompts would actually correspond to a cancer; the remaining false positive prompts would have to be disregarded by the film reader. Some of these false prompts may be very easy to dismiss, for example if they mark clearly benign calcifications, crossing ducts or image artefacts. However, the overall effect of a high ratio of false prompts to true ones will be to reduce the weighting placed by the reader on any given prompt, thus reducing the potential of CAD to overcome misclassification errors. False prompts may also degrade performance by acting as distractors and drawing attention away from genuinely abnormal regions.
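The arithmetic behind the ratio of true to false prompts can be laid out explicitly. The sketch below uses the figures from the paragraph above (four images per case, 0.5 false prompts per image, 1% of cases with cancer as an upper bound) together with two labelled assumptions: every cancer is prompted, and each cancer attracts a fixed number of true prompts. The function name and the cohort size are illustrative.

```python
def prompts_per_true_prompt(n_cases, prevalence, images_per_case,
                            false_prompts_per_image, true_prompts_per_cancer):
    """Return (true prompts, false prompts, total prompts per true prompt)."""
    # Assumption: the system prompts every cancer (sensitivity of 1).
    true_prompts = n_cases * prevalence * true_prompts_per_cancer
    false_prompts = n_cases * images_per_case * false_prompts_per_image
    return true_prompts, false_prompts, (true_prompts + false_prompts) / true_prompts


# Assumption: each cancer is marked on both views of the affected breast.
true_p, false_p, one_in = prompts_per_true_prompt(
    n_cases=10_000, prevalence=0.01, images_per_case=4,
    false_prompts_per_image=0.5, true_prompts_per_cancer=2)
```

With two true prompts per cancer this gives about 1 true prompt in 101, in line with the figure quoted above. If each cancer attracted only one true prompt, or if fewer than 1% of cases were cancers, the proportion would be lower still, closer to 1 in 200.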
For algorithms which are very sensitive, such as microcalcification detection algorithms, there is a danger that the reader may become over-reliant on prompts, and miss those cancers that the system fails to detect. If CAD is being used as intended, the majority of these cancers should be detected in the initial, unprompted search. However, some studies have shown that readers took no longer to read with CAD than they did without it, which could indicate that they reduced their initial unprompted search of the image in the knowledge that they would be making a second search with the aid of prompts [41, 42].
There are many practical considerations associated with the introduction of CAD into a screening programme. In larger centres in the UK, the mammograms of approximately 200 women must be interpreted each day. This will include both standard and large format films, with any previous screening films displayed alongside current mammograms to enable comparison. Radiologists read up to 130 cases per hour [2], although the amount of time spent on each case is highly variable. In the NHSBSP, the women screened are between 50 years and 69 years and a large proportion of them have fatty breasts which are easily assessed. Mammograms showing fatty-glandular and dense breasts are more time-consuming to read. Clearly, the use of CAD will increase the time taken for an individual reader to review the films, since it is additional to the initial unprompted evaluation, but it is unlikely to be prohibitively slow in practice. The digitization and processing of films from larger centres is an issue which needs to be addressed; to cope with current workloads, two systems (or an overnight run) would be required. Medicolegal implications may also need to be addressed, for example in the case where a radiologist chooses to dismiss a prompt which is marking a genuine abnormality.
| The clinical evaluation of CAD |
|---|
The majority of published trials of CAD have been retrospective. The most significant prospective study took place over a 12 month period during which two readers examined screening mammograms unprompted, recorded their findings and then re-examined the films with the aid of prompts [43]. A 19.5% increase in the number of cancers detected was reported, with a 5% increase in the proportion of early cancers found at screening. These results are very encouraging; however with only two readers it is difficult to generalize the results to readers with a wider range of experience. Furthermore, they were reading with the aid of a "safety net"; they knew that their unprompted search was preliminary to a further search with prompts, and this could have adversely affected their baseline measure of unprompted performance.
A major limitation of many of the published retrospective studies is that their test sets were loaded with cancers, rather than reflecting a realistic screening mix in which fewer than 1% of cases show cancer [40, 42, 44–46]. This methodology has clearly been adopted for reasons of efficiency; it is very costly to have readers review the large numbers of cases required to evaluate reader sensitivity on a screening mix of cases. However, loading the data alters the ratio of false to true prompts. This artificially increases the readers' expectation that any given prompt will mark a cancer, leading to an overestimate of prompted performance and making generalization to the screening programme impossible.
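The effect of loading a test set on the ratio of false to true prompts can be quantified with a short calculation. This is a sketch with assumed inputs: two false prompts per case (0.5 per image on four images), one true prompt per cancer, every cancer prompted, and an illustrative loaded set in which one third of cases are cancers. The function name is hypothetical.

```python
def false_to_true_prompt_ratio(cancer_fraction, false_prompts_per_case=2.0,
                               true_prompts_per_cancer=1.0):
    """False prompts per true prompt for a test set with the given cancer fraction.

    Assumes every cancer is prompted and that false prompts occur at the same
    rate on all cases.
    """
    true_per_case = cancer_fraction * true_prompts_per_cancer
    return false_prompts_per_case / true_per_case


screening_mix = false_to_true_prompt_ratio(0.01)   # realistic upper bound
loaded_set = false_to_true_prompt_ratio(1 / 3)     # illustrative enriched set
```

Under these assumptions a reader sees roughly 200 false prompts for each true one in a screening mix, but only about 6 in the loaded set, so a prompt in the experiment is far more likely to mark a cancer than it would be in practice.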
Other studies have involved only a small number of readers, and are limited by the effects of natural differences in reading performance and practice between individuals. The degree of training (or in some cases, lack of it) may also have influenced results; we do not yet know how long it takes, or how many prompted cases must be read, for a reader to attain stable performance with CAD. Finally, since the spectrum of appearances of cancers is so wide, selection of small sets of cases is likely to lead to the results being over-dependent on the actual case-mix used.
Most published studies have shown improvement with CAD [43, 46]. One exception is a recent UK study of 50 readers interpreting 180 cases of which one third were cancers: no significant difference between prompted and unprompted conditions was found [42]. This result could, however, have been influenced by lack of reader training prior to the study, or by the artificial nature of the experiment.
| Conclusions |
|---|
| Acknowledgments |
|---|
| References |
|---|