BJR
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS

British Journal of Radiology (2005) 78, S20-S25
© 2005 British Institute of Radiology
doi: 10.1259/bjr/37221979


Paper

Evaluation of computer-aided detection (CAD) prompting techniques for mammography

S M Astley, PhD

Imaging Science and Biomedical Engineering, University of Manchester, Stopford Building, Oxford Road, Manchester M13 9PT UK


    Abstract
Computer-aided detection (CAD) systems, in which abnormalities are automatically detected and their locations presented to the radiologist as prompts, are increasingly being used to improve reader performance. The performance of CAD systems can be evaluated in two ways: by measuring the performance of the algorithms, or by monitoring the performance of readers using the system. All aspects of evaluation need careful consideration to avoid potential bias. This paper examines a variety of different approaches to evaluation and discusses their relative strengths and weaknesses.


    Introduction
Computer-aided detection (CAD) techniques have been applied to a variety of medical images, but are used most widely in mammography where signs of early breast cancer are often very subtle. CAD involves the use of computer algorithms to detect patterns in images associated with signs of disease [1]. The resulting information is presented to the film reader as prompts – markers superimposed on the image (or a version of it). The intention is to attract the reader's attention to potentially abnormal regions, and to increase the degree of suspicion associated with signs of abnormality that had previously been seen but dismissed as probably normal by the reader. It is the reader's decision whether or not to act on the prompts.

CAD systems have a number of potential benefits, including improved reader sensitivity, earlier detection of cancer and a solution to shortages of readers. In the National Health Service Breast Screening Programme (NHSBSP), the recommended practice is double reading, in which two readers independently review each mammogram. This has been shown to improve detection performance, especially where a third reader arbitrates in cases of disagreement [2]. However, this process is labour intensive. If a single reader using CAD could match the performance of independent double reading, this would alleviate shortages of trained readers in the programme and facilitate further enhancement. Earlier detection of cancer would clearly be beneficial, and there is now evidence that CAD systems can prompt abnormalities in a proportion of the previous screening films of women with screen-detected cancer [3, 4].

However, in the UK we are still in the process of investigating the extent to which these benefits can be achieved in practice. One UK study failed to show any significant benefit in terms of detection rates [5], whilst the initial results from another study are far more promising, suggesting equivalence with double reading [6]. All the readers in the two studies are working within the NHSBSP, and both studies have used the same CAD system. It can thus be seen that the method of evaluation of CAD systems has a direct bearing on the results achieved.

Another important aspect of evaluation is testing CAD algorithms to determine both their success in marking clinically significant abnormalities and the number of prompts marking normal regions of images. CAD algorithm performance tests can be used to aid decisions about which CAD system to use, and also to monitor the effects of any modifications to the algorithms.


    Evaluating CAD algorithms
CAD algorithms
CAD algorithms operate on digital images, acquired either by digitising film mammograms or directly from full-field digital mammography. Initially, algorithms are applied to single images to identify regions with characteristics associated with abnormality, such as groups of small bright blobs that could correspond to microcalcification clusters. A common form of output at this stage is a further image in which background structures are suppressed and potential abnormalities are enhanced. There are many different ways of proceeding, for example thresholding the image to produce a set of candidate locations for lesions. Thresholding can be based on a number of different features such as size, degree of enhancement and number of candidates. It may also depend on contextual information and other features derived from the image. Subsequently, additional image information can be extracted at candidate locations, for example a search for spicules around potential masses. Candidate locations can be compared with corresponding regions in another view of the same breast to obtain verification that they are genuine. They can also be compared with anatomically similar regions in the image of the contralateral breast to assess asymmetry.

Where algorithms output a series of candidate locations for prompts, each with an associated "probability" value, it is possible to produce receiver operating characteristic curves in which sensitivity is plotted against a measure of specificity [7]. A given point on the curve corresponds to a threshold of "probability" above which candidates are considered to be genuine abnormalities. An appropriate operating point can be selected giving a balance between algorithm sensitivity and false prompts. Different users or groups of users may elect to use different operating points; for example, a reader working in relative isolation with a low throughput of cases per year might benefit from high sensitivity at the expense of a large number of false prompts marking normal regions, whilst in a screening situation it might be more appropriate to have lower sensitivity with few false positive markers.
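As a minimal sketch of how an operating point might be selected, the Python below sweeps a threshold over candidate scores and picks the most sensitive point that stays within a false prompt budget. The function names, the example scores and the simplification that each lesion yields at most one candidate are assumptions made for illustration, not part of any particular CAD system.

```python
def operating_points(candidates, n_lesions, n_images):
    """Return (threshold, sensitivity, false prompts per image) per distinct score.

    candidates: list of (score, is_true_lesion) pairs, one per candidate location.
    Simplification (assumed): each true lesion yields at most one candidate.
    """
    points = []
    for threshold in sorted({score for score, _ in candidates}, reverse=True):
        kept = [is_true for score, is_true in candidates if score >= threshold]
        true_prompts = sum(kept)
        false_prompts = len(kept) - true_prompts
        points.append((threshold, true_prompts / n_lesions, false_prompts / n_images))
    return points


def pick_operating_point(points, max_false_per_image):
    """Choose the most sensitive point whose false prompt rate is within budget."""
    allowed = [p for p in points if p[2] <= max_false_per_image]
    return max(allowed, key=lambda p: p[1]) if allowed else None


# Hypothetical candidates from 4 images containing 3 lesions in total.
candidates = [(0.9, True), (0.8, False), (0.7, True),
              (0.4, False), (0.3, True), (0.2, False)]
points = operating_points(candidates, n_lesions=3, n_images=4)

# A screening user tolerating few false prompts, and a user preferring sensitivity.
screening_point = pick_operating_point(points, max_false_per_image=0.25)
high_sensitivity_point = pick_operating_point(points, max_false_per_image=1.0)
```

With these made-up scores, the tighter budget selects a higher threshold and lower sensitivity, which mirrors the trade-off between the two kinds of user described above.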

Placing prompts
When generating prompts, there are two key questions: whether to prompt and where to prompt. The criterion used by most researchers and CAD developers is that three detected microcalcification particles within a limited distance of each other on the mammogram constitute a cluster that should be prompted [8]. Consider the example illustrated in Figure 1. Because the mammogram is a two-dimensional projection of a three-dimensional structure, features may appear to be in close proximity in the breast only because of the way in which the breast was compressed and imaged. Comparison of detected candidates with those in other views of the same breast can be used to eliminate this false clustering effect.
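One possible reading of the three-particle criterion can be sketched as follows. The grouping rule (single linkage, where particles are chained together if each lies within a set distance of another), the distance value and the coordinates are all assumptions chosen for illustration; the source does not specify how the "limited distance" is applied.

```python
import math


def find_clusters(particles, max_distance, min_particles=3):
    """Group particles by single linkage and keep groups with enough members.

    particles: list of (x, y) positions in the image plane (arbitrary units).
    Two particles are linked when they lie within max_distance of each other.
    """
    unvisited = set(range(len(particles)))
    clusters = []
    while unvisited:
        stack = [unvisited.pop()]
        group = []
        while stack:
            i = stack.pop()
            group.append(i)
            # Collect every unvisited particle close enough to particle i.
            near = {j for j in unvisited
                    if math.dist(particles[i], particles[j]) <= max_distance}
            unvisited -= near
            stack.extend(near)
        if len(group) >= min_particles:
            clusters.append(sorted(group))
    return clusters


# Hypothetical detections: three close together, one isolated.
detections = [(10.0, 10.0), (12.0, 11.0), (11.0, 13.0), (60.0, 60.0)]
clusters = find_clusters(detections, max_distance=5.0)
```

Because the sketch works only on projected positions, it cannot tell a genuine cluster from the false clustering described above; that would need information from another view.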



Figure 1. (a) Three-dimensional representation of three individual particles of calcification in the breast. (b) Two-dimensional projection of the particles showing false clustering.

 
The question of where to prompt is more complex, particularly for larger clusters of microcalcification and masses. The "centre" of a cluster depends on the distribution of its constituent particles. Whilst it is relatively easy to specify rules that would place a single prompt to draw attention to the cluster illustrated in Figure 1, this is more difficult for more diffuse or elongated clusters. One solution would be to use multiple prompts in such cases. However, CAD system manufacturers prefer to limit the number of prompts used to mark large or diffuse clusters, as this improves the apparent performance of systems by reducing the potential for a high false marker rate.

For masses and asymmetric densities, the location of a prompt can be based on several different criteria. A simple example is shown in Figure 2, where the mid and light grey regions represent a detected lesion. A prompt could be placed either in the most dense part of the lesion (on the light grey area), at the centre of the mass, at the focus of the spiculation, or in the centre of the lesion defined geometrically (e.g. at the middle of the longest axis of the bounding ellipse). In the example, all of these strategies would result in sensible prompt positions, but examples can be generated where such methods would not produce useful prompts. The aim of prompting is two-fold: first, the prompt must be placed in a position that will attract attention to the abnormal region, and second, the prompt must convince the user that something genuinely abnormal was detected by the computer.
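To illustrate how two of these placement rules can disagree, the sketch below compares the densest point with the geometric centre for a made-up U-shaped lesion. The pixel representation, the density values and the function names are hypothetical.

```python
def densest_point(lesion):
    """Position of the pixel with the highest density value."""
    x, y, _ = max(lesion, key=lambda p: p[2])
    return (x, y)


def centroid(lesion):
    """Unweighted geometric centre of the lesion pixels."""
    n = len(lesion)
    return (sum(p[0] for p in lesion) / n, sum(p[1] for p in lesion) / n)


# Hypothetical U-shaped lesion given as (x, y, density) pixels.
u_shaped = [(0, 0, 1), (0, 1, 1), (0, 2, 3),
            (1, 0, 1),
            (2, 0, 1), (2, 1, 1), (2, 2, 1)]
lesion_pixels = {(x, y) for x, y, _ in u_shaped}

dense_prompt = densest_point(u_shaped)
centre_prompt = centroid(u_shaped)

# Does the nearest pixel to the geometric centre belong to the lesion at all?
centre_on_lesion = (round(centre_prompt[0]), round(centre_prompt[1])) in lesion_pixels
```

For this shape the geometric centre falls in the gap of the U rather than on the lesion, which is one example of a rule producing a prompt that might not convince the reader.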



Figure 2. Drawing of a mass lesion. The light grey region represents an area of increased density within the mass.

 
The psychophysics of prompting is complex, particularly where the prompts are placed on a separate image and the reader is required to make the association between a location in the prompt image and the corresponding location in the original mammogram. Soft-copy reading simplifies the situation, as prompts can appear in physically the same location as the region of the original image they are marking. The effects of prompts falsely marking normal regions also need to be taken into account. Whether or not a reader's attention is attracted by a given prompt is likely to depend on several factors, including the false prompt rate of the algorithm, the appearance of the mammogram, the number of other prompts in the image, and the conspicuity of any abnormality.

Evaluating algorithm performance
Generally, when evaluating CAD algorithm performance, algorithms are applied to a set of images and the response to known truth is measured. There are three main factors involved in the evaluation: the data, the truth, and the means of establishing correspondence between the truth and the algorithm response.

Data
There are two principal approaches to creating a data set for algorithm evaluation: data selection and random (or consecutive) sampling. Data selection involves first identifying the variants expected in the population to which the algorithms will be applied in clinical practice. This can be done at many different levels. At the simplest, we can define classes for "normal" and "abnormal", but even this requires further definition. In mammography it is difficult to use very recent cases, as there is evidence that a proportion of cases classified as normal by double reading do actually show some signs of early cancer [3, 4]. It is thus safer to require that in a "normal" case there has been at least one subsequent normal screening mammogram. Mammograms showing benign appearances that a reader would dismiss without comment should ideally be included in the normal category. The "abnormal" category can be split into general types such as "calcification" and "mass", or more specific categories such as "lobular carcinoma". The selection-based approach to creating a data set then involves obtaining a specified number of cases in each category. These may be selected as "representative examples" [9], picked randomly [10], or selected on the basis of subtlety determined by a panel of radiologists [11]. The prime advantage of selection is that it is possible to test performance on less common pathologies and appearances in an efficient way. The disadvantage is that it is much harder to generalise results to a screening population. Appearances that fall between categories are less likely to be included, and the selection of both categories and cases is subjective.

The random sampling approach to creating a data set involves taking either consecutive or random cases without first classifying them. Provided sufficient examples are used, this approach enables a more accurate prediction of algorithm performance on the target population. However, the sample may not contain sufficient examples of less common appearances for reliable results to be obtained for all types of cases. There is also a practical problem in that cases listed for inclusion may have been removed from the image store, causing a possible bias. A pure random sampling approach would require the use of substantial numbers of images to ensure that even the more uncommon types of abnormality are adequately represented. In practice, a combination of random sampling and selection provides the best compromise between efficiency and accuracy. Broad categories such as "normal" and "abnormal" are used, ensuring that there are no gaps, i.e. every case must be a member of one group or the other. Within those categories, cases can be sampled randomly.
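The combined approach, with broad categories and random sampling within each, might be sketched as below. The case identifiers, category sizes and fixed seed are hypothetical and serve only to make the example reproducible.

```python
import random


def sample_within_categories(cases, per_category, seed=0):
    """Randomly sample a fixed number of cases from each broad category.

    cases: list of (case_id, category) pairs; each case has exactly one category.
    per_category: dict mapping category name to the number of cases wanted.
    """
    rng = random.Random(seed)
    chosen = []
    for category, wanted in per_category.items():
        pool = [case_id for case_id, cat in cases if cat == category]
        picked = rng.sample(pool, min(wanted, len(pool)))
        chosen.extend((case_id, category) for case_id in picked)
    return chosen


# Hypothetical archive: 95 normal and 5 abnormal cases.
archive = [(i, "abnormal" if i % 20 == 0 else "normal") for i in range(100)]
study_set = sample_within_categories(archive, {"normal": 10, "abnormal": 5})
```

Sampling within categories guarantees that the less common group is represented, at the cost of a case-mix that no longer matches the population.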

Truth
To assess the performance of prompting algorithms, the locations of abnormal regions in the mammograms must be determined. Whilst pathology reports are useful for confirming the type and approximate location, they cannot be used to delineate the boundaries of lesions in images. Radiographs of excised lesions may provide useful data on the actual number of calcifications in a given cluster, although it is unlikely that a cluster in a specimen radiograph would be at the same radiographic projection as it was in vivo. There are also image-based techniques for obtaining additional data, including the use of a different modality or alternative view. However, measurement of algorithm performance necessitates accurate location in the mammogram as processed. This is usually obtained by asking expert mammographic film readers to annotate the images.

For film mammograms, annotation can be carried out on the original film using a chinagraph pencil. In this case, the annotated image can be re-digitised via the CAD system to produce a version comparable with the prompt image, although any prompts produced on the annotated image should be ignored. Alternatively, digital images can be annotated on-screen. Annotations of boundaries are subjective, and different radiologists produce different estimates of boundary location. For this reason, multiple annotations should be made (Figure 3a). Regions in the image can then be coded based on the likelihood that the pixels at each point correspond to genuine abnormality, with pixels annotated by all readers having the greatest likelihood and being assigned to "on-target", and pixels outside all readers' annotations being assigned to the background ("off-target"). Regions annotated by some readers, but not all, can be dealt with depending on the proportion of readers who included them in the target.



Figure 3. (a) Three expert readers' delineations of a lesion boundary. (b) A likelihood map based on the annotations. The lighter the colour, the more likely a pixel is to be part of the lesion.
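A likelihood map of this kind could be built along the following lines, assuming each reader's annotation is available as a set of pixel coordinates. The 0.5 cut-off used for pixels marked by only some readers is an assumption; the source says only that such pixels are handled according to the proportion of readers who included them.

```python
from collections import Counter


def likelihood_map(annotations):
    """Map each annotated pixel to the fraction of readers who included it."""
    counts = Counter(pixel for region in annotations for pixel in region)
    return {pixel: n / len(annotations) for pixel, n in counts.items()}


def label_pixel(likelihood, pixel, partial_threshold=0.5):
    """Label a pixel on-target or off-target.

    Pixels marked by all readers are on-target and pixels marked by none are
    off-target. Pixels marked by some readers are decided by an assumed
    cut-off on the proportion of readers (partial_threshold).
    """
    value = likelihood.get(pixel, 0.0)
    return "on-target" if value > 0.0 and value >= partial_threshold else "off-target"


# Hypothetical annotations by three readers, as sets of (x, y) pixels.
reader_a = {(1, 1), (1, 2), (2, 1), (2, 2)}
reader_b = {(1, 1), (1, 2), (2, 2), (3, 2)}
reader_c = {(1, 1), (2, 1), (2, 2)}

likelihood = likelihood_map([reader_a, reader_b, reader_c])
labels = {p: label_pixel(likelihood, p) for p in [(1, 1), (1, 2), (3, 2), (0, 0)]}
```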

 
Correspondence
Having processed data and obtained ground truth information, the next stage is to determine the performance of the algorithm by comparing the algorithm output or prompts with the ground truth. If a straightforward "hit" or "miss" can be determined, free-response receiver operating characteristic (FROC) curves can be constructed using different algorithm operating points to produce points on the curve [7]. This provides a good method of comparing algorithms and algorithm revisions. A common way of presenting the results is to plot sensitivity against false prompts per case. If accurate determination of the lesion boundary is more important, the percentage of correctly classified pixels can be measured; both ROC analysis and transportation based approaches have been used in this case [12, 13].

Where algorithms output regional responses rather than localised symbolic prompts, a decision must be made about whether or not a given lesion has been detected. The simplest method is to demand that for detection at least 50% of the response region overlaps with the truth region. This ensures that overdetection, where a large region of the image is labelled as abnormal and the lesion is relatively small, does not score highly. The converse (i.e. demanding that 50% of the truth region is occupied by the response region) is less helpful, as a small focal region detected within a larger diffuse abnormality would be penalised. The main problem with overlap criteria is that a small response region that would certainly attract attention to a small, adjacent but not overlapping abnormality would count as a false positive (Figure 4a). Worse, responses around the edge of an abnormality would count as multiple false positives (Figure 4b). These limitations can be overcome by using a measure of proximity. For example, it is possible to measure the number and distance that response region pixels would have to be moved to reconstruct the truth region [12]. However, this approach is computationally expensive and not suitable for analysis of the performance of algorithms that generate a large number of response regions.



Figure 4. Diagrams illustrating false positive responses. The truth region is white and the algorithm response regions are black. (a) A "near miss" (one false positive). (b) Multiple false positives around the edge of the lesion.
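The 50% overlap criterion and its limitations can be sketched with regions held as sets of pixel coordinates. The rectangular regions and their sizes are hypothetical; only the 50% figure comes from the text.

```python
def is_detection(response, truth, min_fraction=0.5):
    """True when at least min_fraction of the response region lies in the truth region."""
    if not response:
        return False
    return len(response & truth) / len(response) >= min_fraction


def block(x0, y0, x1, y1):
    """Set of pixel coordinates in the half-open rectangle [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


truth = block(10, 10, 20, 20)            # hypothetical 10 x 10 lesion
focal_response = block(12, 12, 15, 15)   # small response inside the lesion
overdetection = block(0, 0, 40, 40)      # large response swamping the lesion
near_miss = block(20, 10, 23, 13)        # adjacent to the lesion, no shared pixels

results = {
    "focal": is_detection(focal_response, truth),
    "overdetection": is_detection(overdetection, truth),
    "near_miss": is_detection(near_miss, truth),
}

# The converse rule (fraction of the truth region covered) rejects the focal response.
converse_focal = len(focal_response & truth) / len(truth) >= 0.5
```

Here the focal response counts as a detection and the overdetection does not, as intended, but the near miss is scored as a false positive even though it sits directly beside the lesion.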

 
Where the algorithm response appears as a symbolic prompt, the problems are similar. Much depends on the definition of ground truth; a prompt near the end of a long spicule might be technically "on-target" but in practice it is unlikely to attract the attention of a reader to the abnormality (Figure 5).



Figure 5. Illustration of a spiculated lesion with a prompt placed near the end of a spicule (arrow).

 

    Evaluating CAD systems
In addition to measuring algorithm performance, it is also important to measure the performance of readers using CAD, and to compare prompted and unprompted performance. There are three main factors involved in this process: selection of data, selection and training of readers, and the methodology for comparison.

Data
Once again there are two principal approaches to obtaining data: selecting cases, and using randomly selected or consecutive cases. Where case selection is used, the aim is generally to reduce the number of normal cases read by loading the data with cancers. Any selection beyond loading, i.e. in terms of types of case, renders the results extremely difficult to generalise, but even loading with cancer cases is problematic. If a CAD algorithm has a false prompt rate of 0.5 false prompts per image, and the data set contains 50% cancers, one in three prompts presented to the readers will mark genuine abnormalities compared with approximately 1 in 100 in a screening case-mix. Clearly these two situations make different demands on the readers. In the latter (more realistic) case, the readers have an underlying expectation that the vast majority of prompts are false and must be dismissed. In the former case all prompts must be treated with suspicion. Random selection is more labour intensive for the readers since they must read a large number of normal cases to read enough cancer cases to achieve statistically significant results. Thousands of cases are needed to encapsulate the natural variation in abnormal mammographic images.
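The arithmetic behind the prompt ratios can be illustrated with a short calculation. The source does not state how its figures were derived, so the sketch below rests on assumptions chosen because they reproduce them: four images per case, each cancer prompted correctly in two views, the same false prompt rate on all images, and a cancer rate of 1% in the screening case-mix.

```python
def true_prompt_fraction(cancer_fraction, false_prompts_per_image=0.5,
                         images_per_case=4, true_prompts_per_cancer=2):
    """Expected fraction of prompts that mark a genuine abnormality.

    images_per_case and true_prompts_per_cancer are assumptions, not values
    taken from the source.
    """
    true_prompts = cancer_fraction * true_prompts_per_cancer
    false_prompts = false_prompts_per_image * images_per_case
    return true_prompts / (true_prompts + false_prompts)


loaded = true_prompt_fraction(0.5)      # data set loaded with 50% cancers
screening = true_prompt_fraction(0.01)  # assumed 1% cancers, for illustration only
```

Under these assumptions the loaded data set gives one true prompt in three and the screening case-mix roughly one in a hundred; a lower assumed cancer rate would make true prompts rarer still.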

Whichever primary method of selection is used, in retrospective studies it is important to ensure that prior cases are included. Prior cases are the previous screening films of women with subsequently detected cancer, either as an interval cancer or at the next screening round. It is also important both in prospective and retrospective studies that, if the cases used to establish detection and recall rates for unprompted screening are not the same as those used to measure performance with CAD, the two sets of cases are equivalent. If, for example, the proportions of incident and prevalent cancers are different, this could be problematic because the expected cancer rates in the two populations are not the same.

It is vital that the status of all cases used in any retrospective evaluation is known. Pathology reports must be obtained for abnormal cases, and normal cases should have at least one subsequent normal screening mammogram. This avoids the situation in which a woman who had previously been given the all clear requires recall on the basis of re-reading with CAD.

Readers
In the UK, radiologists, radiographers and breast physicians all read screening mammograms. The performance of individual readers within the breast screening programme is known to vary [14]. For this reason, it is important that studies are conducted using several readers. To date, this has rarely been the case. The most widely quoted prospective trial that showed a large benefit with CAD had only two readers [15]. One UK study used a large number of readers, but they had little training in the use of CAD so any benefit was lost [5]. Experienced film readers from the NHSBSP will have much more experience in unprompted than prompted reading, which could bias results in favour of unprompted reading.

If a study is conducted consecutively, for example comparing unprompted performance over a year with prompted performance the following year, it should be noted that the readers will be more experienced in the second period of the study. For radiologists who have been reading for many years this will be a minor effect, but for newer readers there could be a substantial difference in performance. Ideally, performance should be evaluated before and after the study to establish whether this is the case.

Studies in which the readers differ in the unprompted and prompted conditions could also introduce a bias, depending on the experience and performance of the readers. Alongside experience, the most important factor in reader studies is training. Few studies have incorporated any significant degree of reader training, and there has been a blurring of the distinction between training and practice. The crucial difference between the two is feedback. In a recent UK study, readers were trained for 7 weeks prior to commencing the trial. They were tested at the start and end of the training period, and it was found that most readers improved their performance over that period by reducing the number of normal cases scored as "probably malignant" or "malignant" [16]. The effect was also mirrored in scores made before the use of CAD. These results, in which readers overdetected at the start of the training period, indicate that training is important to achieve stable performance at the start of trials. Further care is necessary when readers are asked to record or score results differently from normal practice, especially if the unprompted condition is taken from a different time period.

Method of comparison
The most widely quoted retrospective CAD trial is that of Warren Burhenne et al [3]. This investigated the use of the R2 ImageChecker® [17] to improve detection of early cancers. Approximately 80% of the cases used were screen-detected cancers and the rest were priors; there were no normal cases at all. This methodology was used to calculate the "potential benefit" of CAD, that is the benefit if all the correctly placed prompts were acted on. However, it really serves to place an upper limit on the benefit, as in a screening situation where the vast majority of prompts mark normal regions a proportion of the correct prompts marking subtle abnormalities will be dismissed. Since the study was performed, the R2 ImageChecker® CAD algorithms have been further improved, so the potential benefit measured in this way will be even greater. This is one of the difficulties in evaluating CAD; there is ongoing research in algorithm development, and systems change their algorithms frequently.

The prospective trial of reading with the ImageChecker® conducted by Freer and Ulissey is also widely cited [15]. They scored cases twice, once before consulting the CAD prompts and once after, and found a large benefit in using the system. Cancer detection was increased by 19.5%. One weakness of their study is that there were only two readers, so generalisation is not possible. Another problem with the methodology is that there is a change to the normal unprompted practice in that the readers knew they would have a second look at the images when making their first scoring decision. This may adversely affect unprompted performance.

A study published by Gur et al [18] found no difference in sensitivity or recall rate with CAD. This could, however, be explained by the selection of cases since the study was conducted consecutively, with the two conditions performed at different times. The proportions of incident and prevalent round cases in the two conditions differed by 10%. This would lead to different rates of cancer in the two groups, with more expected in the unprompted condition. Any effect from CAD would thus be masked.

Taylor and Given-Wilson have performed two studies in the UK, also evaluating the ImageChecker®. The first study was retrospective. It involved a relatively small number of cases, heavily weighted with cancers, read by a large number of readers [5]. They concluded that there was no benefit from CAD. However, the readers had little training in using the system, and the small set of selected cases means that the results cannot be generalised to the screening situation. Furthermore, the readers took as long to read in the unprompted condition as they did when using CAD, which indicates that the initial unprompted search was reduced when reading with CAD, contrary to recommended good practice. A prospective trial was then performed with the aim of replicating Freer's methodology [19]. Once again they derived no benefit from the system. In this study a small number of readers independently double read mammograms with CAD. Since reading should take longer with CAD, gains in efficient use of manpower will be achieved by single reading with CAD rather than by double reading, provided that equivalence can be shown. Double readers know that another reader will view the same case, so may not perform in a comparable way to single readers.

The CADET study, undertaken in Aberdeen and Manchester, is a retrospective comparison of single reading with CAD and previous double reading of the same cases by different readers. Eight radiologist readers, trained for 7 weeks in the use of CAD prior to the study, read a total of over 10 000 screening mammograms. Preliminary results strongly suggest at least equivalence between the cancer pick-up rates of double reading and single reading with CAD [6]. The main weakness of this study lies in the use of cases from 1996 to avoid any ethical problems that could arise with more recent cases. Film quality has improved since 1996, although as the current and previous readers used the same data this should have affected them equally. It may also have affected the performance of the CAD algorithms. There is also the question of reader experience. Some of the readers read in both conditions, and they will clearly be more experienced readers at the time of the prompted condition. However, experienced readers have left both screening centres since 1996 and been replaced with less experienced ones, so this may balance out.


    Conclusions
The evaluation both of CAD algorithms and of CAD systems is complex, and all methodologies have weaknesses. There are many variables and confounding factors to be taken into account. The major limitations of published studies are insufficient number of readers, lack of training, an inadequate benchmark for unprompted reading (especially where the populations differ), use of an artificial case-mix and outdated software. Most of these can be addressed by careful study design. For example, it is possible to compensate for expected improvements in software by slightly loading the case-mix with cancers to achieve the expected true prompt:false prompt ratio with newer (more specific) software.

Most evaluations that have been published show some degree of benefit with CAD, but the spectrum of improvement in cancer detection rate ranges from 0 to 19.5%. Further research is necessary to establish under what circumstances the greatest gains can be achieved and the expected benefit in the context of the NHSBSP. In addition to evaluating algorithms and the performance of readers with CAD, there are other aspects of CAD that must be considered, including ergonomics, cost effectiveness and workflow issues. The real test is whether it is possible – and desirable – to replace double reading by single reading with CAD.


    References

  1. Working Party of the Radiologists Quality Assurance Co-ordinating Group. Computer aided detection in mammography, NHSBSP Publication No. 48. London, UK: NHSBSP, 2001.
  2. Blanks RG, Wallis MG, Moss SM. A comparison of cancer detection rates achieved by breast cancer screening programmes by number of readers, for one and two view mammography: results from the UK National Health Service Breast Screening Programme. J Med Screen 1998;5:195–201.
  3. Warren Burhenne LJ, Wood SA, D'Orsi CJ, Feig SA, Kopans DB, O’Shaughnessy KF, et al. Potential contribution of computer-aided detection to the sensitivity of screening mammography. Radiology 2000;215:554–62.
  4. Astley SM, Boggis CRM, Walker K, Wallace S, Tomkinson S, Hillier V, et al. An evaluation of a commercial prompting system in a busy screening centre. In: Peitgen H-O, editor. Proceedings of the Sixth International Workshop on Digital Mammography; 2002 June 22–25; Bremen, Germany. Heidelberg, Germany: Springer Verlag, 2002.
  5. Taylor PM, Champness J, Given-Wilson R, Potts HW, Johnston K. An evaluation of the impact of computer-based prompts on screen readers' interpretation of mammograms. Br J Radiol 2004;77:21–7.
  6. Astley S, Gilbert FJ, McGee M, Griffiths P, Duffy S, Buchan I, et al. CADET: The Computer Aided Detection Evaluation Trial. In: Pisano E, editor. Proceedings of the Seventh International Workshop on Digital Mammography; 2004 June 18–21; Chapel Hill, NC. [in press.]
  7. Metz CE. Fundamental ROC analysis. In: Handbook of medical imaging, Vol. 1. Physics and psychophysics. Beutel J, Kundel H, Van Metter R, editors. Bellingham, WA: SPIE Press, 2000:751–69.
  8. Chan H-P, Doi K, Galhotra S, Vyborny C, MacMahon H, Jokich P. Image feature analysis and computer-aided diagnosis in digital mammography. (I) Automated detection of microcalcifications in mammography. Med Physics 1987;14:538–48.
  9. Suckling J, Astley S, Betal D, Cerneaz N, Dance DR, Kok S-L, et al. The Mammographic Image Analysis Society Digital Mammogram Database, International Congress Series 1069. Excerpta Medica 1994:375–8.
  10. Dukic I, Astley SM, Boggis CRM. An evaluation of a CAD system with variable marker sizes. In: Pisano E, editor. Proceedings of the Seventh International Workshop on Digital Mammography, 2004 June 18–21; Chapel Hill, NC. [in press.]
  11. Zheng B, Ganott MA, Britton CA, Hakim CM, Hardesty LA, Chang TS, et al. Soft-copy mammographic readings with different computer-assisted detection cuing environments: preliminary findings. Radiology 2001;221:633–40.
  12. Board M, Astley S. A new method for evaluating and optimising mammographic detection algorithms. In: Peitgen H-O, editor. Proceedings of the Sixth International Workshop on Digital Mammography; 2002 June 22–25; Bremen, Germany. Heidelberg, Germany: Springer Verlag, 2002:257–61.
  13. Board M, Bruynooghe M, Messainguiral C, Astley SM. Comparison of two microcalcification algorithms using FROC and transportation based techniques. In: Pisano E, editor. Proceedings of the Seventh International Workshop on Digital Mammography; 2004 June 18–21; Chapel Hill, NC. [in press.]
  14. Savage CJ, Gale AG, Pawly EF, Wilson ARM. To err is human, to compute divine? In: Gale AG, Astley SM, Dance DR, Cairns AY, editors. Digital mammography. Amsterdam, The Netherlands: Excerpta Medica, Elsevier, 1994:405–14.
  15. Freer TW, Ulissey MJ. Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center. Radiology 2001;220:781–6.
  16. Astley S, Quarterman C, Al Nuaimi Y, Chasser C, Dukic I, Hillier V, et al. Computer-aided detection in screening mammography: the impact of training on reader performance. In: Pisano E, editor. Proceedings of the Seventh International Workshop on Digital Mammography; 2004 June 18–21; Chapel Hill, NC. [in press.]
  17. R2 ImageChecker®. http://www.r2tech.com [accessed 9 November 2004].
  18. Gur D, Sumkin JH, Rockette HE, Ganott M, Hakim C, Hardesty L, et al. Changes in breast cancer detection and mammography recall rates after the introduction of a computer-aided detection system. J Natl Cancer Inst 2004;96:185–90.
  19. Taylor P, Khoo L, Given-Wilson R. Prospective study of the release of the R2 ImageChecker in the UK screening setting. In: Pisano E, editor. Proceedings of the Seventh International Workshop on Digital Mammography; 2004 June 18–21; Chapel Hill, NC. [in press.]



