
British Journal of Radiology (2006) 79, S127-S133
© 2006 British Institute of Radiology
doi: 10.1259/bjr/25049149


Full paper

Breast screening: PERFORMS identifies key mammographic training needs

H J Scott, MSc and A G Gale, PhD

Applied Vision Research Centre, Loughborough University, Garendon Wing, Loughborough, LE11 3TU, UK

Correspondence: HJ Scott, Applied Vision Research Centre, Loughborough University, Garendon Wing, Loughborough, LE11 3TU, UK. E-mail: H.Scott@lboro.ac.uk


    Abstract
The UK Breast Screening Programme has recently expanded the age range for invitation in the prevalent round to 70 years. In contrast, fewer radiologists now choose to specialise in the area of breast cancer screening. In response to this depletion in film-reading personnel, an increasing number of radiographers have been trained as advanced practitioners in order to film-read alongside the current radiologists. As part of the quality assurance programme for the National Health Service Breast Screening Programme (NHSBSP), each film-reader can participate in a voluntary self-assessment scheme (Personal Performance in Mammographic Screening, PERFORMS) which consists of a number of recent challenging breast screening cases that are amassed nationally and distributed bi-annually. The scheme produces anonymous data on any areas of difficulty that individual participants have; these data can then be aggregated over groups of participants or over specific types of screening cases. In this paper, the areas of difficulty experienced by groups of advanced practitioners and radiologists on the PERFORMS cases were investigated to determine whether there were occupational group differences in reading skills in terms of case classification and feature type. Identifying whether such problematic areas exist would be the first step towards providing training sets specially tailored to the needs of particular occupational groups. As a benchmark for which cases could be problematic, the types of cases that a panel of experienced radiologists deemed difficult were first examined in order to compare the performance of both film-reading groups against this panel standard. Secondly, any differences in performance error and case characteristics (classification, difficulty level and feature type) between radiologists and advanced practitioners were examined.
The decisions of 15 experienced "panel" radiologists and approximately 400 film readers (including radiologists and advanced practitioners) were compared on 180 cases, over a number of years. This study employed a matched design which controlled for any differences between radiologists and advanced practitioners in terms of real-life factors, such as volume of cases read per week and years of radiological experience. The results elucidate the type of cases most appropriate for advanced mammographic training. No significant differences were found between the advanced practitioners and radiologists on these self-assessment screening cases, indicating that dedicated occupational group training is not required.


    Introduction
The perceptual and cognitive difficulties facing the radiologist in interpreting medical images, particularly ones with a very low incidence of abnormality such as in screening, have been described previously [1, 2]. The NHS Breast Screening Programme (NHSBSP) has been running for over 15 years in the UK and invites women aged 50–70 years for regular screening. Detecting early signs of breast cancer is difficult as the disease is rare, with a cancer incidence of approximately seven cases per 1000 women screened [3]. Whilst breast screening yields feedback to the radiologist for FP (false positive) and TP (true positive) decisions of potential abnormality presence via the outcome of further investigative procedures, the feedback for TN (true negative) or FN (false negative) decisions is necessarily slow and can take up to three years, when the woman returns for her next screening (incident) round. Consequently, the PERFORMS (Personal Performance in Mammographic Screening) self-assessment scheme was developed in 1991 as an educational tool with the Royal College of Radiologists and the NHSBSP [4–7]. Twice a year, this scheme offers individuals the opportunity, both voluntarily and anonymously, to examine a range of recent challenging screening cases. It provides immediate feedback on the accuracy of an individual's decisions against known case pathology, concerning whether to recall a case or not. It also gives feedback on aspects of the mammographic appearance of each case, where a participant's decisions are compared to the opinions of a panel of experienced radiologists.

Since the inception of the NHSBSP there has been a substantial fluctuation in film-reading personnel, moving to an increasingly flexible multi-disciplinary team. This has developed from the single radiologist to two radiologists double-reading, double reading with advanced practitioners and, finally, to the present situation in some breast screening units of advanced practitioners double-reading together for non-recalled cases.

Recent studies by the Royal College of Radiologists [8] have highlighted the current shortfall in radiologists in breast screening throughout the country. In 2003, approximately 59% of breast screening units had vacancies for radiologists, an increase on the 49% noted in 1997. Additionally, as fewer radiologists are choosing to train in breast screening, the increased workload puts pressure on existing staff, which is further exacerbated by the upper age range of invited women having increased to 70 years (with a potential lowering of age range to 40 years for those invited for the prevalent screening round). This workforce issue was recognised in the NHS Cancer Plan [9], which identified the need to increase both the number of radiologists and radiographers working in breast screening and proposed a four-tier workforce approach, importantly with advanced practitioner radiographers undertaking mammographic film reading. Studies examining occupational differences in radiological skill [10] from UK Breast Screening Units as well as data from the PERFORMS scheme [11] indicate that advanced practitioners perform well. The background experience of radiologists and advanced practitioners is necessarily different; therefore, given the changing workforce, the question can be asked as to whether advanced practitioners have different training needs regarding identifying mammographic features accurately and making correct case classification decisions as compared to radiologists. In addressing this question, one approach would be to examine the existing anonymous data from participants who undertake the PERFORMS scheme as this provides information on large groups of radiologists and advanced practitioners who have all examined the same screening cases. Consequently, this paper presents data from several sets of PERFORMS cases over three years. Each set contains normal, benign and malignant classifications with a wide range of mammographic feature appearances.

Two studies are reported. In the first, the type of cases and mammographic features that participants in the scheme incorrectly identify were examined in relation to the judged difficulty of the cases by an experienced panel of radiologists. It was hypothesized that those cases which were judged by the panel as the more difficult would give rise to more incorrectly reported cases by the participants. Secondly, these participants' data were further examined in terms of radiologists and advanced practitioners. It was proposed that if differences in performance, based upon decision errors, were found between the two groups then such data would indicate the need for dedicated types of training to be offered to each occupational group.


    Methods and materials
Every year a number of mammographic screening cases are selected from a large number of cases submitted from UK screening centres to make up an annual set of PERFORMS self-assessment cases. These are then distributed to the screening units bi-annually and individuals can elect to read them. Cases are selected so as to be challenging to participants in the scheme and thus offer both a self-assessment as well as training element. Each case represents a classification of malignant, benign (both from known case pathology), or normal (based upon a normal three year follow up) outcome. The consensus opinion of a panel of experienced screening radiologists provides a "panel opinion" as to each case's screening decision (i.e. whether in a screening situation that case should be recalled or not, based solely upon its radiographic appearance), suitability for the self-assessment scheme and a rating of case difficulty. Similarly, a consensus opinion is derived for the identification of the presence of a range of key mammographic features, namely: well-defined mass (WDM); ill-defined mass (IDM); spiculate mass (Spic); architectural distortion (AD); calcification (Calc); asymmetry (Asym), other appearances (other) or no key features present (none). Participants from all UK breast screening units can then voluntarily read the set of 120 cases, split into two sets of 60 cases, and their decisions are compared both to known pathology as well as to the experienced radiological panel decisions. At the completion by all participants of each PERFORMS set, the opinion of the panel is superseded by a "national radiological opinion" which is derived from known pathology together with the decisions of all participating film readers. Full details of the scheme have been described previously [4].

Study 1
For this study, the radiological panel comprised 15 experienced screeners who individually rated the difficulty of the 180 PERFORMS cases considered here using a four-point scale ranging from "very easy" to "very difficult". The anonymous data of 400 participants reporting on these cases were then examined to investigate the overall effect of case classification and feature type. In order to determine which cases were the most difficult for participants, the accuracy of their classification judgements on each case was measured by the percentage of errors (FP and FN decisions), that is, the percentage of decisions in which a case was wrongly recalled or wrongly returned to screen.
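
The error measure described above can be illustrated with a minimal Python sketch. The data layout (one tuple per reader decision) and the function name `error_rate_by_case` are assumptions made for illustration only; they are not part of the PERFORMS scheme or of the analysis software used in this study.

```python
from collections import defaultdict


def error_rate_by_case(decisions):
    """Return the percentage of incorrect recall decisions for each case.

    decisions: iterable of (case_id, should_recall, reader_recalled) tuples
    (a hypothetical layout). A false positive (recalling a case that should
    be returned to screen) and a false negative (not recalling a case that
    should be recalled) both count as one error.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for case_id, should_recall, reader_recalled in decisions:
        totals[case_id] += 1
        if should_recall != reader_recalled:
            errors[case_id] += 1
    return {case: 100.0 * errors[case] / totals[case] for case in totals}


# Invented example data, not taken from the study.
example = [
    ("case_A", True, True),
    ("case_A", True, False),    # false negative
    ("case_B", False, True),    # false positive
    ("case_B", False, False),
    ("case_B", False, False),
    ("case_B", False, False),
]
rates = error_rate_by_case(example)
```

Pooling FP and FN decisions into a single percentage, as here, gives one difficulty score per case that can then be averaged over case classifications or feature types.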

Study 2
To assess the effect of case difficulty on occupational group, the data from Study 1 were re-examined according to whether a participant was a consultant radiologist or an advanced practitioner. Such information is recorded as part of the PERFORMS scheme. A matched group design, whereby an equal number of radiologists and advanced practitioners were selected, was then employed in order to control for any real-life effects of annual breast screening case volume or years of breast screening film reading experience as both these factors have previously been shown to have an impact on performance [11, 12].
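
One way such a matched design could be constructed is sketched below in Python. This is a hedged illustration rather than the procedure actually used here: the greedy pairing rule, the tolerance parameters `max_volume_gap` and `max_years_gap`, and the tuple layout are all assumptions.

```python
def match_readers(radiologists, practitioners, max_volume_gap=1000, max_years_gap=1):
    """Greedy one-to-one matching on annual case volume and years of experience.

    Each reader is a (reader_id, annual_volume, years_experience) tuple
    (hypothetical layout). A radiologist is paired with the closest unused
    advanced practitioner whose volume and experience both fall within the
    given tolerances (which must be positive); readers without an acceptable
    partner are dropped.
    """
    unused = list(practitioners)
    pairs = []
    for rad_id, rad_volume, rad_years in radiologists:
        best, best_cost = None, None
        for candidate in unused:
            volume_gap = abs(rad_volume - candidate[1])
            years_gap = abs(rad_years - candidate[2])
            if volume_gap > max_volume_gap or years_gap > max_years_gap:
                continue
            # Scale each gap by its tolerance so both factors weigh equally.
            cost = volume_gap / max_volume_gap + years_gap / max_years_gap
            if best_cost is None or cost < best_cost:
                best, best_cost = candidate, cost
        if best is not None:
            unused.remove(best)
            pairs.append((rad_id, best[0]))
    return pairs


# Invented example readers, not taken from the study.
radiologists = [("R1", 5000, 10), ("R2", 9000, 3), ("R3", 2000, 15)]
practitioners = [("AP1", 8800, 3), ("AP2", 5200, 9), ("AP3", 12000, 1)]
pairs = match_readers(radiologists, practitioners)
```

Because unmatched readers are discarded, the two resulting groups are the same size and comparable on both matching factors, at the cost of using fewer participants than the full data set.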


    Results
Study 1
The case difficulty levels recorded by the radiological panel (n = 15) were first examined as a function of classification type (normal, benign, or malignant). Analysis of variance (ANOVA) revealed significant overall differences [F(2,70724) = 4419, p<0.001], and additionally the Student-Newman-Keuls (SNK) post hoc test identified that normal cases were reported as significantly less difficult than the benign and malignant cases (p<0.05). Consequently, it would be expected that participants would make fewer errors, both FP and FN decisions, on the normal cases. However, in contrast, the percentage of cases incorrectly classified by the participants (n = 400) indicated that they made significantly fewer errors on the malignant and benign cases as compared to the normal cases (p<0.05). These results are detailed in full elsewhere [13].
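
For readers less familiar with the F(df1, df2) notation used in these results, the following Python sketch shows how a one-way ANOVA F statistic and its two degrees of freedom are obtained. The difficulty ratings are invented for illustration, and the function `one_way_anova` is a textbook formulation, not the statistical package used for the analyses reported here.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists of numeric observations, one list per group.
    """
    all_values = [x for group in groups for x in group]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented ratings on a four-point scale (1 = very easy, 4 = very difficult).
normal = [1, 2, 1, 2]
benign = [3, 3, 2, 4]
malignant = [3, 4, 3, 2]
f_stat, df1, df2 = one_way_anova([normal, benign, malignant])
```

With three classification groups the first degree of freedom is always 2, which matches the form of the statistics reported above; the second grows with the total number of observations entered into the analysis.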

In terms of the difficulty of specific mammographic features, the radiological panel rated spiculate masses and architectural distortions as significantly more difficult features to identify, with calcifications and "none" (absence of key mammographic features) rated the least difficult. Conversely, the participants found cases containing spiculate masses and architectural distortions the least problematic and performed significantly less well on asymmetries (p<0.05).

As some mammographic features are more prevalent within particular case classifications, feature type was examined as a function of case classification. The Student-Newman-Keuls (SNK) test identified that, for normal cases, the radiological panel rated asymmetries as the most difficult (p<0.05) feature, whereas for benign cases the panel rated spiculate masses (p<0.01) as the most difficult; for malignant cases, architectural distortions (p<0.05) were considered the most problematic. The participants' performance was quite different. A univariate ANOVA, with one dependent variable (DV: incorrectly reported case) and two independent variables (IVs: case classification and feature type), showed that there was a significant effect of both case classification [F(2,73482) = 894.713, p<0.001] and feature type [F(6,73482) = 188.841, p<0.001]. There was also a significant interaction of classification by feature [F(8,73482) = 216.32, p<0.001]. SNK post hoc tests revealed significant differences between groups of features (Figure 1). For normal cases, calcification had the highest percentage of incorrect responses. For benign cases, asymmetry posed the most problems and was reported incorrectly over 60% of the time. For the malignant cases, the ill-defined masses were the most challenging to classify, although performance levels overall were far higher than for normal and benign cases (see Figure 1).


Figure 1. Participants: mean incorrectly reported case per type of mammographic feature.

 
Study 2
Each participant from the consultant radiologist group was carefully matched with an individual from the advanced practitioner group, both on the volume of breast screening cases read and the individuals' years of experience in breast screening. This resulted in two groups of 90 matched individuals whose anonymous data were examined over the 180 PERFORMS cases. No differences were found for either of these two matching factors (p = n.s.), as shown in Figure 2.


Figure 2. A matched design: subject characteristics. This relates to hundreds of screening cases per week.

 
The number of cases incorrectly reported by each occupational group was examined (Figure 3). A univariate ANOVA with one DV (incorrectly reported cases) and two IVs (case classification and occupational group) revealed no significant differences between the two matched groups (or group by classification interactions). However, a significant main effect of case classification [F(2,5396) = 47.9, p<0.001] was found. Post hoc tests indicated that both groups performed significantly better on the malignant cases (p<0.05) as compared to the benign and normal cases (with performance on these two classifications not significantly different). Therefore, the reporting performance of the two matched groups was similar to the overall participant performance in Study 1, with malignant cases reported more correctly than normal or benign cases.


Figure 3. Incorrectly reported cases by case classification.

 
An abnormality may be misreported due to the difficulty of recognising certain types of mammographic features present and some features may prove to be more difficult to identify than others. Consequently, the type of mammographic features identified by the two groups on the incorrectly reported cases was examined (Figure 4). A univariate ANOVA with one DV (incorrectly reported case) and two IVs (occupational group and feature type) showed a significant main effect of feature [F(6,5396) = 10.32, p<0.001]. There were no significant group differences (p = n.s.) or group by feature interactions.


Figure 4. Incorrectly reported cases by feature type and occupation. Spic: spiculated mass, AD: architectural distortion, Calc: calcification, IDM: ill defined mass, None: no features present, WDM: well defined mass, Asym: asymmetric density.

 
Although there were no significant interaction effects, Figure 4 shows that there were slight differences between the incorrectly reported cases by the two groups and the features identified as present by the radiological panel. Although these trends did not reach statistical significance, asymmetries appeared to be more problematic for radiologists than for the advanced practitioners (although for advanced practitioners the percentage of incorrect cases was also high for this feature). Cases that were the easiest to report correctly were very similar, with both groups performing better on those cases where the main feature appearances were spiculate masses and architectural distortions. These response patterns, which may be indicative of case difficulty, are similar to those found in the overall participants' analysis (Study 1).

Inherently, particular mammographic features are more common in certain case classifications; for example, there is a greater presence of spiculate masses in malignant cases. Therefore, the difficulty in correctly identifying certain features (as measured by the number of incorrect cases reported) was analysed by classification type (normal, benign and malignant) as well as by occupational group. Overall analysis revealed that there was a main effect of classification [F(2,5362) = 37.60, p<0.001], a main effect of feature type [F(6,5362) = 8.871, p<0.001] and a classification by feature interaction [F(8,5362) = 14.152, p<0.001], although there were no main effects of occupation, nor were there any significant interactions with occupation (p = n.s.).

Normal cases
For the normal cases (see Figure 5), both groups correctly reported those cases containing asymmetries and produced more incorrect reports on cases containing calcifications. Again, this trend is similar to the overall analysis in Study 1.


Figure 5. Normal cases incorrectly reported and feature type.

 
Benign cases
Figure 6 shows that for benign cases both groups found the cases with asymmetry to be the most challenging (similar to the overall performance results).


Figure 6. Benign incorrectly reported cases and feature type.

 
Malignant cases
The pattern for malignant cases (Figure 7) was similar to the overall performance as both groups found ill-defined masses to be the most difficult feature to correctly classify.


Figure 7. Malignant incorrectly reported cases by feature type.

 

    Discussion
This research investigated the performance of screening personnel on a number of carefully selected self-assessment screening cases. The first study examined the association between cases judged difficult by an experienced panel of radiologists and the overall performance of all PERFORMS participants on those cases. It was found that those cases which the radiological panel rated as difficult did not necessarily reflect those cases which participants (comprising consultant radiologists, advanced practitioners and others) in the PERFORMS scheme actually found to be problematic as judged by their incorrect case reporting decisions. More importantly, this study established a measure of which cases (in terms of classification and key mammographic features) were the most problematic for the participant group as a whole. These were benign cases with asymmetry and normal cases with calcification. Both of these characterise decisions relating to specificity. For the malignant cases, the number of errors observed was notably lower, with ill-defined masses as the least well-recalled feature (cf. [14]).

The lack of correspondence between experts' rating of difficulty and actual participants' performance may well reflect expertise differences between the panel members and the participants. As the participant group comprised several specialities, such a difference may also indicate variations in reporting by different subject specialists, and these occupational groups may find different aspects of particular cases challenging. If this was the situation then elucidating such differences would indicate that different occupational groups may require different types of training to support the maintenance of their screening performance.

Consequently, Study 2 investigated potential differences between a matched group of consultant radiologists and advanced practitioners. This demonstrated that when groups of carefully matched subjects were compared, both performed similarly. In the analysis of case difficulty by classification, both groups found malignant cases the least problematic and feature type revealed similar trends in the sort of cases both groups found difficult. In terms of feature type, the radiologists had a greater difficulty with cases containing well-defined masses and asymmetries, and advanced practitioners performed least well on cases containing ill-defined masses and asymmetries. However, there were no significant group differences for error and feature type (although there were some descriptive percentage differences, these did not reach statistical significance). When feature type by case classification was examined, there were no occupational differences; incidence of case reporting error (and hence the judged case difficulty levels) were similar.

These results show that the cases that consultant radiologists and advanced practitioners found difficult were closely related and coincided with the pattern of challenging cases found in the overall film reading population (Study 1).

These findings then indicate that advanced practitioners and radiologists do not require different types of training for the identification of mammographic features or for case classification. The data further indicate that it may be advantageous to provide training for specific mammographic feature types for different case classifications.


    Conclusion
No key differences were found between radiologists and advanced practitioners in the screening cases that each group found difficult. These data indicate that common training approaches are appropriate.


    Acknowledgments
 
This work is supported by the NHS Breast Screening Programme.


    References

  1. Manning DJ, Gale AG, Krupinski EA. Perception research in medical imaging. Br J Radiol 2005;78:683–5.
  2. Gale AG. Human response to visual stimuli. In: Hendee W and Wells P, editors. Perception of Visual Information (second edition). New York, NY: Springer Verlag, 1997.
  3. Patnick J, editor. Annual review 2005: One Vision: NHS Breast Screening Programme. Sheffield, UK: Fulwood House, 2005.
  4. Gale AG. PERFORMS – a self assessment scheme for radiologists in breast screening. Sem Breast Dis 2003;6:148–52.
  5. Gale AG and Walker GE. Design for performance: quality assessment in a national breast screening programme. In: Lovesay E, editor. Ergonomics: design for performance. London, UK: Taylor & Francis, 1991.
  6. Cowley H and Gale AG. Minimising human error in the detection of breast cancer. In: SA Robertson, editor. Contemporary Ergonomics. London, UK: Taylor & Francis, 1996.
  7. Cowley H and Gale AG. Breast cancer screening: comparison of radiologists performance in a self-assessment scheme and in actual breast screening. In: Krupinski EA, editor. Medical Imaging 1999, Image perception and Performance. Proceedings of SPIE 1999;3663:157–68.
  8. RCRBreastGroup.com [homepage on the Internet]. London: The Royal College of Radiologists. Available from: http://www.rcrbreastgroup.com/BreastGroup/2ndBSPsurvey.html [Accessed: 11 September 2006]
  9. NHS Cancer Plan: A plan for investment, a plan for reform. London, UK: Department of Health, 2000. Available from: http://www.dh.gov.uk/assetRoot/04/01/45/13/04014513.pdf [Accessed: 15 November 2006]
  10. Wivell G, Denton ERE, Eve CB, Inglis KC and Harvey I. Can advanced practitioners read mammograms? Clin Radiol 2003;58:63–7.
  11. Scott HJ, Gale AG, Wooding DS. Breast Screening Technologists: does real-life case volume affect performance? In: Chakraborty DP and Eckstein MP, editors. Medical Imaging 2004, Image Perception, Observer Performance and Technology Assessment. Proceedings of SPIE 2004;5372:399–406.
  12. Esserman L, Cowley H, Eberle C, Kirkpatrick A, Chang S, Berbaum K, et al. Improving the accuracy of mammography: volume and outcome relationships. JNCI 2002;94:369–75.
  13. Scott HJ and Gale AG. Breast screening: when is a difficult case truly difficult and for whom? In: Eckstein MP and Jiang Y, editors. Medical Imaging 2005, Image Perception, Observer Performance and Technology Assessment. Proceedings of SPIE 2005;5749:557–65.
  14. Duncan KA, Needham G, Gilbert FJ, Deans HE. Incident round cancers: what lessons can we learn? Clin Radiol 1998;53:29–32.



