BJR

British Journal of Radiology (2003) 76, 328-331
© 2003 British Institute of Radiology
doi: 10.1259/bjr/17252624


Full Paper

Interexamination variation of whole breast ultrasound

A M Bosch, MD1, A G H Kessels, MD, MSc2, G L Beets, MD, PhD1, K L C G Vranken, MD3, A C Borstlap, MD4, M F von Meyenfeldt, MD, PhD1 and J M A van Engelshoven, MD, PhD3

1 Department of Surgery, 2 Department of Clinical Epidemiology & Medical Technology Assessment, 3 Department of Radiology, Maastricht University Hospital, P. Debyelaan 25, NL-6229 HX Maastricht and 4 Department of Radiology, St. Maartens Gasthuis, PO Box 1926, NL-5900 BX Venlo, The Netherlands

Correspondence: A M Bosch, Maastricht University Hospital, Dept. of Surgery, PO Box 5800, NL-6202 AZ Maastricht, The Netherlands


    Abstract
 
The purpose of this study was to determine the interexamination agreement of ultrasound of the breasts. This includes the complete process of performing whole breast ultrasound and interpreting the dynamic scanning and the static images by one person. In a prospective study, 58 patients with a clinical indication for mammography underwent an ultrasound examination of both breasts by three independent sonographers. The sonographers had full knowledge of the physical and mammographic findings. Histology and 12 month follow-up were used as the reference standard. Interobserver variability for both mammography and breast ultrasound was measured using linearly weighted kappa statistics. Receiver operating characteristic curves were constructed to compare the diagnostic performance of the observers. The interexamination agreement for the score of the probability of malignancy after mammography was substantial (kappas ranged from 0.63 to 0.65). The interexamination agreement for the final score of the probability of malignancy after mammography and ultrasound examination was slightly better (kappas ranged from 0.72 to 0.75). The area under the receiver operating characteristic curves after mammography and ultrasound examination ranged from 0.97 to 0.98. Ultrasound examination of the whole breast shows substantial interexamination agreement. Ultrasound examination of the breast adds consistency to mammography and physical examination.


    Introduction
 
Ultrasound (US) has become an established diagnostic adjunct to mammography in the evaluation of breast abnormalities. The main goals of US of the breast are to differentiate cystic from solid lesions and benign solid lesions from malignant solid lesions [1–5], and to clarify difficulties in mammographic interpretation. An important drawback of US is operator dependency. The interobserver variation of reading breast US hardcopy images has been assessed in previous studies [6–11]. However, these studies are not representative of the interexamination variation of the complete procedure. The selection of static images, used in previous studies, is influenced by the ability of the sonographer to recognize breast abnormalities during dynamic scanning. The true interexamination variation therefore consists of more than just the differences in reading hardcopy images, and the true reproducibility may be lower than suggested in the previous reports. To our knowledge, the interexamination variation of the complete process of performing the US examination and interpreting the dynamic scanning and the static images by the same person has not been previously assessed.

The aim of this study was to assess the interexamination variation of the complete US procedure.


    Patients and methods
 
The Medical Ethics Committee of the University Hospital Maastricht approved this prospective study. It was part of a larger project in which all patients referred for mammography underwent an additional US examination of the breasts to assess the added value of whole breast US for various indications.

Between March and August 2000, patients were selected from those referred for mammography to the Department of Radiology of the University Hospital Maastricht. After giving informed consent, the patients consecutively underwent a standard physical examination of the breasts, mammography and three US examinations of the breasts by three different sonographers.

Selection was made to increase the number of abnormalities at breast US and to obtain a more even distribution of the different US results compared with the normal population. Patients were selected when they were referred by a surgeon, or when the request form for mammography stated the presence of a palpable lump or local pain, or an increased risk of breast cancer.

A resident with special interest in sonology performed the physical examination in the Department of Radiology. The presence of a mass, as well as its location, consistency, size and adherence to the surrounding tissues, was recorded. A final probability of malignancy was assigned on a 5-point scale: no abnormalities; benign finding; probably benign finding; malignancy suspected; and malignant finding.

Mammography was performed using craniocaudal and mediolateral oblique projections (Bennet Contour Plus, Oldelft-Benelux, Delft, The Netherlands and Kodak Min-R film screen combination). When indicated, coned-down views, magnification views or views in a third direction were added. Mammography was interpreted by a radiologist with special interest and experience in breast imaging. The mammography images were scored for the density of the breast tissue (4-point scale as described in the BI-RADS reporting system [12]), the presence of masses, the location, size, type of the masses, and presence of microcalcifications. On the basis of this evaluation the probability of malignancy was scored on a 5-point scale (based on the BI-RADS lexicon for mammography) [12]: (1) no abnormalities; (2) benign finding; (3) probably benign finding; (4) malignancy suspected; and (5) malignant finding.

US examination was carried out using an ATL ultrasound scanner (HDI 5000, ATL, Bothell, Washington, USA) and a 12–5 MHz linear array transducer. There was no time limit for performing the whole breast US examination. The three sonographers were two experienced radiologists and one resident. The radiologists had 3 years and over 5 years of experience, respectively. The radiologist with over 5 years of experience was also participating in the National Breast Cancer Screening Program. The resident had carried out 500 breast US examinations prior to this study. Each sonographer was informed about the results of the physical examination and had access to the mammogram report. The ultrasound examinations were performed in random order by the three sonographers, who were blinded to the ultrasound results of their colleagues. Each sonographer noted the presence of any lesion, as well as its location, size, margins, posterior echoes and echogenicity. The US diagnosis was recorded, and the final probability of malignancy was scored on a 5-point scale (based on the BI-RADS score under development for ultrasound [13]): no abnormalities; benign finding; probably benign finding; malignancy suspected; and malignant finding.

In order to determine whether US adds consistency to mammography and physical examination, the three sonographers independently interpreted the mammograms with knowledge of the physical information about the breasts. This assessment was made 3 months after the initial diagnostic procedure. The same items as described above were scored.

Finally, we studied the interexamination agreement in subgroups of patients based on: (1) the presence and absence of a palpable mass; (2) the presence and absence of a lesion on mammography; (3) the density of the breast on mammography (75% dense breasts compared with less than 50% dense breasts); and (4) the presence or absence of an accepted indication for breast ultrasound. Breast ultrasound was considered indicated when there was a palpable lesion, and/or a mammographic lesion with a BI-RADS score of 3 or higher, and/or inconclusive mammography because of high density breasts.

The final diagnosis for each breast was established by histology and follow-up for 12 months. Pathology results were retrieved from the hospital department of pathology and the Dutch Network and National Database for Pathology (PALGA). As all national hospital pathology departments are linked to this database, complete coverage of the study population was assured, including patients who were diagnosed elsewhere. The final diagnoses were divided into: (1) no abnormalities; (2) benign cystic findings; (3) benign solid lesions; and (4) malignant findings.

Statistics
To measure the extent of agreement between the examinations, linearly weighted kappa values were calculated. The kappa statistic measures the proportion of decisions in which observers agree while accounting for the possibility of agreement based on chance. Perfect agreement results in a kappa value of 1.0, and a kappa value of 0 indicates the level of agreement expected based on chance alone. Landis and Koch [14] classified kappa values of 0.2 or less as slight agreement, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial, and 0.81–1.00 as almost perfect agreement between observers. Other researchers consider kappa values of 0.50 or less as poor and values of 0.75 or more as excellent reproducibility [15]. Differences in kappas between the mammographic and US results were tested using the jack-knife method [16].
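As an illustration of the agreement measure described above, the following minimal sketch computes a linearly weighted kappa for two observers who score the same breasts on the 5-point scale, and labels the result with the Landis and Koch categories [14]. It is not the software used in this study; the function names, the coding of the scale as integers 1 to 5, and the weight 1 - |i - j|/(k - 1) for categories i and j are assumptions made for the example.

```python
def linear_weighted_kappa(ratings_a, ratings_b, n_categories=5):
    """Linearly weighted kappa for two raters scoring the same cases on 1..n_categories."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must score the same non-empty set of cases")
    k = n_categories
    n = len(ratings_a)

    # Cross-tabulate the paired scores (rows: rater A, columns: rater B).
    table = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        if not (1 <= a <= k and 1 <= b <= k):
            raise ValueError("scores must be integers between 1 and n_categories")
        table[a - 1][b - 1] += 1
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Full credit for identical scores, partial credit shrinking
        # linearly with the distance between the two categories.
        return 1.0 - abs(i - j) / (k - 1)

    observed = sum(weight(i, j) * table[i][j]
                   for i in range(k) for j in range(k)) / n
    expected = sum(weight(i, j) * row_totals[i] * col_totals[j]
                   for i in range(k) for j in range(k)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Verbal label for a kappa value, using the cut-offs quoted in the text [14]."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

With this weighting, two observers who give identical scores to every breast obtain a kappa of 1.0, and a disagreement between adjacent categories is penalised less than one between "no abnormalities" and "malignant finding".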

Diagnostic performance was evaluated using the five levels of suspicion categories in receiver operating characteristic (ROC) analysis. ROC analysis was carried out for the combined result of the physical examination, mammography and US of each sonographer. The area under the ROC curves (AUC-ROC) was used as a measure of diagnostic performance. The differences between the areas under two ROC-curves were compared, taking into account that both curves were derived from the same cases [17].
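The area under an empirical ROC curve built from an ordinal suspicion score can be sketched as follows. This is an illustrative calculation only, assuming the 5-point score is coded 1 to 5 and the reference standard is a true/false malignancy flag per breast; the function name is hypothetical. It uses the equivalence between the trapezoidal area and the probability that a malignant breast receives a higher score than a non-malignant one, with ties counted as one half. It does not implement the comparison of correlated areas of Hanley and McNeil [17].

```python
def auc_from_ordinal_scores(scores, is_malignant):
    """Area under the empirical ROC curve for an ordinal suspicion score."""
    if len(scores) != len(is_malignant):
        raise ValueError("one reference diagnosis is needed per score")
    positives = [s for s, m in zip(scores, is_malignant) if m]
    negatives = [s for s, m in zip(scores, is_malignant) if not m]
    if not positives or not negatives:
        raise ValueError("both malignant and non-malignant cases are required")

    # Compare every malignant case with every non-malignant case.
    credit = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                credit += 1.0   # correctly ranked pair
            elif p == q:
                credit += 0.5   # tied pair counts as half
    return credit / (len(positives) * len(negatives))
```

A score that ranks every malignant breast above every non-malignant breast gives an area of 1.0, and a score that carries no information gives 0.5.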

A p-value of ≤0.05 was considered statistically significant.
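The jack-knife comparison of two kappa values on the same cases can be sketched as below. This is a generic leave-one-out construction under stated assumptions, not a reproduction of the analysis in this study: scores are coded 1 to 5, each case is omitted in turn, and the standard error is taken from the resulting pseudo-values. The function names and the returned dictionary keys are hypothetical. Dividing the estimate by its standard error and referring the ratio to a normal distribution is one common way to obtain a p-value, and is likewise an assumption here.

```python
import math


def weighted_kappa(a, b, k=5):
    """Linearly weighted kappa for two lists of scores coded 1..k."""
    n = len(a)

    def w(i, j):
        return 1.0 - abs(i - j) / (k - 1)

    observed = sum(w(x, y) for x, y in zip(a, b)) / n
    pa = [a.count(c) / n for c in range(1, k + 1)]
    pb = [b.count(c) / n for c in range(1, k + 1)]
    expected = sum(w(i, j) * pa[i] * pb[j] for i in range(k) for j in range(k))
    # Raises ZeroDivisionError if both raters only ever use one category.
    return (observed - expected) / (1.0 - expected)


def jackknife_kappa_difference(mammo_a, mammo_b, us_a, us_b):
    """Leave-one-out estimate of kappa(US) - kappa(mammography) and its standard error."""
    n = len(mammo_a)

    def difference(skip=None):
        keep = [i for i in range(n) if i != skip]

        def pick(xs):
            return [xs[i] for i in keep]

        return (weighted_kappa(pick(us_a), pick(us_b))
                - weighted_kappa(pick(mammo_a), pick(mammo_b)))

    full = difference()
    # Pseudo-values: n times the full estimate minus (n - 1) times each
    # leave-one-out estimate.
    pseudo = [n * full - (n - 1) * difference(i) for i in range(n)]
    estimate = sum(pseudo) / n
    variance = sum((p - estimate) ** 2 for p in pseudo) / (n * (n - 1))
    return {"difference": full,
            "jackknife_estimate": estimate,
            "se": math.sqrt(variance)}
```

Because the same case is dropped from both modalities at each step, the correlation between the two kappa values that arises from scoring the same breasts is carried into the standard error.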


    Results
 
Between March 2000 and August 2000, 58 women (113 breasts, 3 patients post unilateral mastectomy) were included in this study. The mean age of the women was 52.6 years (range 18.4–86.9 years). The indication for mammography was a palpable lesion in 21 patients (22 breasts), other local breast complaints such as pain in 10 women (10 breasts), a BI-RADS-score of 3 or more on the mammograms of the national breast screening program in 16 patients (16 breasts), follow-up of malignant breast disease in 7 patients (12 breasts) and a high risk of breast cancer because of a positive family history in 4 (8 breasts). The remaining 45 breasts were normal asymptomatic contralateral breasts.

68 breasts contained 1 or more lesions (60%), out of which 11 were malignant. The mean radiological size of the lesions was 14 mm. The mean histological size of the lesions was 15 mm. Normal or benign lesions were confirmed by histology in 13 cases, by repeated mammography in 27 cases and by a follow-up of 12 months in 62 cases.

The interexamination agreement (kappa value) between the three sonographers in diagnosing the probability of malignancy for the US examination of the breasts ranged from 0.72 to 0.75 (all three with a standard deviation (sd) of 0.04). The interobserver agreement for reading the mammography images of the same patients by the same observers ranged from 0.63 to 0.65 (sd=0.06) (Table 1). The total consistency (US with physical and mammographic information) increased compared with physical and mammographic information only. These differences were statistically significant for sonographers 2 and 3 (p=0.008).


Table 1. Kappa scores and 95% confidence interval of the 5-point scale of probability of malignancy by mammography and ultrasound for three observers

 
Excluding the normal breasts did not affect the mammographic interexamination agreement (mean kappa value 0.545, sd=0.077). The sonographic interexamination agreement decreased slightly (mean kappa value=0.651, sd=0.065).

For the subgroups based on clinical information the mean kappas of the three sonographers were calculated (Table 2). There was a significant difference in kappa value for the subgroups based on the density of the breast on mammography.


Table 2. Mean linearly weighted kappa scores of the 5-point scale of probability of malignancy after ultrasound and the difference within the subgroups

 
As an index of diagnostic performance, ROC curves were constructed for each sonographer (Figure 1). The areas under the ROC curves for the three sonographers did not differ significantly.



Figure 1. Receiver operating characteristic curves of the diagnostic accuracy of breast ultrasound by three sonographers.

 

    Discussion
 
US examination of the whole breast with prior knowledge of physical and mammographic findings shows substantial interexamination agreement (kappa value 0.72–0.75) and adds consistency to the interpretation of the combined physical and mammographic examination (kappa value 0.63–0.65).

Studies examining the total interexamination variability of whole breast US examination, that is, the US scanning procedure combined with the interpretation of the images, have to our knowledge not been reported in the literature. For consistency in reading breast US images alone, the kappa values range from 0.32 to 0.62 [6, 8, 10, 11]. Table 3 compares some features of these studies. Those studies determined interobserver variation retrospectively, from images with known lesions and a high cancer prevalence. These images had been obtained previously by a sonographer who was not the interpreting observer. The observers retrospectively determined the characteristics of the lesions and the probability of malignancy. In our study, we not only determined the lesion characteristics, but also examined whether or not a lesion in a breast was detected. Two possible sources of disagreement, obtaining images and interpreting images, were therefore included in our study. In spite of the enhanced chance of disagreement introduced by the extra source, we obtained substantial agreement. An explanation might be the number of normal breasts (40%) in our study compared with other studies (0%). Excluding the normal breasts did not affect the mammographic interexamination agreement and decreased the sonographic interexamination agreement slightly.


Table 3. Literature on interobserver variability of breast ultrasound

 
US examination of the breasts was performed as an adjunct to physical and mammographic findings. This is current policy and our daily practice and for this reason we decided to test the interexamination consistency in this context. At the time of the US examination the mammogram and its results assessed by the attending radiologist were available to all three sonographers, creating the same conditions for the three sonographers. The availability of both the physical and mammographic information cannot be the cause of the higher agreement, because this information was also available in Zonderland's and Skaane's study. This justifies the Dutch daily practice in which the reviewing radiologist is also the assessing radiologist.

Despite the differences in experience, the kappa values did not differ significantly between the three sonographers. The resident sonographer seems to have reached the required plateau of breast US experience after 500 US examinations.

Except for the 24 breasts for which we had a pathological diagnosis and the 27 cases that underwent radiological follow-up examination, the diagnosis after a clinical follow-up period of 12 months was considered as the reference test (n=62). No false-negatives were found in this group, but the observation period of 12 months is short. However, our main goal was to study the interexamination agreement and not the diagnostic accuracy.

Table 2, which gives the kappa values for the subgroups, shows no significant difference within the subgroups except for the dense/non-dense breast tissue group. Non-dense breasts showed high interobserver agreement on the mammography findings and therefore an increased interexamination agreement after US. Dense breasts often yield inconclusive mammograms. US might be expected to increase the sensitivity and diagnostic accuracy of the radiological imaging [2, 3]. Our results showed a substantially lower agreement after US in the group of patients with dense breasts on mammography.


    Conclusion
 
The interexamination agreement for whole breast US is substantial when this examination is performed as an adjunct to physical examination and mammography.


    Acknowledgments
 
The authors want to thank Dr D Koster for his effort in the performance and interpretation of the imaging examinations, the radiology technologists for their assistance in the study procedures, Mrs P Habets for data-entry and Mrs M Casparie from PALGA for supplying us with pathology data.


    Footnotes
 
Funded by the Dutch Health Care Insurance Executive Board.

Received for publication August 29, 2002. Revision received January 2, 2003. Accepted for publication February 13, 2003.


    References
 

  1. Stavros AT, Thickman D, Rapp CL, Dennis MA, Parker SH, Sisney GA. Solid breast nodules: use of sonography to distinguish between benign and malignant lesions. Radiology 1995;196:123–34.
  2. Kolb TM, Lichy J, Newhouse JH. Occult cancer in women with dense breasts: detection with screening US—diagnostic yield and tumor characteristics. Radiology 1998;207:191–9.
  3. Buchberger W, DeKoekkoek-Doll P, Springer P, Obrist P, Dunser M. Incidental findings on sonography of the breast: clinical significance and diagnostic workup. AJR Am J Roentgenol 1999;173:921–7.
  4. Berg WA, Campassi C, Langenberg P, Sexton MJ. Breast Imaging Reporting and Data System: inter- and intraobserver variability in feature analysis and final assessment. AJR Am J Roentgenol 2000;174:1769–77.
  5. Gordon PB, Goldenberg SL, Chan NH. Solid breast lesions: diagnosis with US-guided fine-needle aspiration biopsy. Radiology 1993;189:573–80.
  6. Baker JA, Kornguth PJ, Soo MS, Walsh R, Mengoni P. Sonography of solid breast lesions: observer variability of lesion description and assessment. AJR Am J Roentgenol 1999;172:1621–5.
  7. Rahbar G, Sie AC, Hansen GC, Prince JS, Melany ML, Reynolds HE, et al. Benign versus malignant solid breast masses: US differentiation. Radiology 1999;213:889–94.
  8. Skaane P, Olsen JB, Sager EM, Abdelnoor M, Berger A, Kullmann G, et al. Variability in the interpretation of ultrasonography in patients with palpable noncalcified breast tumors. Acta Radiol 1999;40:169–75.
  9. Shimamoto K, Sawaki A, Ikede M, Satake H, Naganawa S, Tadokoro M, et al. Interobserver agreement in sonographic diagnosis of breast tumors. Eur J Ultrasound 1998;8:25–31.
  10. Skaane P, Engedal K, Skjennald A. Interobserver variation in the interpretation of breast imaging. Comparison of mammography, ultrasonography, and both combined in the interpretation of palpable noncalcified breast masses. Acta Radiol 1997;38:497–502.
  11. Zonderland HM, Hermans J, Holscher HC, Schipper J, Obermann WR. Additional value of US to mammography: profit and loss. Eur Radiol 1994;4:511–6.
  12. American College of Radiology. Illustrated Breast Imaging Reporting and Data System (Illustrated BI-RADS). American College of Radiology, 1998;179–81.
  13. Mendelson EB, Berg WA, Merritt CR. Toward a standardized breast ultrasound lexicon, BI-RADS: ultrasound. Semin Roentgenol 2001;36:217–25.
  14. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
  15. Svanholm H, Starklint H, Gundersen HJ, Fabricius J, Barlebo H, Olsen S. Reproducibility of histomorphologic diagnoses with special reference to the kappa statistic. APMIS 1989;97:689–98.
  16. Efron B, Tibshirani RJ. An introduction to the bootstrap. New York, NY: Chapman & Hall, 1993;436.
  17. Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983;148:839–43.


