British Journal of Radiology (2003) 76, 561-563
© 2003 British Institute of Radiology
doi: 10.1259/bjr/14999231

Short communication

Breast density: agreement of measures from film and digital image

M Jeffreys, MSc, PhD 1 R Warren, MD, FRCP, FRCR 2 G Davey Smith, FFPHM, DSc 3 and D Gunnell, FFPHM, PhD 3

1 Centre for Public Health Research, Massey University – Wellington Campus Private Box 756, Wellington, New Zealand, 2 Department of Radiology, Level 5, Box 219, Addenbrooke's Hospital, Hills Road, Cambridge CB2 2QQ and 3 Department of Social Medicine, University of Bristol, Canynge Hall, Whiteladies Road, Bristol BS8 2PR, UK

Correspondence: M Okasha


    Abstract
Mammographic density, in particular density from digital images, is increasingly used in breast cancer research. We investigated the concordance between density assigned by the same radiologist to a mammogram film and a digital image of the same mammogram. Two density measures were investigated: Wolfe parenchymal patterns and a six category classification (SCC) system of density. The study included 528 mammograms from 78 women. Crude and weighted Kappa statistics were used to estimate agreement between the density assigned from the film and the image. Kappa for Wolfe measures was 71%, p<0.001 and for SCC measures was 54%, p<0.001. Weighted Kappa values were 79%, p<0.001 and 77%, p<0.001, respectively. There was some evidence to suggest that the digitized image may be assigned a higher Wolfe category, but not a higher SCC category, than the original film, although the magnitude of this difference was small. Neither age nor mammogram view (craniocaudal or mediolateral oblique) was related to the likelihood of agreement of the two density measurements. This evidence justifies the use of digital images in the visual assessment of breast density in research studies.


    Introduction
Mammographic density is increasingly used in breast cancer research and is recognised as an important predictor of risk. Women with breast density measures in the top quartile of the distribution have risks of breast cancer four to six times those of women in the lowest category (<25% dense) [1]. Traditionally, breast density has been assessed by radiologists from the mammogram. Two scoring schemes have been frequently used, namely the Wolfe system [2] and categorical assessments of the percentage of the area of the breast that appears dense, typically in six categories [3]. Recent advances in technology have emphasised the importance of digital images, with their ease of transmission between research centres, enhanced durability and possibilities for standardized assessment.

We used results from an ongoing study into the use of various measures of breast density to investigate the concordance between density categories assigned by the same radiologist to a mammogram film and a digital image of the same mammogram.


    Methods
The women included in the study are part of the Glasgow Alumni Cohort, which has been described in detail elsewhere [4]. All surviving cohort members were contacted by postal questionnaire in 2001. Those women who replied and still lived in Scotland were then asked to give permission for access to mammograms taken under the Scottish Breast Screening Programme (SBSP). The images were obtained from the relevant breast screening centre and were digitized on site by a single radiographer with a Canon (Amsterdam, The Netherlands) 300 digitizer scanner at a resolution of 100 microns with 8-bit precision. The images were transferred from the digitizer to a laptop computer using file transfer protocol. Both craniocaudal (CC) and mediolateral oblique (MLO) images were included in the analysis. Density measures were made by one author experienced in density assessment (RW) using both the Wolfe scale and a categorical scale (six category classification, SCC). All assessments were made twice, once from the original film and once from the scanned image. Scanned images were displayed at 300 micron resolution on a flat-panel display system. This resolution is adequate for the gross assessment of film density and means that, on screen, the images are about the same size as a mammogram film. All the images were displayed to appear as if viewed on a light-box, using software written by Mirada Solutions (Oxford, UK; see www.mirada-solutions.com). The program uses the screen characteristics and the known film densities to work out the pixel brightness required for the image to appear identical to a film on a light-box [5]. No adjustment needed to be made, or was made, during the reading period. No image post-processing was applied. The assessments from the film and digitized image were made approximately 1 month apart, at different locations. The radiologist was blind to the previously assigned value.

Statistical methods
Crude Kappa values were calculated to assess the agreement between the density measures obtained from the film and scanned images. Kappa is a measure of the level of agreement in excess of that which would be observed by chance. A Kappa value of 0% indicates that the agreement between two values is no greater than would be expected by chance. Kappa values of 60% to 80% indicate good agreement; those of over 80% indicate very good agreement [6]. Given the ordered nature of the data, weighted Kappa statistics were also calculated. Weighting gives more credit to pairs of measures assigned to adjacent categories than to pairs assigned to non-adjacent categories. The weights used decreased by 0.2 for each category removed from concordance. Thus, for adjacent categories a weight of 0.8 was used; if categories were two apart, a weight of 0.6 was used, and so on. Random effects logistic regression models were used to determine whether the woman's age or the mammogram view (MLO or CC) was related to disparities between the density measures made from film and digital image. This method takes into account the non-independent nature of the data (multiple images per woman). For these models, the outcome variable was perfect vs imperfect concordance of measures.
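
As an illustration of the crude and weighted Kappa calculations described above, the following is a minimal sketch rather than the analysis code used in the study. The function name `kappa`, the `step` parameter and the table of counts are hypothetical and introduced only for this example; the counts are invented and are not the data in Table 1. The weighting follows the scheme stated in the text (1.0 for concordance, decreasing by 0.2 per category of separation), with the added assumption that weights are floored at zero.

```python
from typing import Optional, Sequence


def kappa(table: Sequence[Sequence[int]], step: Optional[float] = None) -> float:
    """Kappa for a square table of counts (rows: film category, columns: image category).

    step=None gives crude Kappa (credit only for exact agreement).
    step=0.2 gives the weighted form: weight 1.0 on the diagonal, 0.8 for
    adjacent categories, 0.6 for categories two apart, and so on (floored at 0).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        if i == j:
            return 1.0
        if step is None:
            return 0.0
        return max(0.0, 1.0 - step * abs(i - j))

    cells = [(i, j) for i in range(k) for j in range(k)]
    # Observed (weighted) agreement and the agreement expected by chance
    # from the row and column totals alone.
    p_obs = sum(weight(i, j) * table[i][j] for i, j in cells) / n
    p_exp = sum(weight(i, j) * row_tot[i] * col_tot[j] for i, j in cells) / n**2
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical counts for a four-category scale; NOT the study's Table 1.
example = [
    [40, 5, 0, 0],
    [4, 30, 6, 0],
    [0, 5, 25, 4],
    [0, 0, 3, 20],
]
print(f"crude Kappa:    {kappa(example):.2f}")
print(f"weighted Kappa: {kappa(example, step=0.2):.2f}")
```

In this invented table every disagreement falls in an adjacent category, so the weighted value exceeds the crude value, which is the same direction of difference as reported in the Results.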


    Results
The current analysis is based on 528 mammograms taken from 78 women. Each of these mammograms was taken under the South East Scotland Breast Screening Programme at Edinburgh.

Of the 528 mammograms, Wolfe measures were available for 486 (92%) and SCC measures were available for 490 (93%) mammograms. Density measures were not assigned to the remainder of films or images because they were too pale or because the films were required by the clinic at the time of assessment or scanning. The numbers of films in each density category are shown in Table 1.


Table 1. Classification of mammograms: raw data

 
Agreement between the two measures was moderately good. Of the 486 Wolfe measures, 388 (80%) were assigned the same category from both the film and the digital image and a further 90 (19%) were assigned adjacent categories. For SCC measures, 306 (62%) were assigned the same category and a further 173 (35%) were assigned adjacent categories.

The Kappa value for the Wolfe measures was 71%, p<0.001 and for the SCC measures was 54%, p<0.001. When the weighted Kappa method was applied, the corresponding values were 79%, p<0.001 and 77%, p<0.001. Kappa values were also calculated using just the earliest left MLO mammogram per woman, because of the non-independence of the data. The Kappa values from this analysis were marginally lower than the above figures. Crude Kappa values were 69% and 44% for Wolfe and SCC measures; weighted Kappa values were 77% and 73%. The uncertainty in these estimates (standard error) was greater, since this analysis was based on 78 women compared with 528 images.

The figures shown in Table 1 suggest some bias in the assessment of density. For Wolfe but not SCC measures, the density assigned from the digital image tended to be higher than that assigned from the film. For Wolfe measures, 66% (=65/98) of the discordant comparisons indicated that the digital image was assigned a higher density than the film (p=0.002). For SCC, 55% (=101/184) of the discordant comparisons indicated that the digital image was assigned a higher density than the film (p=0.21).
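
The text does not say which test produced these p values. One plausible reading, offered here only as an assumption, is an exact two-sided binomial (sign) test of whether discordant pairs split evenly between "image higher" and "film higher". The sketch below applies that test to the discordant counts quoted above; the function name `sign_test_p` is hypothetical.

```python
from math import comb


def sign_test_p(k: int, n: int) -> float:
    """Exact two-sided binomial p value for k of n discordant pairs
    falling on one side, under the null hypothesis of an even split."""
    extreme = max(k, n - k)
    # Probability of a split at least this uneven in one direction.
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2**n
    # Double for the two-sided test; cap at 1.
    return min(1.0, 2.0 * tail)


# Discordant counts quoted in the text (assumed test, see above).
print(f"Wolfe: 65 of 98 discordant, p = {sign_test_p(65, 98):.3f}")
print(f"SCC: 101 of 184 discordant, p = {sign_test_p(101, 184):.2f}")
```

Only the discordant comparisons enter this test; pairs assigned the same category from film and image carry no information about the direction of any bias.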

Results from the random effects logistic regression indicated that neither age nor mammogram view (CC or MLO) was related to the likelihood of agreement of the two density measurements. Furthermore, no consistent patterns of differing agreement across levels of density of the original mammogram were evident, i.e. the degree of concordance did not depend on the density assigned.


    Discussion
The data presented in this report indicate that density assessments from the original film and from a digitized image are reasonably similar. This was evident for density categorized according to both the Wolfe and SCC scales. There was some evidence to suggest that the digitized image may be assigned a higher Wolfe category than the original film, although the magnitude of this difference is small. These observations are important, given the increasing tendency for research into breast density to be undertaken from digital images. It is important to note that these observations relate only to comparisons between films and images created from the digitization of mammograms. The results cannot be extended to include images captured using digital acquisition systems.

To our knowledge, no assessment of agreement of density measurement from film and digital image has previously been made. The degree of agreement which we have shown between film and image assessments is similar to the interindividual and intraindividual comparisons made in other studies. Unfortunately, these studies used different methods to describe agreement. In interpreting them, it must be borne in mind that Kappa values tend to be lower than those of percentage agreement, since Kappa takes into account the possibility that some measures will agree by chance. An interindividual study found 94% agreement when mammograms were assessed blindly using the Wolfe scale by two independent radiologists [7]. The agreement in our study for Wolfe measures was 80%.

Measures of correlation are commonly used in studies of reproducibility, although this is incorrect [6]. One study reported the intraindividual reproducibility of the Wolfe scale, with one radiologist assessing the same film on two different occasions. This study found correlations of 0.88 between the two assessments [8]. For comparison, we calculated correlation coefficients for our data. These were 0.86 and 0.91 for Wolfe measures and SCC measures, respectively.

The use of Kappa values is the correct way of reporting agreement, taking into account the possibility of chance. A study in which 100 mammograms were assessed by 9 radiologists found agreement varying from 72% to 88%. Corresponding weighted Kappa values ranged from 0.40 to 0.80 [9]. Using the same weights, Toniolo and colleagues reported Kappa values of 0.51 for the agreement between two radiologists assessing the repeatability of Wolfe measures [10]. These results are very similar to those observed in our data.

It is unsurprising that the crude Kappa values that we observed for the Wolfe assessments were higher than those achieved for the categorical values, since the value of Kappa is dependent on the number of categories in the scale. This is because with a larger number of categories to choose from, the likelihood of being assigned to any one category is smaller. The Wolfe scale uses four categories (N1, P1, P2, DY) whereas in the categorical assessment we used six (0%, 1–10%, 11–25%, 26–50%, 51–75%, >75%).

In summary, we have shown that the assessment of breast density using Wolfe patterns or categorical measures is similar when the measures are made from the original film and from the digital image. This evidence justifies the use of digitized mammograms in the visual assessment of breast density in research studies.


    Footnotes
 
This work was funded by the Breast Cancer Campaign and the Breast Cancer Research Trust. This work was completed while Dr Jeffreys was employed at the University of Bristol.

Received for publication October 18, 2002. Revision received March 19, 2003. Accepted for publication May 15, 2003.


    References

  1. Boyd N, Lockwood G, Byng J, Tritchler D, Yaffe M. Mammographic densities and breast cancer risk. Cancer Epidemiol Biomarkers Prev 1998;7:1133–44.
  2. Wolfe JN, Saftlas AF, Salane M. Mammographic parenchymal patterns and quantitative evaluation of mammographic densities: a case-control study. AJR Am J Roentgenol 1987;148:1087–92.
  3. Byng J, Boyd N, Fishell E, Jong R, Yaffe M. The quantitative analysis of mammographic densities. Phys Med Biol 1994;39:1629–38.
  4. McCarron P, Davey Smith G, Okasha M, McEwen J. Life course exposure and later disease: a follow-up study based on medical examinations carried out in Glasgow University (1948–1968). Public Health 1999;113:265–71.
  5. Highnam R, Brady M. Mammography image analysis. Dordrecht, The Netherlands: Kluwer Academic Publishers, 1999.
  6. Silman A, Macfarlane G. Epidemiological studies: a practical guide. Cambridge: Cambridge University Press, 2002.
  7. Sala E, Warren R, McCann J, Duffy S, Day N, Luben R. Mammographic parenchymal patterns and mode of detection: implications for the breast screening programme. J Med Screen 1998;5:207–12.
  8. Atkinson C, Warren R, Bingham SA, Day NE. Mammographic patterns as a predictive biomarker of breast cancer risk: effect of tamoxifen. Cancer Epidemiol Biomarkers Prev 1999;8:863–6.
  9. Boyd NF, Wolfson C, Moskowitz M, et al. Observer variation in the classification of mammographic parenchymal patterns. J Chronic Dis 1986;39:465–72.
  10. Toniolo P, Bleich AR, Beinart C, Koenig KL. Reproducibility of Wolfe's classification of mammographic parenchymal patterns. Prev Med 1992;21:1–7.


