Short communication
1 Centre for Public Health Research, Massey University Wellington Campus Private Box 756, Wellington, New Zealand, 2 Department of Radiology, Level 5, Box 219, Addenbrooke's Hospital, Hills Road, Cambridge CB2 2QQ and 3 Department of Social Medicine, University of Bristol, Canynge Hall, Whiteladies Road, Bristol BS8 2PR, UK
Correspondence: M Okasha
| Abstract |
|---|
| Introduction |
|---|
We used results from an ongoing study into the use of various measures of breast density to investigate the concordance between density categories assigned by the same radiologist to a mammogram film and a digital image of the same mammogram.
| Methods |
|---|
Statistical methods
Crude Kappa values were calculated to assess the agreement between the density measures obtained from the film and scanned images. Kappa is a measure of the level of agreement in excess of that which would be observed by chance. A Kappa value of 0% indicates that the agreement between two values is no greater than would be expected by chance. Kappa values of 60% to 80% indicate good agreement; those of over 80% indicate very good agreement [6]. Given the ordered nature of the data, weighted Kappa statistics were also calculated. This allows more weight to be placed on pairs of measures assigned to adjacent categories than on pairs assigned to non-adjacent categories. The weights used decreased by 0.2 for each category removed from concordance. Thus, for adjacent categories a weight of 0.8 was used; if the categories were two apart, a weight of 0.6 was used, and so on. Random effects logistic regression models were used to determine whether the woman's age or the mammogram view (MLO or CC) was related to disparities between the density measures made from film and digital image. This method takes into account the non-independent nature of the data (multiple images per woman). For these models, the outcome variable was perfect vs imperfect concordance of measures.
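As an illustration of the crude and weighted Kappa statistics described above, the following is a minimal sketch rather than the analysis code used in the study. The function names and the 4 x 4 table of counts are invented for the example; only the weighting scheme (1.0 on the diagonal, falling by 0.2 per category of separation) is taken from the text.

```python
from typing import List, Optional, Sequence


def agreement_weights(k: int, step: Optional[float] = None) -> List[List[float]]:
    """Weight matrix for k ordered categories.

    step=None gives crude Kappa weights (1 on the diagonal, 0 elsewhere).
    Otherwise the weight falls by `step` per category of separation,
    floored at 0, e.g. step=0.2 gives 1.0, 0.8, 0.6, ...
    """
    if step is None:
        return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    return [[max(0.0, 1.0 - step * abs(i - j)) for j in range(k)] for i in range(k)]


def kappa(table: Sequence[Sequence[float]], step: Optional[float] = None) -> float:
    """Cohen's Kappa (crude or weighted) from a square table of counts.

    Rows are the categories from one assessment (e.g. film),
    columns the categories from the other (e.g. digital image).
    """
    k = len(table)
    total = float(sum(sum(row) for row in table))
    row_p = [sum(row) / total for row in table]
    col_p = [sum(table[i][j] for i in range(k)) / total for j in range(k)]
    w = agreement_weights(k, step)
    # Weighted observed agreement and agreement expected by chance.
    observed = sum(w[i][j] * table[i][j] / total for i in range(k) for j in range(k))
    expected = sum(w[i][j] * row_p[i] * col_p[j] for i in range(k) for j in range(k))
    return (observed - expected) / (1.0 - expected)


# Invented counts for illustration only; these are NOT the study data.
example = [
    [40, 6, 1, 0],
    [5, 30, 7, 1],
    [1, 6, 25, 5],
    [0, 1, 4, 20],
]
crude = kappa(example)
weighted = kappa(example, step=0.2)
print(f"crude kappa = {crude:.2f}, weighted kappa = {weighted:.2f}")
```

Because the weighted version gives partial credit to near misses, it is the more natural summary when most disagreements fall in adjacent categories.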
| Results |
|---|
Of the 528 mammograms, Wolfe measures were available for 486 (92%) and SCC measures were available for 490 (93%) mammograms. Density measures were not assigned to the remainder of films or images because they were too pale or because the films were required by the clinic at the time of assessment or scanning. The numbers of films in each density category are shown in Table 1.
The Kappa value for the Wolfe measures was 71%, p<0.001 and for the SCC measures was 54%, p<0.001. When the weighted Kappa method was applied, the corresponding values were 79%, p<0.001 and 77%, p<0.001. Kappa values were also calculated using just the earliest left MLO mammogram per woman, because of the non-independence of the data. The Kappa values from this analysis were marginally lower than the above figures. Crude Kappa values were 69% and 44% for Wolfe and SCC measures; weighted Kappa values were 77% and 73%. The uncertainty in these estimates (standard error) was greater, since this analysis was based on 78 women compared with 528 images.
Using the figures shown in Table 1, it appears that there is some bias in the assessment of density. For Wolfe but not SCC measures, the density assigned tended to be higher for mammograms assessed from the digital image compared with those assigned to the films. For Wolfe measures, 66% (=65/98) of the discordant comparisons indicated that the digital image was more dense than the assessment made from film (p=0.002). For SCC, 55% (=101/184) of the discordant comparisons indicated that the digital image was more dense than the assessment made from film (p=0.21).
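The text does not name the test behind these p-values. One common choice for discordant pairs is an exact two-sided sign (binomial) test against a 50:50 split, and the sketch below assumes that test; the function name is hypothetical. It uses only the counts quoted above (65 of 98 and 101 of 184).

```python
from math import comb


def sign_test_p(successes: int, n: int) -> float:
    """Two-sided exact binomial p-value under H0: probability = 0.5.

    With p = 0.5 the distribution is symmetric, so the two-sided value
    is twice the tail at or beyond the larger of the two counts,
    capped at 1.
    """
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * upper_tail)


# Discordant comparisons in which the digital image was rated denser.
p_wolfe = sign_test_p(65, 98)   # Wolfe measures
p_scc = sign_test_p(101, 184)   # SCC measures
print(f"Wolfe: p = {p_wolfe:.3f}; SCC: p = {p_scc:.2f}")
```

A small p-value here indicates that the discordant ratings lean systematically in one direction, which is the sense of "bias" used in the paragraph above.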
Results from the random effects logistic regression indicated that neither age nor mammogram view (CC or MLO) was related to the likelihood of agreement of the two density measurements. Furthermore, no consistent patterns of differing agreement across levels of density of the original mammogram were evident, i.e. the degree of concordance did not depend on the density assigned.
| Discussion |
|---|
To our knowledge, no assessment of agreement of density measurement from film and digital image has previously been made. The degree of agreement which we have shown between film and image assessments is similar to the interindividual and intraindividual comparisons made in other studies. Unfortunately, studies used different methods to describe agreement. In interpreting this, it must be borne in mind that Kappa values tend to be lower than those of percentage agreement, since Kappa takes into account the possibility that some measures will agree by chance. An interindividual study found 94% agreement when mammograms were assessed blindly using the Wolfe scale by two independent radiologists [7]. The agreement in our study for Wolfe measures was 80%.
Measures of correlation are commonly used in studies of reproducibility, although this is incorrect [6]. One study reported the intraindividual reproducibility of the Wolfe scale, with one radiologist assessing the same film on two different occasions. This study found correlations of 0.88 between the two assessments [8]. For comparison, we calculated correlation coefficients for our data. These were 0.86 and 0.91 for Wolfe measures and SCC measures, respectively.
The use of Kappa values is the correct way of reporting agreement, taking into account the possibility of chance. A study in which 100 mammograms were assessed by 9 radiologists found agreement varying from 72% to 88%. Corresponding weighted Kappa values ranged from 0.40 to 0.80 [9]. Using the same weights, Toniolo and colleagues reported Kappa values of 0.51 for the agreement between two radiologists assessing the repeatability of Wolfe measures [10]. These results are very similar to those observed in our data.
It is unsurprising that the crude Kappa values that we observed for the Wolfe assessments were higher than those achieved for the categorical values, since the value of Kappa is dependent on the number of categories in the scale. This is because with a larger number of categories to choose from, the likelihood of being assigned to any one category is smaller. The Wolfe scale uses four categories (N1, P1, P2, DY) whereas in the categorical assessment we used six (0%, 1–10%, 11–25%, 25–50%, 51–75%, >75%).
In summary, we have shown that the assessment of breast density using Wolfe patterns or categorical measures is similar when the measures are made from the original film and from the digital image. This evidence justifies the use of digitized mammograms in the visual assessment of breast density in research studies.
| Footnotes |
|---|
Received for publication October 18, 2002. Revision received March 19, 2003. Accepted for publication May 15, 2003.
| References |
|---|