Full Paper |
1 Department of Radiation Physics, Malmö University Hospital, SE-205 02 Malmö, Sweden, 2 GSF-National Research Centre for Environment and Health, D-857 64 Neuherberg, Germany, 3 Department of Radiation Physics, Sahlgrenska University Hospital, SE-413 45 Göteborg, Sweden, 4 Department of Radiation Physics, Faculty of Health Sciences, Linköping University, SE-581 85 Linköping, Sweden, 5 Joint Department of Physics, The Royal Marsden NHS Trust, Fulham Road, London SW3 6JJ, UK, 6 Department of Diagnostic Radiology, Malmö University Hospital, SE-205 02 Malmö, Sweden and 7 Department of Diagnostic Radiology, Sahlgrenska University Hospital, SE-413 45 Göteborg, Sweden
| Abstract |
|---|
[…] ΔOD) of relevant anatomical details was calculated using a Monte Carlo simulation model of the complete imaging system, including a 3D voxel phantom of a patient. Correlations between the calculated contrast and the radiologists' assessment by VGA were sought. The results of the radiologists' assessment show that the quality in selected regions of lumbar spine and chest images can be significantly improved by the use of films with a steeper H&D curve compared with the standard latitude film. Significant (p<0.05) correlations were found between the VGA results and the calculations of the contrast of transverse processes and trabecular details in the lumbar spine vertebrae, and with the contrast of blood vessels in the retrocardiac area of the chest.
| Introduction |
|---|
For the evaluation of the relevant physical image quality parameters of medical screen-film systems there are well-established methodologies, which are described in detailed standardization documents, e.g. ISO report 9236-1 for measurement of the characteristic curve [1]. The relevance of the corresponding image quality parameters for the evaluation of patient images, however, must be demonstrated in clinical studies. For such purposes, common experiments involve the detection by observers of artificial objects, such as disks or wires, which are intended to simulate typical pathology and which are hidden in the patient image [2, 3]. Typically, such studies use an approach based on receiver operating characteristic (ROC) analysis [4–7]. Such methods are well established: the mathematical background of ROC has been thoroughly investigated and the method is well documented in the literature. One of the main disadvantages is that, in a very strict sense, the validity of results obtained by a detection task is limited to the type of object that had to be detected. Furthermore, the images used for the ROC analysis must contain a known signal (pathological structure), and many images must be used for statistical reasons.
An alternative approach for the evaluation of the diagnostic quality of patient images is the application of a catalogue of normal anatomical details, which should be visualized in a patient's image. The Commission of the European Communities has put strong effort into developing such a catalogue of so-called image criteria for different radiographic examinations [8]. We have used the image criteria to evaluate the image quality of lumbar spine and chest images produced under various imaging conditions (such as different X-ray tube voltage and type of scatter reduction) [9–11]. However, the formulation of the image criteria sometimes turned out to be imprecise [9–11]. Consequently, the discrimination between different radiographic techniques was relatively weak in some cases. Therefore, in our work the use of image criteria has been supplemented by visual grading analysis (VGA) [12, 13]: images produced by different radiographic techniques were visually compared with a set of reference images for the structures mentioned in the image criteria. The advantages of these methods compared with ROC analysis include that ordinary images can be used, since it is the visibility of the normal anatomy that is evaluated, and that far fewer images are needed. These new methods have been tested against an ROC-related method (the free-response forced error experiment, FFE [14]), and a strong indication that a correlation exists between the new methods and the FFE method has been shown [13, 15, 16].
To investigate the influence of a variation of particular parameters, such as the characteristic curve, on the diagnostic quality, image sets have to be produced, which are different with respect to that parameter. To reduce the influence of patient-specific variations, the different image sets should be produced with the same group of patients. However, for reasons of radiation protection, the number of radiographs taken of the same patient must be strictly limited. Therefore, the current paper follows a different approach by digitizing existing radiographs, manipulating the digital images, and finally printing them on to photographic film.
An effective method of investigating the influence of a change of a particular image quality parameter, such as the image contrast, is to simulate the imaging process with a Monte Carlo model of the imaging system. Several different imaging techniques can be evaluated in terms of clinical image quality at a relatively low cost, since no radiologists are needed for the evaluation. Validation of the model is a prerequisite for using the model results. It is therefore necessary to verify that the model predictions agree with the assessments of a group of radiologists.
The current paper reports the results of a study of the influence of different characteristic curves on the diagnostic quality of radiographs of the lumbar spine and the chest. Existing sets of radiographs have been digitized, the contrast of the images altered and the images printed onto film. The clinical image quality of the resulting images has been evaluated by a group of expert radiologists by judgement of the fulfilment of the image criteria and by visual grading analysis. The different imaging situations have also been modelled in a Monte Carlo computer program and descriptors of physical image quality derived. The results from the model have been compared with the results of the clinical evaluations performed by the radiologists.
The aims of this study were:
| Material and methods |
|---|
The lumbar spine images were digitized by means of a CCD flatbed scanner (Vexcel "Ultrascan 5000"; Vexcel Imaging GmbH, Vienna, Austria), and the chest radiographs by means of a drum-scanner (Linotype-Hell "TANGO"; Heidelberger Druckmaschinen, Heidelberg, Germany). The reason for using two different scanners was practical. The flatbed scanner, which was already available when the study was initiated, was not sufficient for scanning the larger chest radiographs at a high enough spatial resolution. Therefore the drum-scanner was obtained and used for digitizing the chest radiographs. All images used for the manipulations described later have a nominal spatial resolution of 40 µm and a dynamic resolution of 16 bits. The scanners were calibrated with respect to optical density by film step wedges produced by X-ray sensitometry of the screen-film systems as used for the original radiographs.
For image display, a medical laser imager (AGFA "LR 5200"; Agfa Gevaert, Munich, Germany) was employed. The nominal spatial resolution of the laser imager is 40 µm, and its dynamic resolution is 8 bits. The maximum optical density as well as the shape of the calibration curve can be adjusted separately. The same calibration curve was used for printing all images. This curve was a good compromise between high-density range and high-density resolution, covering a density range up to about density 3.3 ODU with a resolution of more than 200 grey levels up to density 2.1. This calibration curve allowed coverage of the density range of the screen-film systems used for the original lumbar spine and chest radiographs with a density resolution better than that of the human eye, especially in the diagnostically important range [8].
Simulation of different characteristic curves
The measurements of the H&D curves (characteristic curves) of the screen-film systems used for the original radiographs were performed according to ISO 9236 [1]. The screen-film systems used for lumbar spine (Kodak TMAT L/RA film and Kodak Regular Plus screen, sensitivity class 400; Eastman Kodak, Rochester, NY) were measured at an X-ray beam quality of 70 kV, the systems used for chest at 120 kV (Kodak TMAT L/RA film and Kodak Lanex 160 and 320, respectively). Additionally, the Regular Plus system in combination with Kodak G-film was measured at 70 kV. For a moderate change of tube voltage, i.e. from 70 kV to 90 kV, or a change of screen (same family of screen but different speed class), no significant change of the shape of the H&D curves was measured for the lumbar spine systems. It was assumed that this result also holds for the systems used for chest radiography.
Based on these measurements, sets of H&D curves were simulated, which not only cover the range of characteristics of commercially available films but go beyond that range: the film used for the original radiographs, Kodak T-Mat L, is a typical latitude film (L-film) and common especially in chest radiography. Four systems with steeper film characteristics than L were simulated: "G", which is common for skeletal radiography; "M", which is between L and G; "UG", which is similar to the characteristics of a mammography film; and "UGP", which is even steeper than a mammography film. Additionally, three systems with flatter film characteristics than L were simulated, labelled "IL", "IL2" and "A". The H&D curves for the lumbar spine and the chest systems are shown in Figure 1 and Figure 2, respectively. The normalization of the simulated H&D curves has been chosen so that the curves all pass through the same point for either the lumbar spine or the chest film types, viz., the point where the two measured H&D curves (for the L and G films) intersect.
The application of different H&D curves affects not only image contrast but also the density level of an image: only image regions with an optical density close to the crossing point of the H&D curves (for lumbar spine, at an optical density of about 0.8) will keep their optical density. Brighter or darker regions will be shifted to different density levels compared with the original image. Since the average optical density of the lumbar spine radiographs was about 1.25, the manipulated images would suffer strong density shifts, especially if high contrast film characteristics are applied. In practice, the automatic exposure control would prevent films being too dark. Therefore, it was necessary to perform a density correction along the H&D curve, so that the manipulated images have an average density close to that of the original radiograph. This density correction corresponds to a certain shift with respect to relative dose to the detector (and to the patient), which is described in Table 2. Such a change in dose and film would also have influenced the noise level of the images if it had been done with a real film. In this study, however, only changes in contrast are studied. No changes in noise level have been simulated.
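The remapping and density correction described above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the processing actually used in the study: each H&D curve is approximated by a straight line through the common crossing point (real curves have a toe and a shoulder), and the function names, the pixel values and the gradient of 3.0 for the steeper film are hypothetical. Only the crossing density of about 0.8, the mean density of about 1.25 and the L-film average gradient of 2.18 are taken from the text.

```python
from statistics import mean

OD_CROSS = 0.8  # crossing point of the H&D curves quoted for lumbar spine


def od_to_log_exposure(od, gradient):
    """Invert a straight-line H&D curve: optical density -> log10 relative exposure."""
    return (od - OD_CROSS) / gradient


def log_exposure_to_od(log_e, gradient):
    """Straight-line H&D curve through the crossing point (assumption)."""
    return OD_CROSS + gradient * log_e


def remap_image(od_pixels, g_orig, g_new):
    """Re-render pixel densities with a new gradient, keeping the mean density.

    Returns the new densities, the log10 exposure shift used as density
    correction, and the corresponding relative dose (10 ** shift).
    """
    log_e = [od_to_log_exposure(od, g_orig) for od in od_pixels]
    target_mean = mean(od_pixels)
    # For a linear curve the shift that restores the mean has a closed form.
    shift = (target_mean - OD_CROSS) / g_new - mean(log_e)
    new_od = [log_exposure_to_od(le + shift, g_new) for le in log_e]
    return new_od, shift, 10 ** shift


# Hypothetical pixel densities with a mean of 1.25
pixels = [0.6, 1.0, 1.4, 2.0]
new_od, shift, relative_dose = remap_image(pixels, g_orig=2.18, g_new=3.0)
```

In this toy case the steeper curve needs a negative exposure shift to keep the mean density, i.e. a relative dose below 1 (about 0.88 here), while density differences between pixels grow by the ratio of the two gradients. The numbers illustrate the mechanism only and are not results of the study.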
The final film images
Based on the group of 32 lumbar spine radiographs, 224 film images were produced, showing the effect of seven different H&D curves (IL2, IL, L, M, G, UG and UGP) on the same set of patient images. For the chest images, it was decided to concentrate on flat film characteristics (A, IL2 and IL) and only one steeper film characteristic, G. Based on 60 original chest radiographs, 300 images with five different characteristic curves were produced.
Image evaluation
The European Quality Criteria define diagnostic requirements for normal, basic radiographs specifying anatomical image criteria and important image details [8]. They indicate criteria for the radiation dose to the patient, and they give examples of good radiographic techniques which fulfil both diagnostic and dose requirements. Based on this catalogue of image criteria, a set of anatomical structures for evaluating the images in the current study was selected (Table 3). Experiences from an earlier study were taken into account [9–11].
Discussions with the radiologists prior to this trial indicated that the observers tended to view different parts of the images, resulting in relatively large interobserver variations. Therefore, all images were individually masked, showing only the areas and details to be observed. The masking forced the observers to view exactly the same areas of each image. In the lumbar spine images, a region around L3 was observed (Figure 3a). For the chest images, there were six areas to be observed (Figure 3b), and the anatomical structures demonstrated in each of these areas are given in Table 4.
Visual grading analysis
Visual grading analysis (VGA) is a method for evaluation of image quality by visual comparison of one image, or part of an image, with a reference image [12, 13]. In this study we used the structures mentioned in a revised version of the European Quality Criteria [8] for the VGA (Table 3). The original images (L-film characteristics) were used as reference images, and the processed images were always compared with an image of the same patient. The visibility of a structure was graded on a five-level scale: clearly inferior to (-2), slightly inferior to (-1), equal to (0), slightly better than (+1) and clearly better than (+2) the structure in the reference image. A visual grading analysis score (VGAS) was determined for each radiographic technique. The VGAS is the total grading given by all observers for all criteria and all images corresponding to the same H&D curve, divided by the total number of observations:
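$$\mathrm{VGAS} = \frac{\sum_{i=1}^{I}\sum_{s=1}^{S}\sum_{o=1}^{O} G_{i,s,o}}{I \times S \times O}$$
$G_{i,s,o}$ = Grading (-2, -1, 0, +1 or +2) for image $i$, structure $s$ and observer $o$.
$I$ = Number of images
$S$ = Number of structures (Table 3)
$O$ = Number of observers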
VGAS for an individual structure may be obtained by omitting the sum over S and putting S=1 in the denominator.
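The definition above translates directly into code. The following is a minimal sketch, assuming the gradings are stored as a nested list indexed as `gradings[i][s][o]`; the data layout, the function names and the example values are illustrative assumptions, not part of the study.

```python
VALID_GRADES = {-2, -1, 0, 1, 2}


def vgas(gradings):
    """Mean grading over all images, structures and observers.

    gradings[i][s][o] is the grade for image i, structure s, observer o.
    For a complete (rectangular) data set the count equals I * S * O.
    """
    total = 0
    count = 0
    for per_image in gradings:
        for per_structure in per_image:
            for grade in per_structure:
                if grade not in VALID_GRADES:
                    raise ValueError(f"invalid grade: {grade!r}")
                total += grade
                count += 1
    if count == 0:
        raise ValueError("no gradings supplied")
    return total / count


def vgas_for_structure(gradings, s):
    """VGAS for a single structure: drop the sum over S, i.e. S = 1."""
    return vgas([[per_image[s]] for per_image in gradings])


# Hypothetical data: 2 images, 2 structures, 2 observers
example = [
    [[1, 2], [0, -1]],
    [[1, 1], [0, 0]],
]
print(vgas(example))                   # 0.5
print(vgas_for_structure(example, 0))  # 1.25
```

A positive score means the technique was, on average, graded better than the reference, and a negative score means it was graded worse.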
The chest study included two digital copies of the original images; one served as the reference image and the other was included in the study. The purpose of this was to test the VGA methodology for systematic errors, and to test the constancy of the printing process.
The simplicity and the strong discriminating power of the VGA method make it a good method for separating different image production techniques, e.g. different X-ray units, in the clinic, but the drawback is that the resulting score is relative to that of the reference image [12, 13]. If the reference image is not the same, it is difficult to use the VGAS to compare two different techniques. This is the case in this study, i.e. different patients were imaged with different radiographic techniques, and therefore the following complementary method was also used.
Image criteria score
A revised version of the image criteria of the European Quality Criteria [8] was used for a test of fulfilment of criteria. The revision of the image criteria was proposed in accordance with the results of our previous studies [9–11]. The image criteria used are listed in Table 3. For each criterion, the observers had to decide whether the criterion was fulfilled in an image or not (yes/no). A decision of "Yes, the criterion is fulfilled" resulted in a score of 1, and a decision of "No, the criterion is not fulfilled" resulted in a score of 0. The image criteria score (ICS) is defined analogously to VGAS, as the fraction of fulfilled criteria, summing up the scores of all observers for all criteria and all images corresponding to the same H&D curve:
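$$\mathrm{ICS} = \frac{\sum_{i=1}^{I}\sum_{c=1}^{C}\sum_{o=1}^{O} F_{i,c,o}}{I \times C \times O}$$
$F_{i,c,o}$ = Fulfilment of criterion $c$ for image $i$ and observer $o$. $F_{i,c,o}=1$ if criterion $c$ is fulfilled, otherwise $F_{i,c,o}=0$
$I$ = Number of images
$C$ = Number of criteria
$O$ = Number of observers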
ICS for an individual criterion may be obtained in the same manner as VGAS for an individual structure. The strength of this method is that the resulting scores are absolute, so that images of different techniques can be compared even though the imaged object, i.e. the patient, is not the same. A particularly interesting question in this study was whether the use of a film with a steeper characteristic curve can compensate for the poorer radiation contrast of the 90 kV technique for lumbar spine.
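Because ICS is an absolute score, it can be tabulated per technique from a flat list of observer decisions. The sketch below assumes a hypothetical record layout `(curve, image, criterion, observer, fulfilled)`; the layout, the function name and the example decisions are invented for illustration.

```python
from collections import defaultdict


def ics_by_curve(records):
    """Fraction of fulfilled criteria for each H&D curve.

    records: iterable of (curve, image, criterion, observer, fulfilled),
    where fulfilled is True ("yes", score 1) or False ("no", score 0).
    """
    fulfilled_count = defaultdict(int)
    total_count = defaultdict(int)
    for curve, _image, _criterion, _observer, fulfilled in records:
        fulfilled_count[curve] += 1 if fulfilled else 0
        total_count[curve] += 1
    return {curve: fulfilled_count[curve] / total_count[curve]
            for curve in total_count}


# Hypothetical decisions: one image, two criteria, two observers per curve
records = [
    ("L", 1, 1, "A", True), ("L", 1, 1, "B", False),
    ("L", 1, 2, "A", True), ("L", 1, 2, "B", False),
    ("G", 1, 1, "A", True), ("G", 1, 1, "B", True),
    ("G", 1, 2, "A", True), ("G", 1, 2, "B", False),
]
scores = ics_by_curve(records)
print(scores)  # {'L': 0.5, 'G': 0.75}
```

Since every score lies between 0 and 1 regardless of which patients were imaged, scores for different techniques can be placed side by side, which is the property the text relies on for the 70 kV versus 90 kV comparison.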
Intraobserver variation
To evaluate the intraobserver variation the observers read a number of images twice. The fraction of changed answers for both visual grading and fulfilment of criteria between the first and the second reading, e.g. a change from "Yes, criterion 1 is fulfilled" to "No, criterion 1 is not fulfilled", was used as a measure of intraobserver variation. At the end of the reading session each observer re-read 14 lumbar spine images and 20 chest images (the first batch of the reading session and one batch that was read halfway through the reading session). One batch of images consisted of 7 lumbar spine and 10 chest images.
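The measure of intraobserver variation described above is simply the share of paired answers that differ between the two readings. A minimal sketch, with a hypothetical function name and invented example answers:

```python
def fraction_changed(first_reading, second_reading):
    """Fraction of answers that differ between two readings of the same items.

    Both sequences must list the answers (grades or yes/no decisions)
    in the same order: same image, same criterion, same observer.
    """
    if len(first_reading) != len(second_reading):
        raise ValueError("readings must have the same length")
    if not first_reading:
        raise ValueError("no answers supplied")
    changed = sum(1 for a, b in zip(first_reading, second_reading) if a != b)
    return changed / len(first_reading)


# Hypothetical yes/no decisions (1 = fulfilled, 0 = not fulfilled)
first = [1, 0, 1, 1]
second = [1, 1, 1, 0]
print(fraction_changed(first, second))  # 0.5
```

The same function works for visual grading answers, since any change of grade counts as a changed answer.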
Model simulations
A Monte Carlo model of the complete imaging system was used to calculate physical image quality descriptors. The model includes an anthropomorphic three-dimensional, segmented male anatomy (voxel phantom) to simulate the patient. Estimates of the energy imparted per unit area to the image receptor at points in the image plane were used to compute the optical density on the film by using the H&D curve. The model takes specific account of the X-ray spectrum (anode material and angle, peak tube voltage and ripple, and added filtration), anti-scatter grid (strip frequency, lead strip width, grid ratio and material in interspaces and covers) or air gap, couch-top or chest stand and image receptor (cassette front, screen-film system, and H&D curve). A detailed description is found in [19, 20]. Appropriate anatomical details have been added to this phantom so that realistic estimates of the contrast and signal-noise ratio (SNR) of the details can be made. Here, anatomical details relevant to the actual study were selected (Table 5). The contrast for each detail was calculated as the difference in optical density (ΔOD) due to the presence of the detail. The following radiographic techniques were simulated for lumbar spine: 70 kV, 400 screen and IL2, L, M or UGP (four techniques); and for chest: 102 kV or 141 kV, 160 screen and G, L, IL, IL2 or A (10 techniques). Other system parameters (filter, grid, etc.) are taken from the systems used in the original exposure. Correlations between the calculated ΔOD of the details and VGAS for all structures (Table 3) and for VGAS of criterion 5 (lumbar spine) and criterion 6 (chest) were tested for significance. The individual criteria were chosen because the anatomical details used in the model calculations are explicitly mentioned in these criteria. It is noted that no changes of the noise level were simulated in this work and consequently, no calculations of SNR of the details were performed for the simulated films. The study is limited to the effects of changing the film contrast at a constant noise level.
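The contrast descriptor ΔOD can be illustrated with a small calculation. The sketch below is not the Monte Carlo model of [19, 20]; it only shows the last step, converting energy imparted with and without a detail into optical densities through an H&D curve. The straight-line curve, the reference values and the function names are assumptions made for the example.

```python
import math


def optical_density(energy, gradient, od_ref=0.8, energy_ref=1.0):
    """Hypothetical straight-line H&D curve.

    Maps energy imparted per unit area (relative units) to optical density;
    od_ref is the density obtained at energy_ref.
    """
    return od_ref + gradient * math.log10(energy / energy_ref)


def delta_od(energy_background, energy_detail, gradient):
    """Difference in optical density due to the presence of a detail.

    The value is signed: a detail that attenuates the beam lowers the
    energy imparted behind it and therefore gives a negative difference.
    """
    return (optical_density(energy_detail, gradient)
            - optical_density(energy_background, gradient))


# Hypothetical detail that reduces the energy imparted by 10%
contrast_flat = delta_od(1.0, 0.9, gradient=2.0)
contrast_steep = delta_od(1.0, 0.9, gradient=4.0)
```

With a linear curve the difference in optical density scales with the gradient, so doubling the gradient doubles ΔOD for the same radiation contrast. This is the mechanism by which a steeper film can raise the contrast of a detail without any change to the exposure.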
Correlations between the clinical image quality as evaluated by the radiologists and the calculated physical image quality descriptors were tested for significance with the Pearson product-moment test. A p-value of less than or equal to 5% was considered to indicate a significant correlation.
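A significance test of this kind can be sketched without external libraries. The example computes the Pearson product-moment coefficient and compares the associated t statistic (with n - 2 degrees of freedom) against tabulated two-sided 5% critical values. The function names and the data are hypothetical, and comparing against a critical value is a simplification chosen here in place of computing an exact p-value.

```python
import math

# Two-sided 5% critical values of Student's t for 1 to 10 degrees of freedom
T_CRIT_5PCT = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
               6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def correlation_test(x, y):
    """Return (r, significant) for a two-sided test at the 5% level."""
    r = pearson_r(x, y)
    df = len(x) - 2
    if df not in T_CRIT_5PCT:
        raise ValueError("critical value not tabulated for this sample size")
    if abs(r) >= 1.0:
        return r, True  # perfect linear relation, t is unbounded
    t = r * math.sqrt(df / (1.0 - r * r))
    return r, abs(t) > T_CRIT_5PCT[df]


# Hypothetical contrast values and scores for four techniques
r_strong, sig_strong = correlation_test([1, 2, 3, 4], [2, 4, 5, 9])
r_weak, sig_weak = correlation_test([1, 2, 3, 4], [1, 3, 2, 4])
```

With only four techniques, as in the lumbar spine simulations, the test has two degrees of freedom, so in this toy data even a coefficient of 0.8 does not reach significance. That illustrates why a correlation over a handful of techniques has to be very strong to be detected.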
| Results |
|---|
Correlation between physical and clinical measures of image quality
For lumbar spine, significant correlations were found between the ΔOD of the transverse processes (L1T, L3T and L5T) and of the trabecular structures (L1D, L3D and L5D) on the one hand and VGAS for structure 5, which specifically mentions the transverse processes, on the other (Table 6). No significant correlations with ICS were found.
[…] ΔOD of the blood vessels in the retrocardiac area and the VGAS for structure 6 (Table 6). […] ΔOD L3T and VGAS for structure 5 and for chest between ΔOD RCA and VGAS for structure 6.
| Discussion |
|---|
Even though there is a lack of statistical power for lumbar spine, the two image quality descriptors used in this study, ICS and VGAS, appear to reach a plateau at average gradients above about 2.0 and 2.5, respectively. Beyond this plateau the ICS and VGAS values are expected to decrease once the contrast becomes too high for diagnostic purposes, i.e. when the image tends to be too "black and white". The plateau is reached at a lower average gradient with the ICS method than with VGAS. A probable explanation for this finding lies in the inherent properties of the two image quality evaluation methods. ICS reaches a saturation level when all or almost all criteria are fulfilled, and the score cannot increase beyond this level. As the average gradient increases further, a point is reached at which the ICS starts to decrease: the image quality is no longer good enough and fewer criteria than before are fulfilled. In the VGA method, however, the quality of one image is compared with the quality of a reference image. The reference images were produced with an L-film (mean average gradient 2.18). Thus the start and end points of plateaus detected with the VGA methodology will be relative to the average gradient of the reference images. If the reference images had had a different average gradient, the plateau would probably have had different start and end points. For the chest images, only one H&D curve with a higher average gradient than the reference H&D curve was produced. Therefore no plateau could be detected, but the same trend is expected for chest if the average gradient is increased sufficiently.
The use of a steeper film such as the G film in lumbar spine and chest radiography offers the possibility of decreasing the dose at a constant clinical image quality level compared with the standard L film (assuming that the increase of noise due to the dose reduction is so small that it can be ignored [18]). This would require proper adjustment of the automatic exposure control of the X-ray system. Under such conditions, dose savings of about 10% could be achieved for lumbar spine compared with the standard L-film at 70 kV. Conclusions about possible dose savings using the other (simulated) steep gradient films are not possible since the normalization of their H&D curves is hypothetical. Provided, however, that they cross the H&D curve of the L film as assumed in Figure 1, a 20% dose saving would be possible with the UGP film.
VGAS of the copies of the reference images (i.e. the L-films) could not be separated from the reference (Figure 6, VGAS=0), as was expected. Therefore we can conclude that the quality of the film printing was constant, and that there was no systematic error in the visual grading analysis study. If VGAS of the copies had not equalled zero, resulting in a non-fixed grading system, this would have introduced a source of uncertainty that would have been very hard to estimate.
According to the results for ICS (Figure 5), lumbar spine images taken at 90 kV are significantly worse than those taken at 70 kV, independent of the film gradient. This can be interpreted in the following way. By switching from 70 kV to 90 kV, the contrast in the radiation field leaving the patient is decreased to such a degree that it cannot be restored by using a steeper film. This interpretation is supported by Monte Carlo calculations simulating the corresponding exposure parameters [21]. The radiation contrast of the studied anatomical details is reduced by 30% when the tube voltage increases from 70 kV to 90 kV according to the Monte Carlo model calculations [22]. The model calculations also show a reduction of the SNR of these details (by 30–40%) at 90 kV compared with 70 kV in the original images, which may additionally contribute to the lower ICS at 90 kV. This result is not in accord with the kV interval recommended by the European Guidelines (75–90 kV) [8]. A lower kV will increase the radiation contrast, but it will also increase the entrance surface dose, which could, but will not necessarily, lead to an increased effective dose to the patient. The optimum kV for a particular examination is dependent not only on the composition of the anatomical region to be imaged, but also on the type of detector used. This study suggests that for lumbar spine radiography with screen-film systems this kV may be lower than that indicated in the European Guidelines [8].
A comparison between the results of the visual grading analysis and the image criteria score method for lumbar spine (Figure 4 and Figure 5) and for chest (Figure 6 and Figure 7) shows the stronger discriminative power of VGA compared with ICS. For lumbar spine the different film types could not be separated with ICS, whereas with VGA the higher gradient techniques were significantly better than the reference image (L-film) and the lower gradient techniques were significantly worse than the reference image. Thus ICS has less discriminatory power than VGA for evaluation of the effect of the shape of the characteristic curve on the clinical image quality of lumbar spine radiographs. In the chest case the results from the VGA showed that all film types were significantly different from each other, whereas the results from the ICS method showed that some of the techniques could not be separated from each other.
The study of the intraobserver variation showed that, on average, the score given by the radiologists was changed in about one out of four readings when an image was re-read. This corresponds well with our experiences from previous studies performed under similar conditions [9–11].
The significant correlation found between VGAS structure 5 and the model calculations of the contrast of the L1, L3 and L5 processes and trabecular structures in lumbar spine anteroposterior (AP) examination is encouraging and shows that the model can predict changes in clinical image quality. It is noted, however, that the radiologists' response (VGAS) saturates for films with the highest average gradients (Figure 4), which may be important as only linear correlations are sought here. The absence of correlation with ICS for lumbar spine can be explained by the fact that none of the tested films show any significant differences in terms of ICS (Figure 5). Contrary to the lumbar spine AP, the chest posteroanterior (PA) examination shows significant correlation between both VGAS and ICS on the one hand and the calculated contrast of anatomical details on the other. The strongest correlation is found for anatomical details situated at a low optical density (OD<1.0) such as vessels behind the heart and the calcification in the right lung apex [23]. We believe that the detection of the small calcifications and trabecular structures will also depend on the noise and not only on the contrast, but this was not considered here, as the noise was not altered in the experiment.
| Conclusions |
|---|
A statistically significant correlation exists between some of the physical image quality measures calculated by the Monte Carlo model and clinical image quality assessed by the radiologists. In lumbar spine AP radiography, significant correlations were found between calculations of the contrast of transverse processes and trabecular structures and experimentally determined VGAS. For PA chest radiography, the most significant correlation with VGAS was found for the contrast of blood vessels in the retrocardiac area. Hence the influence of the H&D curve can be predicted, provided the imaging system is carefully modelled and relevant measures of physical image quality are used.
| Footnotes |
|---|
Received for publication September 30, 2002. Revision received July 21, 2003. Accepted for publication September 3, 2003.