Full Paper
1 Servicio de Radiodiagnóstico, Hospital de El Escorial, 2 Departamento de Radiología, Facultad de Medicina, Universidad Complutense de Madrid, 3 Servicio de Radiodiagnóstico, Hospital Clínico San Carlos, 4 Departamento de Estadística e I. O., Universidad Complutense de Madrid, Madrid, Spain, 5 Servicio de Radiodiagnóstico, Hospital Doce de Octubre, 6 Servicio de Radiodiagnóstico, Hospital del Aire, 7 Servicio de Radiodiagnóstico, Hospital de Móstoles and 8 Servicio de Radiodiagnóstico, Hospital Universitario La Paz
| Introduction |
|---|
The International Commission on Radiological Protection (ICRP) has recommended that patient doses be kept as low as reasonably achievable, consistent with clinical requirements, with particular attention to CT doses [2–4]. These recommendations have also been implemented in European legislation [5–7]. Among other initiatives, a group of experts from the European Commission (EC) has developed and provided guidelines on quality criteria for CT, establishing an operational framework in which technical parameters are considered in relation to image quality and patient dose [8].
The aim of our study was to assess the quality of chest CT examinations for lung carcinoma indication according to the criteria proposed in the European Guidelines for Chest (General), and to investigate their usefulness in the optimization of this practice. This work is part of a more general study of the implementation of quality criteria for helical CT examinations in four frequent clinical indications.
| Material and methods |
|---|
Image criteria assessment
The image quality and the radiation dose criteria proposed for general chest examinations in the EC Document are listed in Table 2. Image quality criteria comprise two types: "visualization criteria", by which the organs and structures should be detectable in the volume of investigation; and "critical reproduction criteria", by which details of the anatomical structures considered should be visible and clearly defined.
The observers scored the fulfilment of each individual criterion by assigning "Yes" or "No" when it was judged to be met or not met, or "Not Applicable" (NA) when the pathology or the absence of an anatomical structure through surgery prevented the evaluation. Such options were coded by assigning 1 (one) to "Yes" and 0 (zero) to "No". "NA" answers were computed as void values. When an examination consisted of more than one scan sequence, i.e. with and without intravenous contrast agent, each of the sequences was separately evaluated and the highest score used for the whole examination. Prior to the evaluation itself, a pilot common reading session was held with the radiologists involved to attain a consensus about criteria fulfilment and to gain experience.
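The coding scheme described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes that "the highest score" is taken per criterion across the scan sequences of one examination, which the text does not state explicitly.

```python
from typing import List, Optional, Sequence

# Coding from the text: "Yes" -> 1, "No" -> 0, "NA" -> void (None here).
CODES = {"Yes": 1, "No": 0, "NA": None}


def merge_sequences(sequences: Sequence[Sequence[str]]) -> List[Optional[int]]:
    """Per criterion, keep the highest coded score across scan sequences.

    Assumption: the maximum is taken criterion by criterion; a criterion
    stays void only if it is "NA" in every sequence.
    """
    merged: List[Optional[int]] = []
    for answers in zip(*sequences):  # one tuple of answers per criterion
        valid = [CODES[a] for a in answers if CODES[a] is not None]
        merged.append(max(valid) if valid else None)
    return merged


def fulfilment(scores: Sequence[Optional[int]]) -> Optional[float]:
    """Mean of the non-void scores, or None if every score is void."""
    valid = [s for s in scores if s is not None]
    return sum(valid) / len(valid) if valid else None


# Hypothetical examination with two sequences and four criteria.
without_contrast = ["Yes", "No", "NA", "Yes"]
with_contrast = ["Yes", "Yes", "NA", "No"]
exam = merge_sequences([without_contrast, with_contrast])
```

Void values are excluded from the denominator rather than counted as failures, which matches their treatment as void in the text.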
Selection of observers
Once the second review of the images was completed, the intraobserver consistency was analysed using the McNemar chi-square test [9]. For each observer, the scores "1" or "0" from all visualization and critical reproduction criteria of the whole sample of examinations were pooled and classified into a 2 × 2 table, whose entries were the numbers of concordant or discordant pairs between the first and the second readings. Consequently, the total number of pairs in each observer's table was up to 1600 (i.e. 16 criteria times 100 examinations). This analysis enabled us to select the observers with no significant differences between the two sets of readings: only the image scores of observers with a p-value greater than 0.05 were retained. We calculated the "fulfilment of each individual image criterion" for every examination as the average of the two scores provided by the observers selected, assigning equal weight to each observer's scoring. We also calculated an "image quality score" for every examination as the equally weighted average fulfilment of all the criteria involved.
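As a rough illustration of the intraobserver check, the McNemar statistic depends only on the discordant pairs of the 2 × 2 table. The sketch below uses the uncorrected chi-square form; whether the original analysis applied a continuity correction is not stated, so that choice is an assumption, as are the function name and the example data.

```python
import math
from typing import Sequence, Tuple


def mcnemar(first: Sequence[int], second: Sequence[int]) -> Tuple[float, float]:
    """McNemar chi-square test on paired 0/1 scores (no continuity correction).

    Returns (statistic, p_value). Only discordant pairs contribute:
    b = pairs scored 1 then 0, c = pairs scored 0 then 1.
    """
    b = sum(1 for x, y in zip(first, second) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(first, second) if x == 0 and y == 1)
    if b + c == 0:
        return 0.0, 1.0  # identical readings: no evidence of inconsistency
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical observer: 10 pairs changed 1 -> 0, 5 changed 0 -> 1, 85 unchanged.
first_reading = [1] * 10 + [0] * 5 + [1] * 85
second_reading = [0] * 10 + [1] * 5 + [1] * 85
stat, p_value = mcnemar(first_reading, second_reading)
keep_observer = p_value > 0.05  # selection rule described in the text
```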
To study the interobserver variation within sites and in the whole sample, six Friedman two-way analyses of variance for image quality score were applied, since we were employing a randomized complete block design in which each CT examination behaves as a block [9]. When a p-value less than 0.05 was obtained for a site, the pattern causing the statistical significance was analysed. This analysis covered possible in-house observer/centre associations as well as both over- and under-scoring. Depending on its conclusions, the scores from any observer showing statistical significance could be disregarded. The values of individual criteria fulfilment and image quality score variables were then reassessed with the remaining readings, and a statistical analysis per site of both variables was undertaken. To find significant differences between sites in relation to these variables, the Kruskal-Wallis test was applied at a fixed significance level of 0.05 [9].
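The block design described above (examinations as blocks, observers as treatments) can be illustrated with a small, dependency-free Friedman test. This is a sketch under stated assumptions: it uses the tie-corrected statistic with average ranks and the chi-square approximation for the p-value, which may differ in detail from what the statistical package computed; all names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Chi-square survival function via the recurrence Q(a+1,y) = Q(a,y) + y^a e^-y / Gamma(a+1)."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def friedman(blocks: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Friedman test: each row is one examination, each column one observer."""
    n, k = len(blocks), len(blocks[0])
    ranked = [average_ranks(row) for row in blocks]
    col_sums = [sum(row[j] for row in ranked) for j in range(k)]
    a1 = sum(r * r for row in ranked for r in row)
    c1 = n * k * (k + 1) ** 2 / 4.0
    if a1 == c1:
        return 0.0, 1.0  # every block fully tied: no observer effect detectable
    stat = (k - 1) * sum((s - n * (k + 1) / 2.0) ** 2 for s in col_sums) / (a1 - c1)
    return stat, chi2_sf(stat, k - 1)


# Hypothetical image quality scores: 4 examinations scored by 3 observers.
scores = [[0.80, 0.90, 1.00], [0.70, 0.80, 0.90], [0.85, 0.90, 0.95], [0.60, 0.70, 0.80]]
stat, p_value = friedman(scores)
```

In this made-up example the third observer always scores highest, so the test flags a systematic observer effect, the kind of over-scoring pattern the analysis was designed to expose.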
Dose assessment
During the collection of the examinations comprising the sample, dose measurements were performed at each site to obtain data to estimate the values of the CT specific dosimetric quantities [8]. Both free-in-air and standard body phantom dose measurements were carried out with a CT ionization chamber (length 10 cm, model 6000-100; Victoreen, Cleveland, OH) and an electrometer (model 4000M+, Victoreen) with a calibration traceable to the Spanish National Standard Laboratory [10]. The values of the dose quantities weighted CT dose index (CTDIw) and dose–length product (DLP) were derived from the measurements, in accordance with the EC Document. CTDIw was calculated as the weighted average of the central and peripheral CTDI100 (i.e. 1/3 CTDI100,c + 2/3 CTDI100,p), where the suffix 100 refers to the integration distance in mm. The DLP values for each examination were calculated from the CTDIw and scan parameters according to the expression:
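The weighting of the central and peripheral CTDI100 given above, and a DLP calculation from scan parameters, can be sketched as follows. The DLP form used here, nCTDIw · T · A · t summed over helical sequences (normalized CTDIw in mGy per mAs, nominal collimation T in cm, tube current A in mA, total acquisition time t in s), is an assumption based on the general definition in the EC Document and is not necessarily the exact expression used in the study. Function names and numerical values are hypothetical.

```python
from typing import Iterable, Tuple


def ctdi_w(ctdi100_centre: float, ctdi100_periphery: float) -> float:
    """Weighted CTDI: 1/3 of the central plus 2/3 of the peripheral CTDI100 (mGy)."""
    return ctdi100_centre / 3.0 + 2.0 * ctdi100_periphery / 3.0


def dlp_helical(sequences: Iterable[Tuple[float, float, float, float]]) -> float:
    """Dose-length product in mGy cm, summed over helical sequences.

    Each sequence is (n_ctdi_w, collimation_cm, tube_current_ma, total_time_s),
    with n_ctdi_w the CTDIw normalized per unit of mAs (mGy/mAs).
    Assumed form: DLP = sum of n_ctdi_w * T * A * t.
    """
    return sum(n * t_cm * ma * seconds for n, t_cm, ma, seconds in sequences)


# Hypothetical phantom readings (mGy) and a single hypothetical helical sequence.
example_ctdi_w = ctdi_w(9.0, 15.0)
example_dlp = dlp_helical([(0.1, 1.0, 200.0, 20.0)])
```

Under this form a second sequence over the same region simply adds its own term to the sum, so repeating a scan with and without contrast roughly doubles the DLP if the parameters are unchanged.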
In addition to these quantities, estimates of the effective dose (E) for individual patients as proposed by ICRP were made based on the use of organ dose conversion coefficients [2]. To assess the mean equivalent dose to organs for each patient, we used the values of scan parameters, the CTDIair expressed as dose to muscle, the NRPB-SR250 scanner-specific Monte Carlo conversion coefficients, or those recommended by ImPACT, and an Excel spreadsheet [11, 12]. To search for statistical differences among sites in the values of these dose quantities, the DLP and E data were log transformed to overcome heterogeneity of variance and then analysed by a one-way analysis of variance (ANOVA), followed by a post hoc analysis using the method of Tukey with a fixed significance level of 0.05 [9]. The non-parametric Spearman coefficient (rs) between image quality score and DLP was calculated to estimate the correlation between image quality and radiation dose [9].
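For illustration, the Spearman coefficient between image quality score and DLP can be computed as the Pearson correlation of the two rank vectors, which also handles tied scores. The sketch below is a minimal, dependency-free version; the function names and the example data are hypothetical and are not values from the study.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rs: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rs is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical image quality scores and DLP values (mGy cm) for five examinations.
quality = [0.88, 0.94, 0.91, 1.00, 0.97]
dlp = [310.0, 250.0, 420.0, 380.0, 560.0]
rs = spearman(quality, dlp)
```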
The statistical analysis was performed with a software package for PCs (Systat 10.0; SPSS Inc., Chicago, IL).
| Results |
|---|
Typical values of the scan parameters used at the participant sites are shown in Table 1. X-ray tube voltage was 120 kV in most cases, except for 10 examinations at site 5, where 140 kV was applied. Concerning the tube current (I) actually used, little variation was found at four sites. The wide range of values (130–300 mA) at site 3 is explained by the use of a tube current modulation system with 14 patients, and the use of a high tube current close to 300 mA with the rest of the patients. Tube rotation times ranged from 0.75 s to 1 s, and pitch values from 1 to 1.5.
Selection of observers
The results from the application of the McNemar test are shown in Table 3, and those from the Friedman test in Table 4. The number assigned to every observer was the same as that assigned to the observer's own site.
Image criteria
Concerning image criteria compliance, of a total of 16 criteria proposed in the Guidelines, 10 were met by practically all the examinations in every centre. Table 5 shows the mean percentages of fulfilment per centre for those criteria partially met. The average percentages of compliance with each criterion in the whole sample have also been included in the last row.
With regard to the critical reproduction criteria fulfilment, criterion 1.2.6, "visually sharp reproduction of the oesophagus", was accomplished in the range 83–95% per site without significant differences among the centres. The range of fulfilment for criterion 1.2.8, "visually sharp reproduction of large and medium sized pulmonary vessels", was between 92% and 100% with a homogeneous group composed of sites 1 to 4. Both criteria 1.2.9 "visually sharp reproduction of segmental bronchi" and 1.2.10 "visually sharp reproduction of the lung parenchyma" had an average fulfilment close to 85%, with ranges of 73–100% and 71–99%, respectively. The differences between sites in the fulfilment of criterion 1.2.9 were significant between sites 1, 3 and the rest. For criterion 1.2.10, significant differences of mean fulfilment were found between a higher fulfilment group composed of sites 1, 2, 3 and the rest.
Mean image quality scores for the different centres are indicated in the first row of Table 6. The range was 93–98%, with statistically significant differences between two groups: one composed of sites 1 and 3 and the other including sites 2, 4 and 5. The distribution of individual values at each site is displayed in Figure 1.
Correlation between image quality and dose
There was no overall correlation between image quality and dose, since the value of the Spearman coefficient for the whole sample was rs=0.15. The values of the correlation coefficient between image quality score and DLP for each site are shown in the last row of Table 6. A weak direct correlation (rs=0.45) was obtained at site 5, where both the lowest mean image quality score and the highest variation in DLP values were also found.
| Discussion |
|---|
Image criteria
In general, the examinations comprising the sample largely fulfilled the image criteria pursued. Only two visualization and four critical reproduction criteria were partially fulfilled.
Visualization criterion 1.1.1, "visualization of the entire thoracic wall" showed the lowest fulfilment score, i.e. 74% on average over the whole sample. To be met, this criterion requires a FOV fitted to the perimeter of the thoracic wall including the skin. In our study, two causes account for the failure to fulfil this criterion: either the FOV was too small or the patient was not properly centred, i.e. the patient's position in relation to the isocentre was either lateral, inferior or both, thus preventing the visualization of some thoracic wall regions. Contrary to what could be inferred from our results, it should not be a difficult task to centre the patient and to select a suitable value for FOV. Nevertheless, if the FOV selected is too large, the criterion can still be fulfilled, but at the expense of decreasing the spatial resolution, which is undesirable.
The other visualization criterion with partial compliance was 1.1.4 "visualization of the entire lung parenchyma". For its fulfilment, the scanned volume must cover from the apices to the bases of the lungs, both ends included. Indeed, a lung vertex lying out of the upper limit of the examination scope was the cause of such lack of fulfilment in our case. Lung bases were systematically included since for CT examinations for lung carcinoma indication it is common practice to scan both liver and adrenals.
With regard to the rest of the visualization criteria, whenever criterion 1.1.4 is met, two other criteria, 1.1.2 and 1.1.3, "visualization of the entire thoracic aorta and vena cava" and "visualization of the entire heart", respectively, are also fulfilled because they are included within the scanned volume. Similarly, whenever criterion 1.1.2 is fulfilled, criterion 1.1.3 should also be fulfilled because the caudal length of the entire thoracic aorta is larger than that of the heart, so criteria 1.1.2 and 1.1.3 are probably redundant.
In general, a high fulfilment of the visualization criteria makes it difficult to draw practical conclusions. Since visualization criteria are more closely related to a careful performance than to the image quality per se, it should be possible to fulfil these criteria in practically 100% of the cases by following a correct examination procedure.
Regarding critical reproduction criteria, we analysed those featuring partial fulfilment. Criterion 1.2.6 "visually sharp reproduction of the oesophagus" features an average fulfilment of 89% (with a range between 83% and 95% per centre). This criterion is usually difficult to evaluate because the oesophagus is a long structure lying along the z-axis and has little or no fat around it in some patients. In our study, the observers agreed that this criterion was met when the oesophagus was clearly visualized in at least one image for each of the three oesophagus regions.
Criterion 1.2.8 "visually sharp reproduction of large and medium-sized pulmonary vessels" had a high rate of fulfilment. When analysing the examination sequences of the 12 individual patients performed without a contrast medium, the average scores for this criterion were systematically below 50%; conversely, when 6 of these patients underwent a second examination sequence using intravenous contrast, the criterion was fulfilled. Our results indicate that a complete correlation occurs between the fulfilment of this criterion and the use of intravenous iodinated contrast. To lower the patient radiation dose, it might be advisable to perform only one sequence, with contrast. The fulfilment of this criterion may also depend on the injected volume rate and on the accuracy of timing, although their influence was not analysed in our study.
Criterion 1.2.9, "visually sharp reproduction of segmental bronchi", had an average fulfilment of approximately 75% at three sites. These values were found to be related to the use of wide collimation and reconstruction intervals at sites 2 and 4 and, at site 5, to a reduction in the axial resolution caused by storing images on a 256 × 256 pixel matrix. The use of a 7 mm collimation, which gives an improvement in the z-axis spatial resolution and a lowering of partial volume effects, accounts for the higher compliance with this criterion at sites 1 and 3. Thus, in relation to sharp visualization of segmental bronchi, a suitable choice seems to be scanning with a collimation narrower than 10 mm and a pitch greater than 1, as some studies have suggested [13, 14]. This also allows the patient dose to be reduced. No relationship between the use of a high resolution reconstruction algorithm and the fulfilment of this criterion could be derived from our results.
Criterion 1.2.10 refers to "visually sharp reproduction of the lung parenchyma", and presented practically complete compliance at sites 1 to 3, with lower mean values (71–82%) at sites 4 and 5. Among the reasons for such differences are the use of "standard" reconstruction algorithms at both sites and, in particular at site 5, the recording of the images on a 256 × 256 pixel matrix. These practices may have caused a loss of axial spatial resolution, which may have hindered compliance with this criterion.
With reference to the image quality score, all the examinations in the sample reached at least 84% (see Figure 1), which is an indication of the large extent to which the image quality criteria were met. However, two remarkable facts can be noted: first, not a single examination was considered to have full compliance at site 4 (98% maximum value), and second, all the examinations at site 3 had an image quality score above 95%. The variation observed in image quality at sites 2, 4 and 5 indicates that there is still some margin for the optimization of the practice by correcting the causes that hindered criteria fulfilment. At site 1 this optimization process is exclusively related to improving compliance with criterion 1.1.1, which can be tackled by carefully positioning the patient and selecting the appropriate FOV. The overall figures of compliance from our study were similar to those reported in a pilot study from the Nordic Countries [15].
Another aspect to consider is that, despite the prior training session, observers found some EC image quality criteria were in practice hard to interpret. On the other hand, although the criteria were defined for examinations of anatomic areas or specific organs, and for certain clinical indications, a lower level of objective image quality could be acceptable if associated with the necessary degree of diagnostic quality and a lower patient dose. To accomplish the ultimate goal in our particular case, i.e. providing diagnosis for lung carcinoma indication, the criteria demanded could be different for staging and for the follow-up of patients after treatment.
Evaluation process
The data resulting from the application of the McNemar test served the purpose of exposing the radiologists who showed less consistency when appraising CT examinations against the Guidelines. As a consequence of these results, the readings from two observers were removed because keeping them could raise doubts about the objectivity of their scores.
From the Friedman test results (see Table 4), the global pattern of behaviour for each observer can be summarized in broad terms as follows: the scores from observer 5 were usually prone to overevaluation, whereas those from observers 1 and 3 were homogeneous, even the ones allotted at their own sites. An analysis per site did not show significant differences among observers for subsamples at sites 1, 3 and 4, while it did at sites 2 and 5 due to the overevaluation from observer 5, whose readings were disregarded. The homogeneity shown by the readings from the remaining observers (1 and 3) was considered to be a guarantee of a suitable degree of objectivity in the results.
The observers are in general more used to reviewing images produced at their own departments, which can bias their interpretation of the performance of in-house equipment to the detriment of outside facilities. This trend had already appeared in a preliminary analysis exclusively based on the first readings by all five observers [16]. We tried to prevent this tendency by means of the application of the Friedman test. The approach followed throughout the evaluation process has allowed for the selection of scores in an objective manner by providing a filtering method for observer performance and techniques to improve the observers' skill in appraising images against the quality criteria considered. The results highlight the role of external or independent observers in the design of audit processes.
After looking through our results and only with an exploratory aim in mind, one of the participating radiologists performed one more image quality review on a graded scale basis as proposed by other authors, claiming that this approach could serve to better characterize the level of diagnostic performance [15]. The graded five-level evaluation consisted of a single reading by observer 3. The compliance results were below the values obtained for the previous binary assessment for those criteria showing low or very high levels of fulfilment, and the mean image quality scores decreased at the different sites by between 4% and 6% almost uniformly. It was difficult to derive new essential information from the graded reading, although some partial results complementary to the binary ones could be deduced.
Radiation dose
The CTDIw mean values were quite similar at all centres and below the reference value (30 mGy) proposed in the Guidelines. The highest value, 19 mGy, was found at site 2, which had a scanner equipped with gas detectors. It is well known that these are less efficient than the solid state detectors used in the rest of the scanners involved in the study [17]. This fact justified the use of a more intense beam, which was reflected in a relative increase of CTDIw. At site 5, the CTDIw range was broader than those found in the rest of the sites owing to variation in scan parameters, and in particular to the alternative use of 140 kVp or 120 kVp for different sequences or examinations.
All the mean DLP values obtained were below the reference level (650 mGy cm), although the value for site 2 was close to it (577 mGy cm). This figure was more than twice the minimum obtained for site 1 (263 mGy cm), owing to a combination of dose-increasing factors, mainly gas detectors and pitch 1. Conversely, the mean value deduced for site 1, along with the small variation for individual patients, seems to be closer to the "optimized" dose values for these examinations as reported in other studies [18].
Other noteworthy moderately high DLP values were obtained at site 5, where the mean was 471 mGy cm, but they showed a high degree of dispersion around the mean (see Figure 2). Since two sequences of the same anatomical region were performed on six patients at this site, the average scanned length was the longest (36 cm). In addition, the variations in the scan parameters used, especially in tube voltage without compensation for the output increase caused by changing from 120 kVp to 140 kVp, also produced an increase of patient dose for some examinations. As a result, the third quartile value of DLP for site 5 (600 mGy cm) was higher than that for site 2. The mean values of DLP for sites 3 and 4 were similar and intermediate, but they showed a higher dispersion for individual patients at site 3. This was caused by the use of a tube current modulation system. However, since the highest DLP values at this site were obtained for examinations performed without using such a system, there is still some margin left for dose optimization.
Effective dose is closely correlated to DLP in the whole sample (r=0.99). The normalized effective dose EDLP, defined as the ratio E/DLP [8], ranged between 14 µSv mGy⁻¹ cm⁻¹ and 17 µSv mGy⁻¹ cm⁻¹ at different sites. After a least squares regression assuming a zero intercept, the estimated EDLP was 16 µSv mGy⁻¹ cm⁻¹ (r=0.998), which is slightly below the value (17 µSv mGy⁻¹ cm⁻¹) proposed in the Guidelines for the chest region. The good correlation between E and DLP permits converting between the two quantities directly and using them for rough estimates of risk so long as the anatomical region remains unchanged.
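A least squares fit through the origin has a closed-form slope, the sum of the products divided by the sum of the squared DLP values, which is how a normalized effective dose such as the one quoted above can be estimated. The sketch below illustrates this and the resulting DLP-to-E conversion; the function names and the fitted data are hypothetical, and only the coefficient of 16 µSv mGy⁻¹ cm⁻¹ and the site 1 mean DLP of 263 mGy cm come from the text.

```python
from typing import Sequence


def zero_intercept_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least squares slope of y = k * x (regression forced through the origin)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def effective_dose_msv(dlp_mgy_cm: float, e_dlp_usv_per_mgy_cm: float = 16.0) -> float:
    """Rough effective dose in mSv from DLP, valid only for the chest region."""
    return dlp_mgy_cm * e_dlp_usv_per_mgy_cm / 1000.0


# Hypothetical (DLP in mGy cm, E in mSv) pairs lying exactly on a 16 uSv line.
dlp_values = [100.0, 200.0, 300.0]
e_values = [1.6, 3.2, 4.8]
slope_usv = 1000.0 * zero_intercept_slope(dlp_values, e_values)

# Conversion of the mean DLP reported for site 1 with the fitted coefficient.
site1_estimate = effective_dose_msv(263.0)
```

With the fitted coefficient, the site 1 mean DLP corresponds to roughly 4.2 mSv; this is an arithmetic illustration, not a value reported in the study.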
Concerning organ doses, apart from the lungs and the thymus, which were directly exposed in all cases, the thyroid was directly exposed in most examinations comprising the sample. In these cases, depending on patient position, this gland was totally or partially inside the scanned volume. The mean estimates of thyroid doses per site ranged in a proportion similar to the DLP ones. The distribution of intracentre values for individual patients had an almost constant pattern: low values of a few mSv when this gland was left outside the scanned volume, and high values (15–40 mSv) when it was inside. Such variations were due to individual anatomical or position differences. As the thyroid weighting factor to assess effective dose is 0.05, the range of thyroid dose observed in some departments (4–35 mSv at site 2) implies effective dose increases of up to 25%. If the follow-up of pathology involves periodic performance of chest CT examinations, then starting the volume of investigation just below the apex, whenever possible, could avoid the direct exposure of the thyroid.
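The thyroid contribution to effective dose follows directly from the weighting factor quoted above. As a worked illustration, taking the site 2 thyroid dose range as 4 to 35 mSv (the symbols $\Delta E$, $w_T$ and $H_T$ are introduced here for convenience and are not the paper's notation):

```latex
\Delta E = w_T \, H_T, \qquad w_T = 0.05

\Delta E_{\min} = 0.05 \times 4\ \mathrm{mSv} = 0.2\ \mathrm{mSv}, \qquad
\Delta E_{\max} = 0.05 \times 35\ \mathrm{mSv} = 1.75\ \mathrm{mSv}
```

The difference of about 1.5 mSv between a thyroid left outside and one fully inside the scanned volume is what drives the relative increase in effective dose discussed in the text.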
Image quality and dose
A weak correlation or no correlation at all occurred between image quality and DLP. At sites 1, 2 and 4, where the examinations were performed using a protocol almost systematically, no correlation was found (see Figure 1 and Table 6). The values of the correlation coefficient at sites featuring greater variability in DLP (3 and 5) indicate a weak but direct image quality/dose correlation. In the case of site 3, as there was a narrow dispersion in image quality, the optimization process should be exclusively related to diminishing the patient dose. At site 5, where a large variation both in image quality and dose was found, the analysis of the current practice should serve to establish a standard protocol.
The weak or absent correlation between image quality and dose found in the different centres suggests that, in addition to the parameters directly related to the dose, the final image quality is influenced by other parameters. The use of an inadequate FOV, an incorrect centring of the patient, a small matrix, or an unsuitable reconstruction algorithm can bear on image quality as much as the pertinent choice of tube voltage, current, beam collimation or pitch settings. This complexity of factors bearing on image quality in CT has already been reported elsewhere [18–22].
| Conclusions |
|---|
Various approaches to optimize current CT practice at every centre in order to improve image quality and reduce patient dose have been analysed. These include using pitch 1.5 instead of 1, selecting adequate scan parameters and reconstruction algorithms, using tube current modulation when available, or establishing a standard protocol. The EC quality criteria proved to be, with some limitations, a useful tool to evaluate the quality of the current CT practice for lung carcinoma examinations in the area surveyed.
| Footnotes |
|---|
Received for publication February 16, 2003. Revision received December 15, 2003. Accepted for publication April 15, 2004.