1 Centre for Software Reliability and 2 Psychology Department, City University, Northampton Square, London EC1V 0HB and 3 Institute for Communicating and Collaborative Systems School of Informatics, The University of Edinburgh, 1 Buccleuch Place, Edinburgh EH8 9LW, UK
| Introduction |
|---|
As is common in the radiological literature, we will use the abbreviation "CAD" to mean the computer tool, whenever the context does not create ambiguities with its literal meaning "detection activity aided by a computer".
CAD is used to alert (prompt) a human expert (typically a radiologist) to areas of a mammogram (a radiography image of the breast) where computerised image analysis suggests that abnormalities may be found. Typically, the CAD tool processes a digitised version of a mammogram and marks it with "prompts" to highlight mammographic features that the reader should examine. The design goal for CAD is to aid human experts (hereafter referred to as "readers") to notice features in a mammogram that might indicate cancer but that they may otherwise miss.
Our interest in the human-CAD system originated from a clinical trial funded by the UK Health Technology Assessment (HTA) programme [1]. The goal was to assess the impact of a particular CAD tool, R2 ImageChecker® M1000 (R2 Technology, Inc., Los Altos, CA) [2], in breast screening. The trial was designed to test whether CAD increases readers' sensitivity (the proportion of cancers recalled out of all cancers) without adversely affecting their specificity (the proportion of normal cases not recalled out of all normal cases). In Section 1 of this paper we outline aspects of the HTA trial (reported in detail in [1]) that are relevant to our research.
Our approach is multidisciplinary and combines insights from various disciplines: reliability engineering, computing, psychology, human factors and sociology. We were granted access to the HTA trial data and conducted supplementary analyses to investigate in more detail the effects of CAD on readers' decisions. Our analyses were influenced by reliability modelling in engineering [3, 4] and focused on how, inter alia, the effects of CAD vary between cases or depend on whether the tool prompts mammograms correctly or not. Section 2 summarises some results of these analyses, which have been partly reported in an earlier publication [5]. We also conducted two follow-up studies, outlined in Section 3, to investigate how readers react to incorrect CAD output [6]. The HTA trial and the follow-up studies were complemented by ethnographic studies of CAD use (see Section 4), which have been partly reported by Hartswood et al [7].
The discussion in Section 5 attempts to integrate the different findings and highlights their potential implications for the design, evaluation and use of CAD. This demands that we consider how we may extrapolate from data obtained in trial conditions to the real-world use of CAD in breast screening; we reflect on the potential benefits and as yet unresolved questions of using a multidisciplinary approach to address this important issue.
| 1. The HTA trial |
|---|
The trial was run with 50 readers experienced in breast screening, and used 180 cases (a mixture of 60 cancers and 120 normal cases) distributed in three sets of 60. All participating readers saw all the cases in two different experimental conditions: (a) "unprompted condition", i.e. without CAD; and (b) "prompted condition", i.e. with CAD. The order of conditions was randomised across participants. In both conditions, the participants saw two versions of each case: (1) the mammograms positioned on a standard viewing roller; and (2) a digitised version of the mammograms printed on paper. In the prompted condition, the printouts contained the prompts generated by CAD. Participants were also asked to make their decisions regarding whether a case should be recalled for further tests as if they were viewing the mammograms as single readers in the screening programme. More details of the procedures can be found in Taylor et al [1].
Analysis of the results showed no statistically significant impact of CAD (no improvement and no reduction) on the readers' sensitivity and specificity [1].
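The two measures assessed in the trial, as defined in the Introduction, can be sketched in a few lines. This is only an illustration: the function name and the toy decisions below are hypothetical and are not data from the trial.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(decisions: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) from (is_cancer, recalled) pairs.

    Sensitivity: proportion of cancers recalled out of all cancers.
    Specificity: proportion of normal cases not recalled out of all normal cases.
    """
    recalled_cancers = missed_cancers = 0
    recalled_normals = cleared_normals = 0
    for is_cancer, recalled in decisions:
        if is_cancer:
            if recalled:
                recalled_cancers += 1
            else:
                missed_cancers += 1
        else:
            if recalled:
                recalled_normals += 1
            else:
                cleared_normals += 1
    cancers = recalled_cancers + missed_cancers
    normals = recalled_normals + cleared_normals
    if cancers == 0 or normals == 0:
        raise ValueError("need at least one cancer and one normal case")
    return recalled_cancers / cancers, cleared_normals / normals


# Hypothetical toy decisions (NOT trial data): 4 cancers, 5 normal cases.
toy_decisions = [
    (True, True), (True, True), (True, False), (True, True),
    (False, False), (False, False), (False, True), (False, False), (False, False),
]
sens, spec = sensitivity_specificity(toy_decisions)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In the trial, these quantities would be computed per reader and per condition (prompted, unprompted) before being compared.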
| 2. Supplementary analyses of the data from the HTA trial |
|---|
Throughout this paper, we will be talking about "errors" of human (reader) or machine (CAD). We clarify here how we define these errors and present the terminology we will be using.
2.1. Potential of CAD for improving readers' sensitivity
It is interesting to estimate how much CAD could potentially improve readers' sensitivity for the test set in the trial.
In the HTA trial, readers in the unprompted condition (without CAD) made 2994 decisions about cancers (50 readers × 60 cancers − 6 missing data points), 741 of which were incorrect. Thus, the estimated probability of human error for cancers in the unprompted condition is 741/2994 = 0.247 (95% confidence interval 0.232 to 0.263). Of these 741 errors, 314 occurred on cases correctly prompted by CAD.
If we assume (optimistically) that readers in the prompted condition will make the best possible use of CAD, i.e. (1) will recognise and recall all the cancers correctly prompted by CAD and (2) will still recall all cancers that they recalled in the unprompted condition, even with incorrect output from CAD, then the expected number of errors for cancers in the prompted condition is 741 − 314 = 427 (of all 2994 decisions), i.e. the probability of failure in the prompted condition is estimated to be 427/2994 = 0.143 (95% confidence interval 0.130 to 0.156). Considering only the mean estimates, the estimated potential for improving readers' sensitivity equals 314/2994 = 0.105 (10.5%). Using a similar approach in another study, Warren Burhenne et al [9] reported "...CAD prompting could have potentially helped reduce..." the initial rate of false negative errors (0.21) by 77%, i.e. they estimated the potential for improving readers' sensitivity as 0.21 × 77% = 16.2% (0.162).
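The arithmetic behind these estimates can be checked with a short sketch. The counts are those quoted above. The interval function uses the normal (Wald) approximation, which is an assumption: the text does not say how the confidence intervals were computed, so the last digit of an interval may differ from the published one.

```python
import math

DECISIONS = 2994                     # 50 readers x 60 cancers - 6 missing data points
ERRORS_UNPROMPTED = 741              # incorrect decisions on cancers without CAD
ERRORS_ON_CORRECTLY_PROMPTED = 314   # of those, on cancers CAD prompted correctly


def wald_interval(errors: int, n: int, z: float = 1.96) -> tuple:
    """Approximate 95% interval for a proportion (normal approximation; an assumption)."""
    p = errors / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


p_unprompted = ERRORS_UNPROMPTED / DECISIONS
ci_unprompted = wald_interval(ERRORS_UNPROMPTED, DECISIONS)

# Optimistic assumption: every correctly prompted cancer is recalled and
# no cancer recalled without CAD is lost with CAD.
best_case_errors = ERRORS_UNPROMPTED - ERRORS_ON_CORRECTLY_PROMPTED
p_best_case = best_case_errors / DECISIONS
potential_gain = ERRORS_ON_CORRECTLY_PROMPTED / DECISIONS

# Estimate derived from the figures quoted from Warren Burhenne et al [9].
burhenne_gain = 0.21 * 0.77

print(f"unprompted error probability: {p_unprompted:.3f}")
print(f"best-case prompted errors:    {best_case_errors} ({p_best_case:.3f})")
print(f"potential gain in sensitivity: {potential_gain:.3f}")
print(f"estimate from [9]:             {burhenne_gain:.3f}")
```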
Analysis of the HTA trial data [1] shows that the estimated potential improvement in sensitivity of 10.5% does not seem to be realistic, at least for this experimental setting, and that the assumptions for the above calculation were not verified. In the prompted condition, readers missed correctly prompted cancers; they also made errors that they did not make in the unprompted condition, as shown in Table 1 for different categories of cancers.
We ignore those outcomes in which the two decisions were either both correct or both wrong, and consider only the following outcomes:
If there were statistical independence between the outcomes "prompted decision better/worse" and "CAD output correct/incorrect", then we could conclude that the correctness of CAD output does not affect readers' decisions. We tested for independence using Fisher's Exact Test [11] for the contingency tables given in Table 2. Independence would mean that the probability of the prompted decision being better rather than worse would be the same for cases correctly processed by CAD as for cases incorrectly processed by CAD.
[...] ≈ 1.22 for incorrectly processed cancers and 113/147 ≈ 0.77 for correctly processed cancers. The odds ratio equals 1.22/0.77 ≈ 1.58. The test indicates that this ratio is significantly different from 1 with a p-value of 0.0274. The results indicate that correct CAD output is likely to help in reaching a correct decision and that incorrect CAD output makes it more difficult.
2.3. Error rates of readers and CAD for different populations of cases in different conditions
To investigate the effect of correctness of CAD output, we estimated the probabilities of human error in both the prompted and the unprompted conditions, for cases categorised according to case type (normal or cancer) and the output of CAD (correct or incorrect).
These estimates are shown in Table 3. For both cancers and normal cases, none of the estimated probabilities of error differ significantly between the prompted and unprompted conditions, except for those normal cases that CAD did not prompt. For this category of cases, the readers' rate of false recalls in the prompted condition was smaller than in the unprompted condition by 0.06.
We tested the correlation between errors made by readers and CAD for significance with Fisher's Exact Test [11] applied to the contingency tables for correct and wrong decisions by readers, on cancers and on normal cases in both conditions (see Table 4). The test indicated significant correlation for all four contingency tables (p<0.05).
We define "non-obvious" as a case on which at least one reader made an incorrect decision, either in the prompted or unprompted condition.
Analysis of variance (ANOVA) indicated that the value d − dp is significantly different for groups of cases processed by CAD correctly and incorrectly:
In an earlier study [5], we applied logistic regression to highlight general patterns in the effect of CAD. It appears that CAD tends to make cancers that are relatively easy (i.e. with d<0.6) less difficult (i.e. dp<d) and cases that are relatively difficult (i.e. with d>0.6) even more difficult (i.e. dp>d).
The plot in Figure 1 illustrates this effect. The horizontal axis represents the unprompted difficulty d. The vertical axis shows the difference dp − d. So, a point below the horizontal line indicates a cancer for which CAD appears to reduce the rate of reader errors. Points marked w and c indicate the observed values of d and dp − d for non-obvious cancers, divided into those with correct CAD output (c) and those with wrong CAD output (w). The curves show the regression estimate for the mean value of dp − d for cases with different difficulty d. The dashed curve corresponds to incorrectly prompted cancers, the dotted-dashed curve to the correctly prompted cancers, and the solid curve to all cancers together. For more details of our regression analyses, see Povyakalo et al [5].
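How a single case would be placed on such a plot can be sketched as below. The sketch assumes that the difficulty of a case is the proportion of readers who decided incorrectly on it (d without CAD, dp with CAD); this reading is inferred from the surrounding text rather than stated in it. The function names and the example counts are hypothetical.

```python
def difficulty(errors: int, readings: int) -> float:
    """Assumed definition: proportion of readings of a case that were incorrect."""
    if readings <= 0:
        raise ValueError("readings must be positive")
    return errors / readings


def is_non_obvious(d: float, dp: float) -> bool:
    """A case on which at least one reader erred, in either condition."""
    return d > 0 or dp > 0


def cad_effect(d: float, dp: float, tol: float = 1e-12) -> str:
    """Classify a case by the sign of dp - d (the vertical axis of the plot)."""
    diff = dp - d
    if diff < -tol:
        return "fewer errors with CAD"
    if diff > tol:
        return "more errors with CAD"
    return "no change"


# Hypothetical cases (NOT trial data): (errors unprompted, errors prompted) out of 50 readings.
easy_case = (difficulty(10, 50), difficulty(5, 50))
hard_case = (difficulty(35, 50), difficulty(40, 50))

print(cad_effect(*easy_case))
print(cad_effect(*hard_case))
```

A point classified as "fewer errors with CAD" lies below the horizontal line of the plot, and one classified as "more errors with CAD" lies above it.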
| 3. Follow-up studies |
|---|
The test sets in the HTA trial did not contain enough examples of cancers incorrectly processed by CAD (in particular, "unmarked" cancers; see definitions in Section 2) to allow us to draw statistically significant estimates of their effects on the readers' decisions. We ran a follow-up experiment (Study 1) with a new test set (60 cases) containing a larger proportion of cancers missed by CAD (20 of the 30 cancers in the set). Nine of the false negatives in our test set were "unmarked cancers". We kept all other characteristics of the test set as similar as possible to the sets used in the HTA trial, as we wanted the readers to perceive this study as a natural extension to the original trial and to behave in a comparable way.
The participants in Study 1 were 20 readers who had participated in the original trial. We used essentially the same procedures as in the HTA trial, except that readers in Study 1 saw all the cases only once, always with the benefit of CAD ("prompted condition").
At that stage, we were not interested in comparing readers' performance with and without CAD; our goal was to estimate the probability of reader error when the output from CAD was incorrect. However, the results turned out to be highly unexpected: the average reader sensitivity was surprisingly low (52%) and this decrease was particularly strong for the "unmarked" cancers. This led us to suspect that CAD errors may have had a significant negative impact on readers' decisions. On the other hand, we could not exclude the alternative explanation that the cases in our study had characteristics that made them particularly difficult (perhaps mammographically undetectable) for both human readers and the CAD tool.
As a "control" for Study 1, we ran Study 2, where readers saw the same test set without CAD ("unprompted condition"). We used 19 readers from three different UK screening centres, none of whom had participated in Study 1 but who were equivalent to the group used in Study 1 in terms of years of experience and professional background. We used the same procedures as in Study 1 except that readers did not see the CAD output.
Additionally, we conducted a new test with six of the more experienced participants in Study 2 to get a better understanding of the "difficulty" of the cancers in our test set. These participants were presented again with the 30 mammograms containing cancer and were asked to rank them according to various criteria of case "difficulty". The responses of this subset of readers, as well as the performance of all participants in Study 2, strongly indicated that six of the cases in our test set were probably "occult" cancers, undetectable via mammography, and so we eliminated these cases from our analyses (see more details in Alberdi et al [6]).
Readers' average sensitivity for the remaining 24 cancers was 61% for those who saw the cases with CAD (Study 1) and 73% for those who saw them without CAD (Study 2). The difference in average sensitivity between the two sets of readers was statistically significant. In contrast, the average specificity in Study 2 was lower than in Study 1 (86% vs 90%), but the difference was not statistically significant.
Table 5 shows the proportions of incorrect human decisions in Study 1 and Study 2 for the 54 analysed cases, categorised according to case type (normal or cancer) and output of CAD ("unmarked", "correctly marked", or "incorrectly marked"). ANOVA showed statistically significant differences between Study 1 and Study 2 for the "unmarked" cancers (p<0.001), the "incorrectly marked" cancers (p<0.05) and the "unmarked" normal cases (p<0.05).
| 4. Ethnographic studies of CAD use |
|---|
Using a "think aloud" protocol in which participants vocalised their thought processes to the observer as they read cases, we attempted to explore the sense readers made of prompts in the context of the mammograms on which they occurred. Subsequently, we discussed with readers cases that they identified as problematic (especially cases on which they had spent a substantial amount of time) to clarify how they dealt with these "difficult" cases.
A significant finding was the importance readers attached to ascertaining what a prompt "meant": how it could be explained in the context of the mammogram. In some cases, the accounts were of the order "I don't know why it's prompted that", in others readers saw the features prompted as, for example, composite shadows, and gave an account of why they thought the CAD tool had prompted the feature together with what they saw it as actually being (e.g. benign, because it could be "picked apart").
Although readers were advised to use CAD as an attention cue, and to use their own judgement to decide whether a prompted feature required recall, we observed that they sometimes used prompts to inform their decisions. For example, one reader commented: "This is a case where without the prompt I'd probably let it go ... but seeing the prompt I'll probably recall ... it doesn't look like a mass but she's got quite difficult dense breasts ... I'd probably recall." In other instances, we observed readers using the absence of a prompt as evidence for "no recall".
The explanation for this may be that the uses to which prompts are put are contingent on the specific problems posed by individual cases. For example, reading dense, feature-rich breasts poses demands very different from those of lucent or uncomplicated breasts, and the reader's comment above demonstrates how she marshals the "evidence" of the prompt in making a decision under these specific circumstances.
Over time, readers acquired a "biography" for the CAD tool: they came to believe they knew what features it would and would not prompt and they read with this putative biography as a factor in their work. For example, readers would often remark that they had anticipated that the system would prompt for a particular feature within the breast, sometimes then dismissing the prompt as they already had judged the feature to be benign. This does not mean that they would ignore the prompt, or that they would not pay serious attention to it, but that they thought the prompt could be expected given their emerging understanding of what the CAD tool could and did do.
Post-trial discussions with readers indicated some of the strengths they attributed to the CAD tool:
On the other hand, readers noticed the following weaknesses of the tool:
| 5. Discussion |
|---|
10%. However, the trial showed no statistically significant effects of CAD on readers' average sensitivity. This result is actually consistent with most experimental measurements of the impact of the R2 ImageChecker® and other similar CAD tools on mammogram reading (e.g. [12-16]). To improve the effectiveness of computer aids, it is desirable to explain why CAD apparently had no effect in these studies.
A simple conjecture would be that readers tend to ignore CAD outputs, possibly because the high number of false prompts creates excessive load. However, our statistical analyses and ethnographic studies do suggest systematic effects of the use of CAD, which are positive or negative depending on aspects of the cases and on CAD output. Readers' reports suggested that the presence of CAD prompts had at times alerted them to relevant mammographic features that they would have missed otherwise, as well as affecting their recall decision for features that they had already noticed. These reports are corroborated by our supplementary analyses of the HTA trial data, which indicate that for a subset of cases CAD did have beneficial effects on readers' decisions (see Table 2 and Figure 1). At the same time, readers often explicitly dismissed many of the prompts as they considered them false. Analysis of the data indicates that some of these were, in fact, correct prompts: readers sometimes had difficulties distinguishing between correct and incorrect prompts.
Our statistical analyses also show that CAD output could have detrimental effects on readers' sensitivity for a subset of cases ("difficult" cancers), especially when the output of CAD was incorrect.
We argue that CAD, rather than having too little effect on readers' decisions to produce a measurable impact, had both beneficial and detrimental effects on readers' performance, but in the trial these effects compensated for each other, resulting in no significant impact on average sensitivity.
5.2. Effects of the absence of prompts
By choosing in our follow-up studies a test set with a large proportion of cancers missed by CAD, we managed to isolate (somewhat serendipitously) the potential detrimental effects of CAD on reader sensitivity. Participants who read our test set with the benefit of CAD (Study 1) showed a significantly lower sensitivity than those who saw the same cases without CAD (Study 2).
This effect was particularly marked for those cancers that CAD did not prompt and led us to conjecture that the absence of prompts may have a much bigger impact on readers' decisions than anticipated. Our analyses of the data from both the HTA trial and our follow-up studies strongly suggest that readers may have used the absence of prompts on a mammogram as a sort of reassurance for their "no recall" decisions for normal cases. It appears that, based on their experience with the tool, readers tended to (correctly) assume that the absence of prompting was a strong indication that a case was normal. The participants (both in the HTA trial and in follow-up Study 1) were very unlikely to recall cases for which CAD had issued no prompts. Again, these findings are broadly corroborated by our ethnographic observations.
One could argue that this is a rational approach. Readers perceived many of the prompts as distracting. As most mammograms contained prompts, the absence of prompts was more informative than their presence (especially if detailed analysis of every prompt was too demanding and thus practically infeasible). This can be beneficial when dealing with equivocal normal cases (see Table 2). However, as the results from our follow-up studies indicate, this can have damaging effects on readers' decisions for difficult-to-detect cancers that CAD does not prompt.
To our knowledge, such detrimental effects have not been reported before in the radiological/CAD literature. Earlier human factors studies of the effects of computer failures on human behaviour have shown that the failure of a computer aid to detect and warn of a target event could make users less likely to make the right decision for the event [17, 18]. However, the participants in such studies are typically students working in artificial laboratory settings, while our participants were experts working in relatively realistic settings relevant to their area of expertise.
One plausible mechanism to explain these effects is that the absence of prompts made readers revise their decisions for ambiguous abnormalities that they had already detected. In other words, they may have used the absence of prompts as a reassurance for a "no recall" decision when dealing with features they found difficult to interpret. It is possible that readers were using whatever evidence was available to resolve uncertainty. The implication is that CAD was being used not only as a detection aid but also as a classification or diagnostic aid, which is not what the tool is designed for. This is consistent with our observations from our ethnographic studies (Section 4) and also with earlier studies of CAD tools [19]. But we cannot exclude alternative mechanisms. For example, the absence of prompts may have caused readers to pay less attention to a case and as a result, they may have failed to detect signs that they would not have missed otherwise (as proposed by studies of "automation bias" or "over-reliance" on computer advice [17]). Although this is a plausible scenario, we have not found evidence to support its occurrence in the HTA trial or our follow-up studies.
5.3. Difficulties of extrapolating results to real-world practice
Most of the findings reported in this paper are based on studies and analyses of readers' behaviour in trial conditions. Although there was an attempt to make the experimental settings reasonably realistic, many artificialities and simplifications were unavoidable. One must be careful, therefore, when extrapolating from the behaviours observed to effects in the field. We highlight here some important differences between the trial(s) and everyday practice.
(a) A common criticism is that clinical trials are conducted with test sets containing unrealistically high proportions of pathological cases (so as to achieve sufficient statistical power with manageable numbers of cases). Evidence that radiologists do behave differently when faced with case samples containing different prevalence of disease has been reported [20, 21]. These effects have not received sufficient attention to date and are worth exploring further.
(b) In everyday breast screening, readers have access to many other sources of information in addition to CAD (e.g. earlier mammograms, medical records, etc.); these sources of information may make readers interpret CAD output in different ways from how they did in the trials.
(c) In the NHS Breast Screening Programme, reading is essentially a collaborative activity [22]; double reading is common practice and the final decision on a case is often the result of group discussion. In contrast, trial participants acted as though they were reading alone, which may have influenced how they interpreted the prompts and made their decisions.
(d) In the trial, we saw how readers attempted to make sense of the tool's behaviour. In everyday practice, readers would have a better opportunity to gain a progressive understanding of how the tool works and to adjust their interpretation of its behaviour accordingly.
Despite these difficulties, many of the considerations derived from the clinical trial are relevant and useful. The statistical analyses, in conjunction with observations of humans in this and similar tasks, indicate plausible mechanisms that would cause the effects we observed, but do not allow us to decide which of these mechanisms will be active to a perceptible extent in a given activity. However, finding evidence of them in practice, even in partially artificial environments, is prima facie evidence of them being likely to occur. Designers of tasks and the computer tools to support them should consider how these behaviours may arise, and how to adapt tools and procedures to reduce those that are considered negative; assessors should consider them as factors that may change between the clinical trial and clinical use.
5.4. Methodological implications for the evaluation of computer aids
The use of a standard clinical trial regimen for evaluating new healthcare technologies has been subject to much criticism in recent years [23-26]. One problem that has been highlighted is that trial designs inevitably ignore the contextual nature of the work being supported, raising doubts about extrapolating trial results to real settings of use. Our decision to employ a multidisciplinary approach to CAD evaluation was partly intended as an exploration of how criticisms of clinical trials might be addressed.
Our experience of using ethnographic methods to complement statistical analysis has shown some promise. Ethnography aims at understanding the context of work practices, i.e. how they are actually performed in the workplace. The value of ethnographic methods has already become quite widely recognised as a way of informing requirements so that information technology (IT) systems are designed appropriately for their actual circumstances of use [27]. In our evaluation of CAD, we also found that ethnographic methods can be valuable in addressing the "ecological validity" of clinical trials by helping the interpretation of trial results to take into account differences between the context of the trial and a more realistic context of use.
First, it is only through ethnographic studies that we have been able to gain an understanding in detail of the character of everyday screening work and the context in which a CAD tool would be used. This has helped us to identify possible mismatches between the tool as designed and readers' requirements [7]. As an example, we observed readers' perceived need to explain the behaviour of the CAD tool in order to use it properly, and how they struggled to do this with the tool as designed. Second, ethnographic studies helped to reveal aspects of how readers actually used the CAD tool in the trial. We found evidence of readers not adhering to the trial protocol of using the tool as an attention cue and instead, using it at times as a decision aid. Ethnography and statistical analysis thus corroborated each other's conclusions. Third, ethnographic findings also influenced the choice of probabilistic models [4] by avoiding unrealistic assumptions that would invalidate results.
From a statistical viewpoint, a first consideration is simply that averages may hide substantial variations between subpopulations. Our statistical analyses, motivated by the "diversity modelling" approach (reported in references [3 and 4]) and its emphasis on how performance varies across classes of cases, have proven very useful here.
We found both beneficial and detrimental systematic effects of the use of CAD that just happen to cancel out in the trial (although the detrimental effects appear acutely in our follow-up studies). If these effects were to be present in practical clinical use, with different mixes of cases and readers from those in the trial, the net overall effect might be positive, negative or null, in addition to some possible transfer of risk between categories of patients. It might still be possible to estimate the net effect in future clinical use from the results in the trial [4] if further studies showed that the {case, reader} pairs can be classified by variables that can be estimated in both situations before introducing CAD and that are sufficiently predictive.
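The dependence of the net effect on the mix of cases can be illustrated with a weighted sum over classes of cases. All numbers below are hypothetical and chosen only so that the effects cancel for one mix; they are not estimates from the trial or from the follow-up studies.

```python
def net_change_in_error(mix: dict, effect: dict) -> float:
    """Net change in reader error probability when CAD is introduced.

    mix:    class of case -> proportion of that class in the workload (sums to 1)
    effect: class of case -> change in error probability with CAD
            (negative = CAD helps, positive = CAD hurts)
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix proportions must sum to 1")
    return sum(mix[name] * effect[name] for name in mix)


# Hypothetical per-class effects: CAD helps on easy cancers, hurts on difficult ones.
effect = {"easy": -0.05, "difficult": +0.05}

trial_like_mix = {"easy": 0.5, "difficult": 0.5}
mostly_easy_mix = {"easy": 0.8, "difficult": 0.2}
mostly_difficult_mix = {"easy": 0.2, "difficult": 0.8}

for name, mix in [("trial-like", trial_like_mix),
                  ("mostly easy", mostly_easy_mix),
                  ("mostly difficult", mostly_difficult_mix)]:
    print(f"{name}: net change in error = {net_change_in_error(mix, effect):+.3f}")
```

The same per-class effects give a null, a beneficial or a detrimental average depending only on the proportions of each class, which is why an average measured on a trial test set need not carry over to clinical use.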
5.5. Implications for CAD design and deployment
We consider here the implications of four of our main sets of findings.
(a) Limited diversity between human and machine errors. In the HTA trial, CAD errors were heavily correlated with the "difficulty" of cases (see Section 2.3.; Table 3). To increase the "potential" advantage that we estimated in Section 2.1., CAD should prompt correctly those cases where unaided readers would tend to fail: its error pattern should be as "diverse" as possible from the readers' (cf. the mathematical models in references [3 and 4]). Some improvement in CAD effectiveness could be sought by increasing this diversity, even without improving the average sensitivity or specificity of CAD. The tool could be tuned to be more sensitive for classes of cases on which readers tend to be less effective, as these cases are natural candidates for CAD to make a difference. Perhaps CAD thus tuned to be "more diverse" from its users may improve the latter's performance more than CAD simply tuned to be very good (in terms of sensitivity, specificity or any weighted combination of the two). We believe that this possibility is worth exploring, although how much gain (if any) it would produce depends on the details of the specific CAD algorithms and the degrees of freedom in tuning them. It might even be desirable to adapt CAD tuning differently for each individual reader, automatically or manually.
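The point about diversity can be illustrated with a toy calculation. It is a sketch in the spirit of the diversity argument, not the models of references [3 and 4]. It assumes the optimistic use of CAD described in Section 2.1 (a cancer is missed only when both the unaided reader and CAD fail on it) and assumes that, within a class of cases, reader and CAD failures are independent. The classes and probabilities are hypothetical.

```python
# Hypothetical classes of cancers: proportion in the workload and
# probability that an unaided reader misses a cancer of that class.
CLASSES = {
    "easy": {"proportion": 0.5, "reader_miss": 0.1},
    "hard": {"proportion": 0.5, "reader_miss": 0.4},
}

# Two hypothetical CAD tunings with the SAME average miss rate.
CAD_CORRELATED = {"easy": 0.1, "hard": 0.3}   # fails where readers fail
CAD_DIVERSE = {"easy": 0.3, "hard": 0.1}      # fails where readers succeed


def average_miss(cad: dict) -> float:
    """Average probability that CAD fails to prompt a cancer correctly."""
    return sum(c["proportion"] * cad[name] for name, c in CLASSES.items())


def joint_miss(cad: dict) -> float:
    """Probability that both reader and CAD fail, under the stated assumptions."""
    return sum(c["proportion"] * c["reader_miss"] * cad[name]
               for name, c in CLASSES.items())


avg_correlated, avg_diverse = average_miss(CAD_CORRELATED), average_miss(CAD_DIVERSE)
joint_correlated, joint_diverse = joint_miss(CAD_CORRELATED), joint_miss(CAD_DIVERSE)

print(f"average CAD miss rate: {avg_correlated:.3f} vs {avg_diverse:.3f}")
print(f"joint miss (correlated tuning): {joint_correlated:.3f}")
print(f"joint miss (diverse tuning):    {joint_diverse:.3f}")
```

With identical average performance, the tuning whose failures are concentrated on the cases readers find easy leaves fewer cancers missed by both, which is the sense in which greater diversity could help without a better average.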
(b) Evidence of systematic positive and negative effects of CAD. Specific human behaviours that are considered undesirable may be targeted with methods for either avoiding them or correcting their negative effects ("fault tolerance"). We could think of many solutions, whose acceptability in the specific working environment would need to be checked before they are adopted. For example, the cognitive load on readers could be reduced by not repeating prompts on features that readers have already noticed and marked. Similarly, the self-calibration of readers, and their reliance at times on the CAD outputs as a decision aid instead of an attention cue, could be corrected by including in their normal workload fictitious cases with incorrect CAD outputs, rarely but yet frequently enough to refresh their memory of types of possible errors by CAD. Changes could also be made to the CAD tool so that it makes readers' violations of prescribed protocols of use impractical or harder to justify.
(c) Readers' evident concern to explain the presence or absence of prompts. One approach would be to explore how the CAD tool could provide explanations of its behaviour on demand. The challenge would be to produce the sorts of accounts that would be useful to a reader, which calls for an understanding of the sorts of explanations where confusion may arise. The question also exists whether such exacting specifications could be implemented reliably enough not to make the explanation facility a source of further problems.
(d) Readers seemed to be overinfluenced by the absence of prompts. When using CAD, readers are required to pay attention to prompts and not to their absence, but, in reality, they appeared to be doing the opposite. Design changes might be implemented to combat this problem (e.g. by introducing mechanisms that make readers aware of the risks of this behaviour). However, one could argue that the requirement that readers not be influenced at all by the absence of prompts is possibly psychologically impossible to satisfy as well as normatively incorrect, if the absence of prompts has indeed informative value; in this case, there could be merit in seeking to provide readers with a heuristic procedure to follow that would give absence of prompts approximately the right weight in decisions, rather than attempting to have them ignore it altogether.
| 6. Concluding remarks |
|---|
Our multidisciplinary approach has helped to address some of the recognised limitations of the clinical trial as an evaluation methodology for IT-based healthcare interventions. The use of ethnographic observation in particular has helped to put the clinical trial results into a real-world context of use and thereby gain a better understanding of their implications for the everyday work of reading mammograms. We would note also that the introduction of a new technology such as CAD may change work practices and thus, in an iterative process, make new demands on the technology. Evaluation methodologies that do not take into account the influence of adaptation and learning may fail to deliver meaningful results for users and designers alike and fail to contribute to improving the technology. Again, our multidisciplinary approach has enabled us to gain some perspectives on how readers learn and adapt to the behaviour of the CAD tool and has suggested some promising lines of enquiry for how the tool and the procedures for using it may themselves be adapted.
In conclusion, our studies have provided some insight about the effects of CAD on readers' performance, as well as some methodological indications. Both results may be of interest for a larger class of tasks and computer aids than that of CAD for mammography.