Despite its success, progress in improving CAD algorithms for mammography has slowed considerably since 2001, when the technology became eligible for reimbursement in the United States. CAD companies appear to have shifted their focus from basic development towards sales and marketing, and towards new applications such as MRI and tomosynthesis. This might be acceptable if users were satisfied with the technology, but in reality most radiologists feel that much improvement is still needed. They are generally pleased with the performance of CAD for detecting clustered microcalcifications, but have little confidence in CAD for detecting masses, which are the main sign of invasive cancer.
While users agree that CAD for mass detection should be improved, few have actually thought about novel research directions that could take the technology to the next level. The most common complaint of radiologists is that current CAD algorithms produce too many false positives. Indeed, current algorithms operate at one or two false positives per four-view case (MLO and CC views of the right and left breast), whereas only a few cases per thousand in a screening population contain a cancer. As a result, there are still a few hundred false positives for every true positive in a screening setting.
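The few-hundred-to-one ratio follows from simple arithmetic. The sketch below is a back-of-the-envelope check, not a result from any study: the cancer prevalence (about 5 per 1000 screening cases) and the CAD sensitivity (90%) are illustrative assumptions chosen for the example, and `fp_per_tp` is a hypothetical helper name.

```python
def fp_per_tp(fp_per_case: float, prevalence: float, sensitivity: float) -> float:
    """Expected number of CAD false positives per detected cancer.

    fp_per_case: CAD false-positive marks per four-view screening case.
    prevalence:  fraction of screening cases containing a cancer (assumed).
    sensitivity: fraction of those cancers that CAD marks (assumed).
    """
    true_positives_per_case = prevalence * sensitivity
    return fp_per_case / true_positives_per_case


# Illustrative assumptions, not data from the text:
# about 5 cancers per 1000 screening cases, 90% of them marked by CAD.
for fp in (1.0, 2.0):
    ratio = fp_per_tp(fp, prevalence=0.005, sensitivity=0.9)
    print(f"{fp:.0f} FP per case -> about {ratio:.0f} false positives per true positive")
```

Under these assumptions the ratio is roughly 220 to 440 false positives per true positive; a lower prevalence makes it worse.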
When asked what is so bothersome about false positives, some users say that false positives confuse them or slow down reading. The worst aspect of false positives, however, is that readers who see many irrelevant CAD marks on obviously normal regions lose confidence in CAD. Many wonder how a system that performs so poorly in this respect can ever be of help to them.
Interestingly, experienced readers seem to be more positive about CAD than inexperienced readers, although they are rarely enthusiastic. Experienced radiologists are probably more tolerant of false positives because they can dismiss them easily. They also recognize the main value of current CAD systems: their very high sensitivity. When many cases are read rapidly, perceptual oversights occasionally occur, and CAD may help to avoid them. This is particularly important when there is no double reading. Less experienced radiologists find it harder to dismiss false positives and may have to spend a significant amount of time interpreting CAD-marked regions.
Search and interpretation
The fact that CAD may confuse readers is an important message to CAD researchers. It reveals the rather obvious but often neglected truth that radiologists have difficulty interpreting mammographic regions and making the right decisions. In screening programs, the decision to be made is whether a woman should be recalled or not. Ideally, there would be a clear line between suspicious mammographic regions that require additional workup and all other mammographic findings. Screening would then simply involve finding these regions, and every miss could be classified as a "search error".
This simple model of breast-cancer screening is often advocated by "expert" screening radiologists and, as such, forms the basis of current CAD technology. CAD users are instructed to use the technology as a "checker" to avoid oversights, but are strongly discouraged from using it as an interpretation aid.
Real life is more complex, however. Radiologists often find it hard to decide whether a subtle abnormal finding is suspicious enough to warrant recall. Moreover, perception studies suggest that errors due to incomplete search, the type of error CAD targets, are only a minor source of screening error. The large majority of errors appear to stem from readers' inability to correctly interpret suspicious regions they have already detected. This may well explain why readers are not enthusiastic about current mammography CAD systems for mass detection: CAD does not address the main problem.
Vendors add to the confusion by sending conflicting messages about the intended use of their products. On the one hand, they warn users not to change decisions based on CAD. On the other hand, they release systems with variable-size markers, where the marker size indicates the CAD-computed confidence that a cancer is present. Such a variable-size marker system was used in the British CADET II trial, which showed that CAD may be as good as double reading.1,2
Although CAD is not intended for this purpose, it is highly likely that many radiologists do use it as an interpretation aid once they become familiar with the technique. They realize, for instance, that the high negative predictive value of CAD is very useful. Suppose a reader is unsure whether to recall a woman because of an uncertain mammographic finding. With CAD, the reader knows that the likelihood of cancer is lower when CAD does not mark the region and higher when CAD does mark it, in particular if it is marked in both views. Using this information makes a lot of sense and will, on average, lead to better decisions.
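The reasoning in this paragraph is a Bayesian update. A minimal sketch, assuming that CAD marks a fixed fraction of cancers (`sensitivity`) and a fixed fraction of the benign findings readers hesitate over (`benign_mark_rate`); both rates, the reader's prior of 10%, and the helper names are hypothetical. The effect of a mark in both views is left out, because modelling it would require further assumptions about how marks in the two views depend on each other.

```python
def update_with_lr(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule on the odds scale: posterior odds = prior odds * LR."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def cad_update(prior: float, sensitivity: float, benign_mark_rate: float, marked: bool) -> float:
    """Probability of cancer for a finding after learning whether CAD marked it."""
    if marked:
        lr = sensitivity / benign_mark_rate                  # positive likelihood ratio
    else:
        lr = (1.0 - sensitivity) / (1.0 - benign_mark_rate)  # negative likelihood ratio
    return update_with_lr(prior, lr)


# Hypothetical numbers: reader's own estimate 10%; CAD marks 90% of cancers
# and 30% of the benign findings that readers are unsure about.
p_marked = cad_update(0.10, sensitivity=0.9, benign_mark_rate=0.3, marked=True)
p_unmarked = cad_update(0.10, sensitivity=0.9, benign_mark_rate=0.3, marked=False)
print(f"marked: {p_marked:.3f}, unmarked: {p_unmarked:.3f}")  # marked: 0.250, unmarked: 0.016
```

Under these assumptions, the absence of a CAD mark lowers the estimate more than sixfold, which is the negative-predictive-value effect described above.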
To benefit from this, however, readers must weigh CAD information properly, something they have to learn from experience. This potential of CAD has been convincingly demonstrated in a study in which radiologists' findings were combined with CAD scores independently of the readers, with the analysis restricted to findings the readers had identified themselves. The benefit was comparable to that of double reading.3,4
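One way to picture "weighing" CAD information is to combine the reader's and the CAD system's estimates on the log-odds scale. The sketch below is not the method of the cited study, which is not described here; it is a naive-Bayes-style combination that assumes both probabilities are calibrated, that the reader's and CAD's evidence are conditionally independent given the true diagnosis, and that the hypothetical `base_rate` is the cancer rate among reader-detected findings. The weights `w_reader` and `w_cad` are hypothetical placeholders for what a reader, or a fitted model, would learn from experience.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def combine(p_reader: float, p_cad: float, base_rate: float,
            w_reader: float = 1.0, w_cad: float = 1.0) -> float:
    """Combine reader and CAD probabilities for the same finding on the log-odds scale.

    Each source contributes its evidence as a shift away from the base-rate log-odds.
    With both weights equal to 1 this is the naive-Bayes combination, valid only if
    the two sources are calibrated and conditionally independent given the diagnosis.
    """
    prior = logit(base_rate)
    x = prior + w_reader * (logit(p_reader) - prior) + w_cad * (logit(p_cad) - prior)
    return expit(x)


# Hypothetical finding: base rate 10%, reader estimates 30%, CAD score is low (5%).
print(f"{combine(0.30, 0.05, base_rate=0.10):.2f}")  # about 0.17
```

A CAD score equal to the base rate leaves the reader's estimate unchanged, while a CAD score below it pulls the estimate down.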
Interactive use of CAD
To further explore the idea of using CAD as an interpretation aid rather than a perception aid, we developed an experimental environment in which CAD is presented interactively to readers. It works as follows: whenever readers see a suspicious mammographic region, they can query CAD immediately by clicking on the location of the finding with the mouse. The display software checks whether there is a CAD mark at the indicated location and, if there is, displays the CAD region with a boundary whose colour indicates the CAD-computed likelihood of malignancy.
In addition, if the CAD system has marks in the other views of the same breast that correspond to the same finding, as determined by registration, these corresponding regions are also shown. The results are striking. Readers see only the false positives they activate themselves, so irrelevant false positives are never shown and are thus harmless. Correlation between views and display of the likelihood of malignancy transform the CAD system into an intelligent machine that users can trust. Because CAD usually confirms the reader's own judgment, the occasional discrepancies prompt the reader to think twice, which is exactly what is intended.
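The interactive lookup can be sketched as a hit test followed by a lookup of linked regions. This is a hypothetical illustration, not the authors' software: CAD regions are approximated by circles, the cross-view correspondence is assumed to have been precomputed (for example by registration) and stored as a shared `finding_id`, and the names `CadMark`, `boundary_colour`, and `query_cad`, as well as the colour thresholds, are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CadMark:
    view: str          # e.g. "RCC", "RMLO", "LCC", "LMLO"; first letter is the breast
    x: float           # region centre in pixels
    y: float
    radius: float      # hypothetical circular approximation of the CAD region
    likelihood: float  # CAD-computed likelihood of malignancy, 0..1
    finding_id: int    # hypothetical cross-view link, e.g. produced by registration


def boundary_colour(likelihood: float) -> str:
    # Hypothetical thresholds; a real display would use its own colour scale.
    if likelihood >= 0.5:
        return "red"
    if likelihood >= 0.2:
        return "orange"
    return "yellow"


def query_cad(marks: list[CadMark], view: str, x: float, y: float) -> list[tuple[CadMark, str]]:
    """Return (mark, colour) pairs to draw for a click at (x, y) in `view`.

    Nothing is returned unless the click falls inside a CAD region, so
    false positives the reader never queries are never displayed.
    """
    hits = [m for m in marks
            if m.view == view and (m.x - x) ** 2 + (m.y - y) ** 2 <= m.radius ** 2]
    if not hits:
        return []
    hit = max(hits, key=lambda m: m.likelihood)  # most suspicious region under the cursor
    breast = view[0]
    linked = [m for m in marks
              if m.view != view and m.view[0] == breast and m.finding_id == hit.finding_id]
    return [(m, boundary_colour(m.likelihood)) for m in [hit, *linked]]
```

A click on a finding in the CC view returns that region plus its counterpart in the MLO view of the same breast; a click outside every CAD region returns an empty list.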
We have started a series of observer experiments with this system, and the results are very encouraging. Even without much training, reader performance improved significantly with CAD, and much more than when the same CAD system was used as a traditional prompting device. Moreover, reading cases took no longer in the CAD sessions than in the unaided sessions.
CAD as a predictor
While one of the keys to better use of CAD is simply presenting the information in a better way, further development of the technology is also needed. In particular, research should focus on improving CAD as a quantitative predictor of cancer, because this is what can make it more powerful as an interpretation aid.
One of the major limitations of current mammography CAD systems is their inability to perform temporal analysis. Expert mammographers stress the importance of detecting changes in mammograms from one screening round to the next. The introduction of digital mammography and PACS in screening offers a great opportunity to develop CAD technology further, including temporal comparison of mammograms.
Very large databases will become available in the near future, especially if screening organizations have the vision to start archiving unprocessed mammograms (which are needed for CAD development and for assessing quantitative parameters such as breast density). By exploiting these digital databases, it should be possible to develop CAD technology that performs at least as well as expert readers.