Deep Learning in Radiology and Medical Imaging: Where Does the Evidence Stand in 2026?

Creator: Cem Akaltun, MD
Published: 2026-05-26

The first randomized trial in mammography (MASAI) found AI-supported reading non-inferior to double reading, yet no imaging AI has shown a mortality benefit in a randomized trial.

By Cem Akaltun, MD · Updated June 9, 2026 · ~12 min read · Imaging & Radiology

Radiology has become the corner of medicine where artificial intelligence took root fastest, and the reason is straightforward: images are digital, abundant and labellable, and deep learning excels precisely at this kind of pattern-recognition task. Yet the distance between "FDA-cleared" and "actually helps the patient" is the most critical and most frequently glossed-over distinction in the field. This article assembles the most current evidence (2025-2026) across mammography, computed tomography (CT), magnetic resonance imaging (MRI) and chest radiography, without overstatement: what has genuinely been proven, what remains unproven, and exactly where the sources contradict one another.

Mammography: the strongest evidence, delivered by the first randomized trial

For years the debate over AI in mammography ran on retrospective data alone. That changed in early 2026. The MASAI trial is the first completed randomized controlled trial (RCT) in breast imaging AI. In Sweden, 105,934 women were randomized 1:1 to AI-supported screening or standard double reading (Gommers et al., The Lancet, January 2026).

The results deserve a balanced reading. For the primary safety endpoint of interval cancer (cancer that surfaces between screens, i.e. missed at screening), the AI arm was non-inferior to double reading: 1.55 versus 1.76 per 1,000 (ratio 0.88; 95% CI 0.65-1.18; p=0.41). The AI arm showed a significantly higher sensitivity, 80.5% versus 73.8% (p=0.031), while specificity was identical at 98.5% in both arms. The AI arm also produced fewer invasive and fewer aggressive interval cancers, and cut the screen-reading workload by roughly 44%. The "+29% screen-detected cancer" figure reported in an earlier interim analysis has now been placed in context by this final paper's interval-cancer and sensitivity/specificity endpoints.
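The non-inferiority reading can be made concrete with a short sketch. The rates and the confidence interval are the figures quoted above; the margin of 1.20 is an assumption for illustration only, since the trial's margin is not stated in this article, and the function names are hypothetical.

```python
def rate_ratio(rate_a: float, rate_b: float) -> float:
    """Ratio of two event rates expressed in the same units (here per 1,000)."""
    return rate_a / rate_b


def is_non_inferior(ci_upper: float, margin: float) -> bool:
    """Non-inferiority holds when the upper CI bound of the ratio is below the margin."""
    return ci_upper < margin


# Interval-cancer rates per 1,000 quoted in the text (AI arm vs double reading).
ratio = rate_ratio(1.55, 1.76)

# ASSUMPTION: a margin of 1.20 is used for illustration; it is not taken from the article.
ASSUMED_MARGIN = 1.20
CI_UPPER = 1.18  # upper bound of the 95% CI quoted in the text

non_inferior = is_non_inferior(CI_UPPER, ASSUMED_MARGIN)

# Superiority would additionally require the whole interval to sit below 1.
superior = CI_UPPER < 1.0

print(f"ratio={ratio:.2f} non_inferior={non_inferior} superior={superior}")
```

Under the assumed margin the AI arm passes the non-inferiority check, but the interval still crosses 1, which is why the result cannot be read as fewer interval cancers.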

The RCT should be read alongside the largest prospective real-world evidence: the PRAIM study (Eisemann et al., Nature Medicine, January 2025), covering 461,818 women across 12 sites and 119 radiologists. The cancer detection rate was 6.7 versus 5.7 per 1,000 (relative +17.6%; 95% CI +5.7%, +30.8%); crucially, this gain came without raising the recall rate (37.4 versus 38.3 per 1,000, non-inferior). In other words, more cancers were caught without recalling more women unnecessarily. But this is an observational implementation study: because radiologists chose their reading pathway after seeing the AI's prediction, a "reading-behaviour bias" arose, and the results had to be corrected with propensity-score weighting — a structural weakness of real-world designs.
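To illustrate why weighting is needed when readers choose the pathway, here is a minimal sketch of inverse-probability weighting, one common form of propensity-score weighting. The article does not say which variant PRAIM used, and the data, propensities and function names below are hypothetical toy values, not study data.

```python
def crude_rate(records, arm):
    """Unweighted outcome rate among exams read in the given arm."""
    outcomes = [outcome for outcome, arm_taken, _ in records if arm_taken == arm]
    return sum(outcomes) / len(outcomes)


def weighted_rate(records, arm):
    """Inverse-probability-weighted outcome rate for one arm.

    Each record is (outcome, arm_taken, propensity), where propensity is the
    estimated probability that the exam is read on the AI pathway (arm 1).
    """
    num = den = 0.0
    for outcome, arm_taken, propensity in records:
        if arm_taken != arm:
            continue
        prob_of_observed_arm = propensity if arm == 1 else 1.0 - propensity
        weight = 1.0 / prob_of_observed_arm
        num += weight * outcome
        den += weight
    return num / den


# HYPOTHETICAL toy data: within each stratum both arms detect cancer at the same rate.
records = (
    # Stratum A: low-prevalence exams, mostly read on the AI pathway (propensity 0.8).
    [(1, 1, 0.8)] * 4 + [(0, 1, 0.8)] * 76 + [(1, 0, 0.8)] * 1 + [(0, 0, 0.8)] * 19
    # Stratum B: high-prevalence exams, mostly read conventionally (propensity 0.2).
    + [(1, 1, 0.2)] * 4 + [(0, 1, 0.2)] * 16 + [(1, 0, 0.2)] * 16 + [(0, 0, 0.2)] * 64
)

crude_ai, crude_std = crude_rate(records, 1), crude_rate(records, 0)
adj_ai, adj_std = weighted_rate(records, 1), weighted_rate(records, 0)

print(f"crude:    AI={crude_ai:.3f} standard={crude_std:.3f}")
print(f"weighted: AI={adj_ai:.3f} standard={adj_std:.3f}")
```

In this toy example the crude rates differ only because the two pathways received different case mixes; after weighting, the rates coincide. The correction is only as good as the propensity model, which is the structural weakness of the design.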

Two positions in the same field, side by side

MASAI and PRAIM strongly support the view that AI is "ready" in mammography. By contrast, an editorial in the Journal of Breast Imaging (McDonald et al., 2025) argues for preserving the RCT standard because of "the gap between rapid regulatory clearance and reliable RCT evidence." The two views do not conflict; they stress, from opposite ends, that the picture is not yet complete.

Stroke triage: time was saved, but outcomes (mortality, disability) remain unproven

In large-vessel-occlusion (LVO) stroke, minutes are brain tissue. AI-based triage software aims to accelerate the workflow by automatically flagging an occlusion on CT angiography and alerting the relevant team. A systematic review and meta-analysis of Viz.ai software (Translational Stroke Research, 2025) reported a significant reduction in door-in–door-out time, particularly for cases requiring inter-facility transfer. The independent VALIDATE study spanned 166 facilities and 14,116 patients across 17 states.

The honest limit here must be stated plainly: most of these studies run on surrogate (time) endpoints. Robust, independent RCT data measuring the patient's actual outcome (functional independence by mRS, mortality) are limited, and most are manufacturer-sponsored. The striking sensitivity figures vendors present — for instance Aidoc's report at ISC 2026 of 92.6% in head-of-the-vessel LVO versus 70.4% for a conventional solution — still await independent, peer-reviewed, prospective confirmation.

TB screening with chest X-ray: AI catches the cases that would slip through

In low-income, high-TB-burden settings a radiologist is not always available. Computer-aided detection (CAD) software shows its most concrete public-health contribution here; the WHO has recommended CAD for TB screening and triage since its 2021 guideline. The value shows in practice: in some real-world programmes more than half of microbiologically confirmed TB cases were asymptomatic — they would have been missed by symptom screening alone.

But the products are not equal. The external validation within South Africa's TB prevalence survey (Qin et al., The Lancet Digital Health, 2024; a correction was published) compared 10 commercial CAD products. At 90% sensitivity, five products achieved >60% specificity; yet only Lunit and Nexus had confidence intervals encompassing the WHO target-product-profile of 70% specificity. CAD4TB, InferRead and Genki stayed in the 50-60% band. This nuance matters: sweeping claims that "qXR/CAD4TB meets the target" are overly optimistic. A study in household contacts (Clinical Infectious Diseases, 2025) found AUCs for prevalent TB of 0.87 for CAD4TBv7, 0.88 for qXRv3 and 0.91 for Lunit v3.1, with CAD outperforming a blood test. Even so, thresholds vary widely by population, and the literature warns against transplanting published thresholds uncritically.
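The warning about transplanting thresholds can be sketched in a few lines. The scores below are hypothetical and chosen only to show the mechanism; they are not taken from any of the cited studies, and the function names are illustrative.

```python
import math


def threshold_for_sensitivity(scores, labels, target):
    """Highest threshold at which at least `target` of true cases score >= threshold."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    # Number of true cases that must be flagged; the small tolerance guards
    # against floating-point noise in target * len(positives).
    needed = max(1, math.ceil(target * len(positives) - 1e-9))
    return positives[needed - 1]


def sensitivity_at(scores, labels, threshold):
    """Share of true cases with a score at or above the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)


def specificity_at(scores, labels, threshold):
    """Share of non-cases with a score below the threshold."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    return sum(s < threshold for s in negatives) / len(negatives)


# HYPOTHETICAL CAD scores: 10 confirmed TB cases followed by 10 non-cases per site.
labels = [1] * 10 + [0] * 10
site_a = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.30,
          0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.60]
site_b = [0.90, 0.80, 0.70, 0.60, 0.50, 0.45, 0.40, 0.35, 0.30, 0.20,
          0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.70]

thr_a = threshold_for_sensitivity(site_a, labels, 0.90)
thr_b = threshold_for_sensitivity(site_b, labels, 0.90)

print("Site A threshold:", thr_a, "specificity:", specificity_at(site_a, labels, thr_a))
print("Site A threshold reused at site B, sensitivity:", sensitivity_at(site_b, labels, thr_a))
print("Site B threshold:", thr_b, "specificity:", specificity_at(site_b, labels, thr_b))
```

In this toy setup the threshold calibrated at site A misses most cases at site B, while recalibrating at site B restores 90% sensitivity at a much lower specificity. That is the practical reason a published threshold needs local validation before use.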

Nodules and cancer on chest CT: sensitivity rises, at the cost of unnecessary follow-up

A systematic review of AI's effect on chest CT screening (Geppert et al., Thorax, 2024) summarizes the trade-off in balanced terms. AI-supported reading increased sensitivity by +5% to +20% but lowered specificity by −3% to −8%. The cost side of this equation is concrete: assuming a 0.5% cancer prevalence, per million screens roughly +150 to +750 additional cancers are caught while at the same time +59,700 to +79,600 people are sent for unnecessary CT follow-up. Moreover, all 11 included studies carried a high risk of bias and applicability concerns. Net benefit, therefore, depends tightly on the population screened and the threshold chosen.
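The per-million figures follow from simple arithmetic, sketched below. The sketch assumes the changes are absolute percentage points applied to people with and without cancer respectively; that interpretation, and the helper names, are assumptions rather than something stated in the text.

```python
POPULATION = 1_000_000
PREVALENCE = 0.005  # 0.5% cancer prevalence, as assumed in the text

with_cancer = round(POPULATION * PREVALENCE)   # 5,000 people
without_cancer = POPULATION - with_cancer      # 995,000 people


def extra_count(delta_points: float, group_size: int) -> float:
    """People affected by a change of `delta_points` percentage points in a group."""
    return group_size * delta_points / 100


def implied_points(count: float, group_size: int) -> float:
    """Percentage-point change that would produce `count` people in a group."""
    return 100 * count / group_size


# Back-calculate the changes implied by the per-million counts quoted in the text.
sens_low = implied_points(150, with_cancer)
sens_high = implied_points(750, with_cancer)
spec_low = implied_points(59_700, without_cancer)
spec_high = implied_points(79_600, without_cancer)

print(f"implied sensitivity gain: {sens_low:.0f} to {sens_high:.0f} points")
print(f"implied specificity loss: {spec_low:.0f} to {spec_high:.0f} points")
print(f"false positives per extra cancer: "
      f"{79_600 / 750:.0f} to {59_700 / 150:.0f}")
```

Under these assumptions the quoted counts correspond to about 3 to 15 points of added sensitivity and 6 to 8 points of lost specificity, which is narrower than the headline ranges. The text does not explain the difference, so this is an inference; one plausible reading is that the counts derive from a subset of the review's estimates. Either way, the asymmetry is the point: because people without cancer outnumber those with cancer 199 to 1, a few points of specificity cost far more people than the same points of sensitivity gain.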

Domain / Study	Key metric	Evidence type
Mammography — MASAI (Lancet 2026)	Sensitivity 80.5% vs 73.8% (p=0.031); interval cancer non-inferior (ratio 0.88)	RCT
Mammography — PRAIM (Nat Med 2025)	Detection +17.6%; recall non-inferior	Prospective observational
TB — South Africa (Lancet Digit Health 2024)	At 90% sensitivity only Lunit/Nexus met the 70% specificity target	External validation
Chest CT (Thorax 2024)	Sensitivity +5%/+20%; specificity −3%/−8%; ~60-80k unnecessary follow-ups per million	Systematic review

"Clearance" is not "evidence": the regulatory landscape

Radiology dominates the FDA's AI authorizations. As of December 2025, roughly 1,451 AI/machine-learning devices had been cleared cumulatively, with about 71.5-75% of 2025 clearances in radiology (historical average ~76%). The more important finding, however, lies not in the count but in the depth of evidence: a systematic review in JAMA Network Open (Potnis et al., 2025) found that, as of September 2025, 723 of 950 devices (76%) were radiology, of which 97% went through the 510(k) pathway, only 29% included any clinical testing, and just 5% included prospective testing. This is the central pillar of the "clearance does not equal clinical effectiveness" argument.

The regulatory framework must not be misrepresented either. The EU AI Act classes radiology AI as "high risk," but its effective dates have been pushed back: 2 December 2027 for standalone high-risk systems and 2 August 2028 for AI embedded in medical devices. The claim that it is "mandatory in 2026" is therefore wrong. As for fully autonomous reading: Oxipit ChestLink, which can report normal chest radiographs without a radiologist's review, is the first fully autonomous imaging AI to carry a CE mark (Class IIb); the manufacturer reports >500,000 radiographs and a 99% sensitivity claim. By contrast, there is as yet no FDA clearance in the United States — so it would be incorrect to say "autonomous reading is permitted in the US."

What remains unproven? An honest framing

Despite all the positive data, several gaps stubbornly persist. First, no imaging AI has demonstrated a mortality benefit in an RCT; even MASAI uses interval cancer, not mortality, as its endpoint. Second, generalizability is a chronic problem: a systematic review of AI performance across different clinical settings in radiology (January 2022-June 2025; CT/MRI/X-ray) found that every included study was retrospective and not one was a prospective clinical study, so even a low risk of bias in those studies does not guarantee real-world benefit. "Distribution shift," the drop in performance across different scanners, hospitals and populations, remains the principal deployment barrier.

The third and clinically most insidious risk is automation bias, and this is now a measured reality. In a TOF-MRA study for cerebral aneurysm (Kim et al., La Radiologia Medica, 2025), a false-positive AI finding significantly increased radiologists' suspicion of an aneurysm (p=0.01), and inexperienced readers tended to recommend unnecessary, more intensive follow-up (p=0.005). In other words, the assistive tool can pull the decision toward itself even when it is wrong. Add publication bias and manufacturer sponsorship, and it becomes clear that the most dazzling performance figures (e.g. 92.6% or 99%) await independent confirmation.

Conclusion

As of 2026, deep learning in radiology rests on a more mature evidence base than AI in any other area of medicine, but that maturity is unevenly distributed. Mammography is the firmest ground: an RCT (MASAI) and a large prospective study (PRAIM) together show that AI-supported reading is non-inferior to double reading, raises sensitivity, and cuts the reading workload by roughly 44%. In stroke triage the time savings are measurable; in TB screening AI catches cases that would otherwise be missed. Against this, no domain has shown a mortality benefit in a randomized trial, prospective external validation is chronically lacking, only a small minority of FDA clearances include prospective clinical evidence, and automation bias is a genuine patient-safety concern. The correct framing is not "AI is replacing the radiologist"; rather, AI that is well designed, validated in the right population and kept under human oversight is a tool that augments the radiologist on specific tasks. The critical next step is not more clearances, but more prospective, independent and outcome-oriented evidence.

References

Gommers J, et al. Interval cancer, sensitivity, and specificity comparing AI-supported mammography screening with standard double reading in the MASAI study (randomized controlled trial). The Lancet. 2026;407(10527):505-514.
Eisemann N, et al. Nationwide real-world implementation of AI for cancer detection in population-based mammography screening (PRAIM). Nature Medicine. 2025.
Potnis KC, et al. FDA Approval of Artificial Intelligence and Machine Learning Devices in Radiology (systematic review). JAMA Network Open. 2025.
Geppert J, et al. AI for lung cancer detection on CT screening: a systematic review. Thorax. 2024;79:1040-1049.
Qin ZZ, et al. Computer-aided detection products for TB screening (external validation; correction published). The Lancet Digital Health. 2024;6:e605-e613.
Computer-aided detection accuracy in household TB contacts (prospective cohort). Clinical Infectious Diseases. 2025.
Kim SH, et al. Automation bias in cerebral aneurysm detection on TOF-MRA (observer performance study). La Radiologia Medica. 2025;130:555-566.
Automated emergent LVO detection using Viz.ai software: systematic review and meta-analysis. Translational Stroke Research. 2025.
Assessing the generalizability of artificial intelligence in radiology: a systematic review across clinical settings. 2025.
McDonald ES, et al. The gap between regulatory clearance and reliable RCT evidence in breast AI (editorial). Journal of Breast Imaging. 2025.
European Society of Radiology. Guiding AI in radiology: ESR's recommendations for the European AI Act. Insights into Imaging. 2025.
The Imaging Wire. Numbers from the FDA show radiology is maintaining its lead (industry statistics). 2026.

Disclaimer: This content is for educational and informational purposes only and does not substitute for diagnosis or treatment decisions. The AI systems referenced should be used under physician supervision and within their respective regulatory clearances and institutional validations. Clinical decisions belong to the physician evaluating the patient.