AI for Disease Prediction and Early Detection: The 2025-2026 Evidence Scorecard
AI now stretches from spotting cancer in blood to reading mammograms and predicting mortality from an ECG; yet the randomized trials of 2025-2026 draw a sharp line between what genuinely works and what remains unproven.
The boldest promise of artificial intelligence in medicine is to catch disease before it announces itself, or at the stage where treatment still has the best odds. The years 2025 and 2026 are when the long-awaited randomized controlled trial (RCT) evidence for that promise finally arrived. The picture demands both encouragement and restraint: some systems genuinely improved hard clinical endpoints such as death and advanced-stage cancer, while others demonstrated that detecting more disease does not automatically translate into more benefit. This article appraises the field in the language of evidence rather than marketing, placing what has been achieved side by side with what remains unresolved.
Multi-cancer early detection: a large promise, mixed proof
The field's most-discussed technology is the multi-cancer early detection (MCED) test, designed to screen for dozens of cancers from a single blood draw. Foremost among these is the Galleri test, which reads methylation signatures in circulating tumour DNA. For years this field had only observational data; in 2026 that changed.
The NHS-Galleri trial, conducted in the United Kingdom with more than 140,000 participants aged 50-77, is the first and only randomized controlled trial of an MCED test. The results require an honest reading: the trial did not meet its primary endpoint — there was no statistically significant reduction in combined Stage III and IV diagnoses one year after the final screening round compared with standard care. Secondary findings were more encouraging: across successive rounds, Stage IV diagnoses fell by 22% in the second round and 26% in the third, with an overall reduction of roughly 14% (nominally significant) across all three rounds. The test's specificity was 99.55% (a false-positive rate of just 0.45%), the cancer detection rate was 0.48% across three rounds, and the overall positive predictive value was 52%.
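As a rough consistency check, the reported positive predictive value can be approximated from the detection rate and the false-positive rate. The sketch below is illustrative only: it assumes both rates share the same denominator (all screened participants across the three rounds) and that every detected cancer counts as a true positive. The function name and the cohort size of 100,000 are hypothetical, and the trial's own calculation may use different denominators.

```python
def approximate_ppv(detection_rate: float, false_positive_rate: float,
                    cohort_size: int = 100_000) -> float:
    """Approximate PPV = TP / (TP + FP) for a screened cohort.

    Assumption: detection_rate counts true positives per screened person,
    and false_positive_rate applies to everyone without a detected cancer.
    """
    true_positives = detection_rate * cohort_size
    false_positives = false_positive_rate * (cohort_size - true_positives)
    return true_positives / (true_positives + false_positives)


# Figures quoted above: detection rate 0.48%, false-positive rate 0.45%.
ppv = approximate_ppv(0.0048, 0.0045)
print(f"Approximate PPV: {ppv:.1%}")
```

Under these assumptions the estimate lands close to the reported 52%, which illustrates why even a highly specific test produces roughly one false alarm for every true cancer when the disease is rare in the screened population.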
A clash of framing should not be concealed here. GRAIL, the test's developer, emphasised a "substantial reduction in Stage IV diagnoses" in its press release, while independent analysts stressed that the trial missed its pre-specified primary endpoint. Both statements are true and derive from the same data — the difference lies in which is foregrounded. An honest appraisal keeps both on the table.
The single-arm PATHFINDER 2 study (35,878 people in the US and Canada) reported a more than seven-fold increase in cancer detection and 99.6% specificity when Galleri was added to standard screening, with more than half of newly detected cancers at an early stage. However, this study has no control arm and therefore cannot speak to overdiagnosis or mortality. One regulatory point is critical: as of 2026, no MCED test is FDA-approved or reimbursed; Galleri remains available only as a laboratory-developed test.
"Earlier detection" does not always mean "better outcome"
NHS-Galleri showed a shift away from Stage IV diagnoses in its secondary analyses but could not show a reduction in mortality; those data do not yet exist. Overdiagnosis (detecting indolent cancers that would never cause harm), anxiety from false positives, and unnecessary procedures are real harms. Early detection is valuable only when it changes the treatment outcome.
AI-supported mammography: high-quality RCT evidence
The strongest evidence for image-based AI comes from breast screening. The MASAI trial, conducted in Sweden with more than 100,000 women, is the first population-based randomized controlled trial of AI-supported mammography, with final results published in The Lancet in 2026. AI-supported reading increased the cancer detection rate by 29% compared with standard double reading, and did so with only seven additional false positives. More importantly, it produced a 12% reduction in interval cancers (those arising between screening rounds), a result that touches the true purpose of screening: catching aggressive cancers early. Sensitivity rose from 73.8% to 80.5% while specificity was unchanged at 98.5%, and radiologist reading workload fell by 44%. Most of the additionally detected tumours were small, lymph-node-negative invasive cancers.
In the same vein, the MIT/Harvard Sybil model predicts future lung cancer risk from a single low-dose CT scan. Its one-year AUC was 0.92 in the NLST cohort and 0.86-0.94 in external validation cohorts, and validation work in 2025 extended the evidence to East Asian and predominantly Black populations. Even so, prospective clinical trials of Sybil remain in the planning stage.
Risk prediction crosses the regulatory threshold
The year 2025 was when risk-prediction tools first obtained formal regulatory authorization. Clairity Breast received FDA De Novo authorization as the first tool to predict five-year breast cancer risk from a standard mammogram alone; the model was developed on more than 1.7 million mammograms, achieved an AUC of 0.72 with good calibration, and was added to NCCN guidelines. Likewise, ArteraAI Prostate received De Novo authorization as the first AI-based prostate prognostic tool, combining digital pathology with clinical data to predict ten-year metastasis and cancer-specific mortality; for distant metastasis it outperformed standard models with a hazard ratio (HR) of 1.54 (95% CI 1.36-1.74).
EHR and ECG-based prediction: the first evidence reaching hard endpoints
The most striking evidence comes from trials testing whether AI can not only diagnose but also reduce deaths. A multicentre pragmatic RCT in Taiwan enrolling 15,965 patients showed that an AI alert flagging high mortality risk from the ECG lowered 90-day all-cause mortality from 4.3% to 3.6% (HR 0.83; 95% CI 0.70-0.99), meeting its primary endpoint. The effect was more pronounced in the high-risk ECG subgroup (HR 0.69). This is among the strongest pieces of evidence that AI in healthcare can improve a hard endpoint in a randomized trial.
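To put the mortality figures in absolute terms, the short sketch below derives the crude absolute risk reduction, relative risk reduction, and number needed to treat from the two 90-day mortality proportions quoted above. The function names are illustrative, and the calculation uses only the rounded percentages in the text, so it is a back-of-the-envelope reading rather than a reanalysis of the trial.

```python
def absolute_risk_reduction(control_risk: float, treated_risk: float) -> float:
    """Difference in event proportions between control and intervention."""
    return control_risk - treated_risk


def relative_risk_reduction(control_risk: float, treated_risk: float) -> float:
    """Absolute risk reduction expressed as a share of the control risk."""
    return absolute_risk_reduction(control_risk, treated_risk) / control_risk


def number_needed_to_treat(control_risk: float, treated_risk: float) -> float:
    """Patients who must receive the intervention to prevent one event."""
    arr = absolute_risk_reduction(control_risk, treated_risk)
    if arr <= 0:
        raise ValueError("no risk reduction, so NNT is undefined")
    return 1.0 / arr


# 90-day all-cause mortality quoted above: 4.3% (control) vs 3.6% (AI alert).
control_mortality = 0.043
alert_mortality = 0.036

arr = absolute_risk_reduction(control_mortality, alert_mortality)
rrr = relative_risk_reduction(control_mortality, alert_mortality)
nnt = number_needed_to_treat(control_mortality, alert_mortality)

print(f"Absolute risk reduction: {arr * 100:.1f} percentage points")
print(f"Relative risk reduction: {rrr:.1%}")
print(f"Number needed to treat:  {nnt:.0f}")
```

On these rounded inputs the absolute reduction is about 0.7 percentage points, which corresponds to roughly 143 patients managed under the alert per death averted at 90 days. The crude relative reduction (about 16%) differs slightly from the reported hazard ratio because a hazard ratio is a time-to-event measure, not a ratio of proportions.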
A second trial by the same group, focused on atrial fibrillation (AF), teaches the opposite lesson. The AI-ECG alert for latent AF raised anticoagulant prescribing among non-cardiologists from 12% to 23.3% (HR 1.85) and increased new AF diagnoses; yet it produced no difference in stroke, cardiovascular death, or all-cause death. The process outcomes (prescribing, diagnosis) improved, but the hard clinical endpoint did not. This is the clearest example that boosting prediction is not, on its own, enough.
Meta-analyses of AI early-warning systems for sepsis report an average mortality reduction of roughly 30%, but the underlying studies are small, heterogeneous, and prone to publication bias. Indeed, the widely deployed Epic Sepsis Model performed poorly in external validation: in one study the positive predictive value was only 7.6% and sensitivity 14.7%; in one cohort it missed two-thirds of patients with sepsis while alerting on 18% of all inpatients, a textbook picture of alarm fatigue.
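The link between a low positive predictive value and alarm fatigue can be made concrete with simple arithmetic. The sketch below, using hypothetical function names and only the PPV and sensitivity figures from the first study quoted above, converts them into alerts per true sepsis case and the share of sepsis cases that trigger no alert.

```python
def alerts_per_true_case(ppv: float) -> float:
    """Average number of alerts a clinician sees per true-positive alert."""
    if not 0 < ppv <= 1:
        raise ValueError("ppv must be in (0, 1]")
    return 1.0 / ppv


def missed_fraction(sensitivity: float) -> float:
    """Share of true cases for which the model never fires."""
    if not 0 <= sensitivity <= 1:
        raise ValueError("sensitivity must be in [0, 1]")
    return 1.0 - sensitivity


# Figures quoted above for one external validation study.
ppv = 0.076
sensitivity = 0.147

print(f"Alerts per true sepsis case: {alerts_per_true_case(ppv):.1f}")
print(f"Sepsis cases missed:         {missed_fraction(sensitivity):.1%}")
```

At a PPV of 7.6%, clinicians would face about 13 alerts for every one that corresponds to real sepsis, while roughly 85% of sepsis cases would go unflagged. These are direct consequences of the quoted figures and say nothing about how the model performs in other settings.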
| Technology / Trial | Evidence level | Key result | Hard endpoint? |
|---|---|---|---|
| NHS-Galleri (MCED) | RCT | Specificity 99.55%; Stage IV ↓ 22-26% (secondary) | Primary endpoint not met; no mortality data |
| MASAI (AI mammography) | RCT | Detection ↑ 29%; interval cancer ↓ 12%; sensitivity 80.5% | Yes (interval-cancer reduction) |
| AI-ECG mortality (Taiwan) | Pragmatic RCT | 90-day mortality HR 0.83 | Yes (mortality ↓) |
| AI-ECG / AF (Taiwan) | Cluster RCT | Anticoagulant HR 1.85; new AF diagnoses ↑ | No (stroke/death unchanged) |
| Clairity Breast | De Novo authorization + validation | 5-year risk; AUC 0.72 | No outcome trial yet |
The regulatory and governance landscape
By 2025, the FDA had authorized more than 1,000 AI/machine-learning-based medical devices; a taxonomy of 1,016 authorizations published in npj Digital Medicine shows that 76% of them are in radiology. The vast majority were cleared via the 510(k) pathway; De Novo accounts for only about 2-3% and PMA for roughly 0.4%. The FDA's January 2025 draft guidance introduces concepts such as total product lifecycle management, post-deployment performance monitoring, and predetermined change control plans for AI-enabled devices. The World Health Organization, in early 2024, published guidance with more than 40 recommendations for large multi-modal models, flagging bias, fabricated misinformation, high-income-country skew, and the need for independent audit.
Honest limits: what is proven, what is not
What is proven is clear: AI-supported mammography can improve screening accuracy and reduce workload while genuinely lowering interval cancers; an AI-ECG alert can reduce 90-day mortality in a selected population; MCED tests can catch cancers with no screening programme and produce a stage shift. Against this, several core questions remain open.
First and foremost, MCED has not been proven to reduce cancer mortality. NHS-Galleri missed its stage-shift primary endpoint, and mortality data do not exist. Measuring overdiagnosis is methodologically hard by its very nature; independent reviews list emotional and procedural harm from false positives, detection of indolent cancers, false reassurance, and health-system burden as "substantial harm." Patient and clinician focus groups call for stronger evidence — mortality and harm data — before routine use.
Second, models "age" after deployment. Clinical AI models lose performance over time owing to shifts in patient demographics, laboratory methods, and hospital type; some benchmarks report a decline of around 20%. A large seven-hospital study documented significant dataset shift. This makes continuous post-deployment monitoring mandatory.
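One simple form of post-deployment monitoring is to recompute discrimination on each new window of labelled cases and compare it with the value measured at validation. The sketch below is a minimal illustration under stated assumptions: the function names, the toy data, and the 10% relative-drop threshold are all hypothetical, and a real programme would also track calibration, subgroup performance, and input distributions.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores
    a random negative case (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def drift_alert(baseline_auc: float, window_auc: float,
                max_relative_drop: float = 0.10) -> bool:
    """Flag the window if AUC fell by more than the tolerated relative drop."""
    return (baseline_auc - window_auc) / baseline_auc > max_relative_drop


# Toy data: outcomes (1 = event) and model risk scores in two periods.
baseline_labels = [1, 1, 1, 0, 0, 0]
baseline_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
recent_labels = [1, 1, 1, 0, 0, 0]
recent_scores = [0.9, 0.5, 0.3, 0.6, 0.4, 0.2]

baseline = auc(baseline_labels, baseline_scores)
recent = auc(recent_labels, recent_scores)
print(f"Baseline AUC {baseline:.2f}, recent AUC {recent:.2f}, "
      f"drift alert: {drift_alert(baseline, recent)}")
```

The pairwise AUC used here is quadratic in the number of cases, which is acceptable for a toy example; a production pipeline would more likely rely on an established library routine and on windows large enough to give stable estimates.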
Third, the narrative itself is biased. A cross-sectional study of how these tests are promoted on social media found that of 982 posts, 87.1% mentioned benefits, just 14.7% mentioned harms, and 6.1% mentioned overdiagnosis; 68% of posts came from accounts with a financial interest. The "early detection is always good" narrative is systematically inflated by commercial and social incentives.
Fourth, the strongest mortality evidence comes from a single country (Taiwan); its generalisability is not yet clear.
Conclusion
AI for early detection and disease prediction moved from speculation to evidence in 2025-2026, but in two directions at once. On one side are real, hard-endpoint gains, such as MASAI's reduction in interval cancers and the Taiwanese AI-ECG trial's mortality benefit. On the other, NHS-Galleri's missed primary endpoint and the AI-ECG/AF trial's lesson that "prescribing rose but death did not change" show a field that is maturing but not yet mature. The right stance is neither uncritical enthusiasm nor wholesale rejection: weigh each tool on its own evidence, on the scale of hard clinical endpoints, read conflicting data side by side, and never mistake "more detection" for "more benefit." For the clinician, the practical conclusion is clear: before adopting an AI tool, the question should not be "how accurately does it predict?" but "with what randomized evidence does it produce a better outcome for the patient?"
References
- GRAIL. Full results from the NHS-Galleri trial: reduction in Stage IV cancer diagnoses (primary endpoint not met). GRAIL Press Release / ASCO 2026. 2026.
- Patient Care Online. NHS-Galleri data show fewer Stage IV cancers but primary endpoint missed. Patient Care Online. 2026.
- OncLive. PATHFINDER 2 trial releases positive topline data for multi-cancer early detection test. OncLive. 2025.
- Hernström V, et al. AI-supported mammography screening (MASAI): final results of a randomized controlled trial. The Lancet. 2026.
- Sybil external validation cohort: lung cancer risk prediction from a single LDCT. PubMed (ATS 2025). 2025.
- Pharmacy Times / FDA. De Novo authorization for the five-year breast cancer risk prediction device Clairity Breast. Pharmacy Times. 2025.
- Urology Times. FDA De Novo authorization for ArteraAI Prostate (first AI-based prostate prognostic tool). Urology Times. 2025.
- Lin CS, et al. Artificial intelligence-enabled electrocardiography alert intervention and all-cause mortality: a pragmatic randomized clinical trial. Nature Medicine. 2024.
- Liu CM, et al. AI-ECG detection of latent atrial fibrillation and anticoagulant adoption: a pragmatic cluster-randomized trial. Journal of the American Heart Association. 2025.
- Snead R, et al. Overdiagnosis and harms of multi-cancer early detection tests: a review. The Permanente Journal. 2026.
- Nickel B, et al. Bias in the promotion of medical tests on social media (analysis of 982 posts). JAMA Network Open. 2025.
- npj Digital Medicine. A taxonomy of FDA-authorized artificial intelligence/machine-learning medical devices (1,016 authorizations, 76% radiology). npj Digital Medicine. 2025.
- JAMIA Open. External validation of the Epic Sepsis Model in the emergency department. JAMIA Open. 2024.