Artificial Intelligence in Healthcare: Regulation and the Future

Creator: Cem Akaltun, MD
Published: 2026-05-26

By 2025-2026, AI in healthcare has reached a point where the regulatory framework is maturing, yet the clinical evidence still comes down to one balance: the process gains are real, but proof of benefit on hard outcomes is thin. This piece presents the most current regulation and evidence without hype.

Updated: 2026-06-09

Artificial intelligence (AI) in healthcare is no longer a future promise; it is a reality embedded in clinical practice. Yet this reality has two faces. On one hand, regulators are rapidly maturing the rules and the number of cleared devices is breaking records. On the other, randomized controlled trials (RCTs) are showing us, with equal clarity, both what the technology has achieved and what it has not yet achieved. This article examines the most current regulatory developments and clinical evidence as of June 2026, within a measured and balanced frame. The aim is neither blind optimism nor categorical pessimism, but fidelity to the numbers on the ground.

The Regulatory Landscape: Rules Are Maturing

The fastest-changing dimension of healthcare AI is regulation. Over the past two years, three major authorities — the FDA in the United States, the European Union, and the World Health Organization (WHO) — have markedly updated their frameworks.

FDA: A record clearance year and the "learning device" problem

According to FDA data, a cumulative total of 1,451 AI-enabled medical devices received marketing authorization from 1995 through the end of December 2025. The number cleared in 2025 alone broke records at 295 (compared with 253 in 2024 and 221 in 2023). Roughly 76% of these devices are in radiology — image interpretation remains AI's most mature clinical application area.

Perhaps a more consequential development is the FDA's final guidance on Predetermined Change Control Plans (PCCP), published on 3 December 2024. This guidance addresses a fundamental tension between "learning" algorithms and classic regulatory logic: if a device updates itself after reaching the market, does each update require fresh clearance? Under a PCCP, the manufacturer specifies planned modifications in advance, in a plan with three components (a Description of Modifications, a Modification Protocol, and an Impact Assessment); once the plan is authorized, changes made within its boundaries do not require a new marketing submission. The final guidance broadened its scope relative to the draft to cover all AI-enabled devices, not only machine-learning ones, and introduced a transparency (labeling) requirement for users. In August 2025, the FDA, Health Canada, and the UK's MHRA strengthened international harmonization by publishing five guiding principles in this area.
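To make the PCCP logic concrete, here is a minimal sketch of how pre-specified boundaries might be checked against a proposed update. Everything in it is a hypothetical illustration, not an FDA-defined schema: the class and function names, the change-type labels, and the performance floors are assumptions, and a real PCCP is a regulatory document rather than code. The allowed change types stand in for the Description of Modifications, the performance floors for the Modification Protocol's acceptance criteria; the Impact Assessment is not modeled.

```python
from dataclasses import dataclass


@dataclass
class PCCP:
    """Hypothetical, simplified encoding of an authorized change control plan.

    allowed_changes stands in for the Description of Modifications;
    the two performance floors stand in for the Modification Protocol's
    acceptance criteria. The Impact Assessment is not modeled here.
    """
    allowed_changes: set[str]
    min_sensitivity: float
    min_specificity: float


@dataclass
class ProposedUpdate:
    change_type: str    # hypothetical label, e.g. "retrain_on_new_site_data"
    sensitivity: float  # measured on a locked, held-out validation set
    specificity: float


def check_update(update: ProposedUpdate, plan: PCCP) -> list[str]:
    """Return the reasons an update falls outside the plan (empty list = within plan)."""
    problems = []
    if update.change_type not in plan.allowed_changes:
        problems.append(f"change type '{update.change_type}' is not pre-specified")
    if update.sensitivity < plan.min_sensitivity:
        problems.append("sensitivity below the pre-specified floor")
    if update.specificity < plan.min_specificity:
        problems.append("specificity below the pre-specified floor")
    return problems


plan = PCCP({"retrain_on_new_site_data"}, min_sensitivity=0.90, min_specificity=0.85)

print(check_update(ProposedUpdate("retrain_on_new_site_data", 0.92, 0.88), plan))
# [] -> within the plan under this sketch's logic
print(check_update(ProposedUpdate("new_clinical_indication", 0.95, 0.90), plan))
# one reason -> outside the plan; would need a new marketing submission
```

The design point is that the boundaries are fixed before the device reaches the market: an update is judged against criteria written in advance, not against whatever the retrained model happens to achieve.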

The first regulatory signals for generative AI (large language models) have also arrived. At its second meeting, on 6 November 2025, the FDA's Digital Health Advisory Committee considered prescription mental health devices built on large language models (therapy chatbots). The committee's recommendations included escalation to a qualified human during a crisis (human oversight), a PCCP and a performance-monitoring plan in the premarket submission, and double-blind RCTs to account for the placebo response in psychiatry.

The EU AI Act: A significant timeline change

The most critical update to the current picture has come from Europe. On 19 November 2025, the European Commission published its "Digital Omnibus on AI" proposal, which would postpone the application of high-risk obligations. Under the AI Act as adopted, use-based high-risk obligations (Annex III) take effect on 2 August 2026, and obligations for AI embedded in CE-marked regulated products, including AI-enabled medical devices (Annex I), on 2 August 2027. The proposal would push these dates to 2 December 2027 and 2 August 2028, respectively.

The key point: the delay shifts only the application date; it does not remove the substance of the obligations. AI-enabled medical devices remain in the high-risk class, and the obligations for data governance, transparency, human oversight, risk management, and post-market monitoring all stand. Moreover, the change is not yet in force: it is still at the proposal/provisional political agreement stage and will take effect only after formal adoption and publication in the Official Journal of the EU.

WHO: A global framework for multimodal models

On 18 January 2024, the WHO set out a global ethical framework in its guidance on large multimodal models (LMMs). The document, which contains more than forty recommendations, defines five areas of application in health (diagnosis and clinical care, patient-guided use, administrative tasks, medical education, and scientific research) and lists documented risks, including incorrect, biased, or incomplete output; "automation bias"; and threats to data privacy. It remains the primary global framework for language models in health.

Three authorities, one direction

The FDA addresses the "learning device" through the PCCP, the EU addresses "high risk" through continuous monitoring, and the WHO addresses "automation bias" through human oversight. All three converge on a single point: AI is not a product to be approved once and forgotten, but a process that must be continuously monitored, audited, and kept under human oversight.

Clinical Evidence: Where Does It Work?

As important as regulation is the real question: does AI actually work in the clinic? Here the level of evidence varies markedly by domain.

The most mature evidence is in colonoscopy. A 2024 systematic review and meta-analysis in Annals of Internal Medicine, covering 28 RCTs and 23,861 participants, showed that AI-assisted colonoscopy markedly increased the adenoma detection rate (ADR): RR 1.20 (95% CI 1.14–1.27). The adenoma miss rate fell by more than half, RR 0.45 (95% CI 0.37–0.54). This is a robust, replicated finding with direct clinical meaning.
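A risk ratio such as the RR 1.20 above compares event rates between arms. The sketch below computes a single-trial risk ratio with a Wald 95% confidence interval on the log scale. The counts are hypothetical, and the meta-analysis pooled 28 trials with its own methods, which this sketch does not reproduce.

```python
import math


def risk_ratio(events_ai: int, n_ai: int, events_ctrl: int, n_ctrl: int, z: float = 1.96):
    """Risk ratio (AI arm vs control) with a Wald confidence interval on the log scale."""
    rr = (events_ai / n_ai) / (events_ctrl / n_ctrl)
    # Standard error of ln(RR) for two independent binomial proportions
    se = math.sqrt(1 / events_ai - 1 / n_ai + 1 / events_ctrl - 1 / n_ctrl)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical trial: 360/1000 colonoscopies with >=1 adenoma (AI) vs 300/1000 (control)
rr, lower, upper = risk_ratio(360, 1000, 300, 1000)
print(f"RR {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")  # RR 1.20 (95% CI 1.06-1.36)
```

A CI lying entirely above 1, as here, indicates a statistically significant increase. The miss-rate RR of 0.45 reads the same way in the other direction: a 55% relative reduction.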

Cardiology offers another strong example. A pragmatic RCT in Taiwan enrolling 13,631 patients showed that AI-based detection of low left ventricular ejection fraction (EF) from the ECG increased the rate of new low-EF diagnoses: overall hazard ratio (HR) 1.50 (95% CI 1.11–2.03, p=0.023). Notably, this was achieved without unnecessarily increasing echocardiography use; the positive predictive value for echocardiography rose from 20.2% to 34.2%.

In diabetes management, an AI-based insulin titration system (iNCDSS) was found non-inferior to senior endocrinologists in a multicenter RCT of 149 patients: time in the target glucose range was 76.4% versus 73.6% (reported difference +2.7 percentage points; the non-inferiority margin was met), with no difference in adverse events.

In conversational diagnostic AI, Google's AMIE system was compared with 20 primary care physicians across 159 case scenarios in a randomized, double-blind study published in Nature in 2025, and specialist raters judged it equal or superior to the physicians on 30 of 32 axes. The authors, however, emphasize an important limitation: the consultations were conducted solely via text chat, an environment that differs from a real clinical encounter.
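The non-inferiority claim for the insulin-titration trial rests on the confidence interval, not the point estimate: the new intervention is non-inferior if the lower bound of the CI for (new − reference) stays above −Δ, where Δ is a margin fixed before the trial. The trial's actual margin and CI are not given in this article, so the values below are hypothetical; the function name is illustrative.

```python
def classify_vs_reference(ci_lower: float, ci_upper: float, margin: float) -> str:
    """Classify a 95% CI for (new - reference) against a non-inferiority margin.

    All values are in the outcome's units (here, percentage points of time in
    range); margin is the positive, pre-specified non-inferiority margin.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    if ci_lower > 0:
        return "superior"        # whole CI above zero
    if ci_lower > -margin:
        return "non-inferior"    # CI may include zero but stays above -margin
    if ci_upper < -margin:
        return "inferior"        # whole CI below -margin
    return "inconclusive"        # CI straddles -margin


# Hypothetical CIs around a +2.7 pp point estimate, with a hypothetical 7.5 pp margin
print(classify_vs_reference(-0.5, 5.9, margin=7.5))   # non-inferior
print(classify_vs_reference(-9.0, 14.4, margin=7.5))  # inconclusive
```

The same point estimate can be "non-inferior" or "inconclusive" depending only on the width of its CI, which is why small trials struggle to establish non-inferiority. A superiority claim from a non-inferiority trial should also be pre-specified in the analysis plan.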

The most widespread real-world deployment of 2025 was ambient AI medical scribes. A quality-improvement study across six U.S. health systems reported meaningful reductions in burnout, cognitive task load, and documentation time; physicians using AI spent 8.5% less time in the EHR and more than 15% less time on note writing. Even so, there is a caveat: the clinician remains responsible for verifying the note, and some report spending time "correcting rather than writing."

Limitations: What AI Has Not Yet Achieved

An honest assessment must make the limitations as visible as the successes. The most striking findings here concern direct patient outcomes.

Evidence on hard clinical outcomes remains thin. In a stepped-wedge cluster RCT covering 200,354 visits across four emergency departments, an AI clinical decision support system for pediatric septic shock did not improve the rate of antibiotic plus fluid administration within one hour: 39.0% in the intervention arm versus 38.9% in the control arm (adjusted OR 1.07; 95% CI 0.61–1.88; not statistically significant). In other words, the system did not improve even the process metric on which any mortality benefit would depend. It is an important illustration of how limited positive RCT evidence for hard outcomes such as mortality remains.

Human-AI integration is weaker than expected. A single-blind RCT of 50 physicians published in JAMA Network Open in 2024 found no significant difference in diagnostic reasoning scores between physicians with access to a language model and those using conventional resources (+2 points; 95% CI −4 to +8; p=0.60). The striking part: the language model alone scored 16 points higher than the conventional-resources group (95% CI 2–30; p=0.03). This suggests the bottleneck lies less in the model's capability than in how physicians use it. Against this stands a conflicting finding: a GPT-4 RCT of 92 physicians, published by the same research group in Nature Medicine in 2025, did report a significant improvement (+6.5 points). The two results must be read side by side: the integration problem is not unsolvable, but it has not yet been solved consistently.

Deskilling: a new and sobering warning. A multicenter observational study published in Lancet Gastroenterology & Hepatology in 2025, conducted with 19 experienced endoscopists in Poland, found that after regular AI exposure, the ADR in colonoscopies performed without AI fell from 28.4% to 22.4%, a 6.0-percentage-point absolute and roughly 21% relative decline. The authors describe this as "the first study to demonstrate a negative effect of AI exposure on a patient-relevant outcome in medicine." Let us frame it honestly: the study is observational, retrospective, and single-country; it does not establish causation. But it raises a hypothesis of high importance, and as of June 2026 there is no retraction or correction notice in PubMed.
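Absolute and relative changes are easy to conflate, and they tell different stories. Using the ADR values reported for the deskilling study (28.4% before, 22.4% after), the arithmetic is:

```python
def absolute_and_relative_change(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change in percent)."""
    absolute_pp = after_pct - before_pct
    relative_pct = 100 * absolute_pp / before_pct
    return absolute_pp, relative_pct


abs_pp, rel = absolute_and_relative_change(28.4, 22.4)
print(f"{abs_pp:.1f} pp absolute, {rel:.1f}% relative")  # -6.0 pp absolute, -21.1% relative
```

A 6-point drop from a baseline near 28% is about a fifth of the baseline detection rate, which is why the relative figure sounds larger than the absolute one.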

The Epic Sepsis Model lesson: The real-world validation gap

A widely deployed sepsis early-warning model achieved an AUC of only ~0.63 on independent external validation; in one study it missed roughly two-thirds of sepsis patients while generating many false alarms. On 2024 emergency-department data, within a 6-hour window, sensitivity was 14.7%, specificity 95.3%, and PPV 7.6%; the developer subsequently withdrew its claims. By contrast, an FDA-authorized next-generation sepsis model was developed and validated prospectively across multiple centers, with results published in NEJM AI. Side by side, the two examples capture a lesson: the gulf between "single-center development + weak external validation" and "prospective, multicenter validation + regulatory authorization" is the difference between a model that is marketable and one that is trustworthy.
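How can a model with 95% specificity still have a PPV below 10%? When the condition is rare among the patients scored, false positives from the large negative group swamp the true positives. The sketch below defines the three metrics from confusion-matrix counts and then applies Bayes' rule. The sensitivity and specificity are the figures reported above; the prevalences are hypothetical inputs, not values from the cited studies.

```python
def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Sensitivity, specificity and PPV from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),   # share of true cases the model flags
        "specificity": tn / (tn + fp),   # share of non-cases it leaves unflagged
        "ppv": tp / (tp + fp),           # share of alerts that are true cases
    }


def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' rule for a given prevalence of the condition."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Sensitivity and specificity as reported above; the prevalences are hypothetical
for prevalence in (0.02, 0.05, 0.10):
    ppv = ppv_from_prevalence(0.147, 0.953, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}")
```

At a hypothetical 2% prevalence the PPV is about 6%, rising to about 14% at 5% and about 26% at 10%: the same model looks very different depending on how common sepsis is among the patients it scores.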

Equity, bias, and the reporting gap. In a review of FDA-cleared AI clinical decision support tools, none included a bias assessment; most radiology AI studies are single-institution development efforts, and performance in demographic subgroups is rarely reported. Reporting standards such as TRIPOD-AI, DECIDE-AI, and CONSORT-AI call for external validation and subgroup transparency, yet adherence remains low. In addition, clinical AI models can lose performance in the years after deployment as patient populations and practice patterns change (distribution shift); this is one reason both the FDA and the EU emphasize continuous post-market monitoring.
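Post-market monitoring can start as simply as recomputing a performance metric on each new window of labeled cases and flagging windows that fall below a pre-set floor. A minimal sketch, assuming confirmed outcomes arrive in monthly batches; the choice of AUC, the 0.75 floor, and the window data are illustrative assumptions, not a regulatory requirement.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive case scores higher than a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("each window needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg): fine for a sketch, not for large volumes
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def flag_windows(windows, floor: float = 0.75) -> list[tuple[str, float]]:
    """Return (window name, AUC) for every window whose AUC falls below the floor."""
    alerts = []
    for name, scores, labels in windows:
        value = auc(scores, labels)
        if value < floor:
            alerts.append((name, value))
    return alerts


# Hypothetical monthly batches of (model scores, confirmed outcomes)
windows = [
    ("2025-01", [0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]),
    ("2025-06", [0.6, 0.4, 0.5, 0.7], [1, 1, 0, 0]),
]
print(flag_windows(windows))  # [('2025-06', 0.25)]
```

Real monitoring would also track calibration, input distributions, and subgroup performance, and would need enough cases per window for the estimate to be stable.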

Process Improvement or Patient Benefit? A Summary of the Evidence

To summarize the current evidence in a single sentence: AI delivers strong, replicated improvements in process and diagnostic-efficiency metrics (adenoma detection, heart failure diagnosis, documentation time), but positive RCT evidence on hard clinical outcomes (mortality, progression to shock) remains limited. Moreover, a substantial portion of the evidence comes from simulated or vignette settings; real-world prospective outcome studies are in the minority. Publication bias compounds the problem: positive AI studies are more likely to be published, and registered studies that never report results may leave the literature looking more optimistic than the underlying evidence warrants.

Application area	Evidence type	Effect size (95% CI)	Outcome type
Colonoscopy (ADR)	Meta-analysis of 28 RCTs	RR 1.20 (1.14–1.27)	Process (detection)
Low EF diagnosis via ECG	Pragmatic RCT (n=13,631)	HR 1.50 (1.11–2.03)	Diagnostic efficiency
Insulin titration	RCT (n=149)	+2.7 pp time in range, non-inferior	Process (glucose control)
Pediatric septic shock CDS	Cluster RCT (200,354 visits)	aOR 1.07 (0.61–1.88), NS	Process + hard outcome
Language-model diagnostic aid	RCT (50 physicians)	+2 points (−4 to +8), NS	Diagnostic reasoning
Deskilling	Observational (19 endoscopists)	ADR 28.4% → 22.4%	Patient-relevant (warning)
NS: not statistically significant; pp: percentage points; aOR: adjusted odds ratio.
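The "NS" labels follow a simple rule: a 95% confidence interval that contains the no-effect value (1 for ratio measures such as RR, HR, and OR; 0 for differences) is not statistically significant at roughly the 5% level. A minimal check, applied to the CIs reported in this section (the function and constant names are illustrative):

```python
RATIO_MEASURES = {"RR", "HR", "OR"}


def excludes_no_effect(lower: float, upper: float, measure: str) -> bool:
    """True if the CI excludes the no-effect value (1 for ratios, 0 for differences)."""
    null_value = 1.0 if measure in RATIO_MEASURES else 0.0
    return not (lower <= null_value <= upper)


# 95% CIs as reported in this section
rows = [
    ("Colonoscopy (ADR)", "RR", 1.14, 1.27),
    ("Low EF diagnosis via ECG", "HR", 1.11, 2.03),
    ("Pediatric septic shock CDS", "OR", 0.61, 1.88),
    ("Language-model diagnostic aid", "difference", -4.0, 8.0),
]
for name, measure, lower, upper in rows:
    verdict = "excludes no effect" if excludes_no_effect(lower, upper, measure) else "NS"
    print(f"{name}: {verdict}")
```

The first two rows exclude the no-effect value and the last two do not. A non-significant result is not proof of no effect: the septic shock CI (0.61–1.88) is wide enough to be compatible with meaningful harm or benefit.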

A Forward View: Concrete Turning Points

Rather than abstract optimism about the future, it is more illuminating to look at the concrete turning points of 2025. Ambient AI scribes, the most widespread real-world deployment, appear to be reducing physicians' administrative burden; perhaps AI's first great "quiet revolution" is occurring not in making diagnoses but in freeing the physician from the keyboard. Conversational diagnostic systems such as AMIE yield impressive results in the lab but await real-world validation. The deskilling signal and the Epic Sepsis Model lesson, meanwhile, remind us that the future depends not merely on "a better algorithm" but on "better integration, stricter validation, and sustainable human oversight."

On the regulatory side the direction is clear: a shift from static clearance to continuous life-cycle oversight. The PCCP's "learning device" logic, the EU's emphasis on post-market monitoring, and the WHO's warning about automation bias all point to the same truth: AI is not a device to be installed once and forgotten, but a process that must be continuously monitored. In the coming years, it is reasonable to anticipate that the regulatory framework for generative AI and language models will take shape rapidly, and that high-risk applications such as mental health chatbots will come under priority scrutiny.

Conclusion

As of 2026, artificial intelligence in healthcare warrants neither exaggerated promises nor wholesale rejection. The regulatory framework is maturing rapidly: the FDA is clearing record numbers of devices and answering the "learning device" problem through the PCCP; the EU is proposing to delay its high-risk timeline while preserving the obligations; and the WHO is consolidating the global ethical framework. On the clinical side the picture is mixed: there are robust, replicated improvements in areas such as colonoscopy, cardiology, and documentation, but evidence on hard clinical outcomes is thin, human-AI integration is fragile, and new risks such as deskilling are on the agenda. The right stance is to weigh each claim against its own level of evidence; to read conflicting findings (for and against integration, benefit and deskilling) side by side; and, while celebrating what the technology has achieved, not to conceal what it has not yet achieved. Artificial intelligence is transforming medicine, but this transformation must be built on augmenting the physician rather than replacing them, and it must proceed carefully, under audit and human oversight.

References

U.S. Food and Drug Administration. Artificial Intelligence-Enabled Medical Devices (list). FDA. 2025–2026.
U.S. Food and Drug Administration. Marketing Submission Recommendations for a Predetermined Change Control Plan for AI-Enabled Device Software Functions (final guidance). FDA. 2024.
European Commission. Digital Omnibus on AI (high-risk postponement proposal). European Commission Digital Strategy. 2025.
World Health Organization. Ethics and governance of artificial intelligence for health: large multi-modal models. WHO. 2024.
Aziz M, et al. AI-Assisted Colonoscopy for Polyp Detection: A Systematic Review and Meta-analysis. Annals of Internal Medicine. 2024.
Budzyń K, et al. Endoscopist deskilling risk after exposure to artificial intelligence in colonoscopy: a multicentre, observational study. Lancet Gastroenterology & Hepatology. 2025.
Goh E, et al. Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial. JAMA Network Open. 2024.
Tu T, et al. Towards conversational diagnostic artificial intelligence (AMIE). Nature. 2025.
Tsai DJ, et al. AI-assisted diagnosis of low ejection fraction using ECG: a pragmatic randomized controlled trial. BMC Medicine. 2025.
Ying Z, et al. Real-Time AI-Assisted Insulin Titration System for Glucose Control in Type 2 Diabetes: A Randomized Clinical Trial. JAMA Network Open. 2025.
Scott HF, et al. Machine Learning Clinical Decision Support for Septic Shock in the Emergency Department: A Cluster Randomized Trial. Pediatrics. 2025.
U.S. Food and Drug Administration. Digital Health Advisory Committee — Generative AI-Enabled Digital Mental Health Devices. FDA. 2025.

Disclaimer: This content is for general informational and educational purposes only. Regulatory dates and classifications can change; EU AI Act compliance dates are projected/proposed and may be updated as the legislative process unfolds. For current and binding information, consult the official publications of the relevant regulatory authorities.