Artificial Intelligence in Obstetrics: Automated Fetal Biometry and CTG Interpretation

Creator: Cem Akaltun, MD
Published: 2026-05-26

AI now measures fetal biometry at expert level and shortens scan time, yet there is still no randomized evidence that it improves clinical outcomes in cardiotocography.

By Cem Akaltun, MD · Updated June 9, 2026 · ~12 min read · Topics: Obstetrics, Fetal Ultrasound, CTG / NST

Until recently, artificial intelligence (AI) in obstetric imaging was discussed largely at the level of promise. By 2025-2026, however, it has matured into a genuine clinical domain backed by concrete trial evidence and regulatory clearances. The progress in automated fetal biometry and standard-plane recognition is particularly striking. Intrapartum cardiotocography (CTG) interpretation, by contrast, remains far more cautious: technical accuracy is high, but robust proof that AI directly improves infant outcomes is still lacking. This article reviews the current evidence in both areas, avoiding hype and preserving a clear distinction between what AI has achieved and what it has not.

Automated Biometry: Now at Expert-Level Accuracy

The single most important development is the long-awaited arrival of randomized clinical-benefit evidence. The PROMETHEUS randomized trial (NEJM AI, March 2025), led by a King's College London team, was conducted at a single centre with 78 pregnant women and 58 sonographers. Here the AI did not make diagnoses; it recognised and captured 13 standard planes and measured four key biometric parameters. The results were notable: sensitivity for detecting fetal malformations rose to 88.9% in the AI-assisted arm (versus 81.5% with standard scanning), and specificity to 98.0% (versus 92.2%). Even more striking, scan time fell from 19.7 to 11.4 minutes (about a 42% reduction) and the operator's cognitive load (NASA-TLX scale) dropped from 46.5 to 35.2 (about a 24% reduction). Because the AI evaluated thousands of frames per measurement, it surpassed human operators in reproducibility.
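The percentage reductions quoted above can be checked with simple arithmetic. The sketch below uses only the before and after values given in the paragraph; the function name is an illustrative choice, not something taken from the trial.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the reduction from `before` to `after` as a percentage of `before`."""
    return (before - after) / before * 100.0

# Values quoted in the paragraph above.
scan_time_cut = relative_reduction(19.7, 11.4)  # scan time in minutes
workload_cut = relative_reduction(46.5, 35.2)   # NASA-TLX score

print(f"Scan time: {scan_time_cut:.1f}% lower")
print(f"NASA-TLX:  {workload_cut:.1f}% lower")
```

Both results round to the figures given in the text (about 42% and about 24%).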

This finding does not stand alone. A video-based deep-learning study from Strasbourg (J Gynecol Obstet Hum Reprod, 2025; 281 videos) measured the mean absolute relative error between AI and experts at 0.96% for head circumference, 1.56% for abdominal circumference, 1.77% for femur length and 3.10% for estimated fetal weight. The AI performed significantly better than inexperienced sonographers in particular — suggesting that the technology's real value may lie in reducing inter-operator variability.
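For readers unfamiliar with the metric, mean absolute relative error is usually computed as the average of |AI - expert| / expert over paired measurements. The sketch below assumes that common definition (the study's exact formula may differ), and the measurement pairs are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from statistics import mean

def mean_absolute_relative_error(ai_values, expert_values):
    """Mean of |AI - expert| / expert over paired measurements, in percent."""
    if len(ai_values) != len(expert_values) or not ai_values:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * mean(
        abs(a - e) / e for a, e in zip(ai_values, expert_values)
    )

# Hypothetical head-circumference pairs in millimetres (not study data).
ai_hc = [202.0, 247.5, 303.0]
expert_hc = [200.0, 250.0, 300.0]

mare_hc = mean_absolute_relative_error(ai_hc, expert_hc)
print(f"MARE = {mare_hc:.2f}%")
```

Because each error is divided by the expert value, the metric can be compared across parameters of very different size, such as femur length and head circumference.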

Standard-Plane Recognition: High Success, But the Heart Remains the Weak Link

Overall performance in automated plane classification is high, but the picture must be read with nuance. A prospective cohort study from Alexandria (Int J Gynaecol Obstet, 2025; 772 participants, 11,823 planes) evaluated the BioticsAI system across 18 ISUOG standard planes and reported an overall sensitivity of 89.9%, a specificity of 77.4%, and a 44% gain in reporting speed (4.02 versus 7.16 minutes). However, while accuracy exceeded 98% for the femur and abdomen, cardiac and craniofacial planes were markedly weaker: sensitivity for the three-vessel-trachea view was only 62.7%, with a negative predictive value of 30.8%. This confirms that the fetal heart remains the area where automation struggles most.
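Sensitivity, specificity and predictive values all come from the same two-by-two table, and a low negative predictive value can sit alongside moderate sensitivity when true negatives are rare. The sketch below illustrates that arithmetic with hypothetical counts; they are not the study's data, and the study's own definition of a "positive" plane is not restated here.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic accuracy measures from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all positives
        "specificity": tn / (tn + fp),  # true negatives among all negatives
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical counts (illustrative only): 100 positives, 20 negatives.
metrics = diagnostic_metrics(tp=63, fp=4, fn=37, tn=16)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these made-up counts, most "negative" calls are actually missed positives, so the negative predictive value is low even though specificity looks acceptable.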

FDA Clearance Is Not the Same as Clinical Benefit

Most US devices are cleared via the 510(k) pathway, which rests on substantial equivalence to an existing device and does not require a randomized outcome trial. Therefore the label "FDA-cleared" does not, on its own, mean that a clear clinical benefit to patients has been demonstrated — the two must be kept distinct.

AI in Anomaly Screening: Boosting Reader Performance

One of the most mature products in anomaly screening, Sonio Suspect (FDA 510(k) clearance, 2023), flags eight common fetal anomalies using seven routine images from the heart, brain and abdomen. In a 47-site multi-reader study, the area under the curve (AUC) for anomaly detection rose from 69% to 91% (a 22-point gain, p<0.001), and this effect was independent of clinician experience and patient body mass index.

A systematic review and meta-analysis focused on fetal cardiac anomalies (eClinicalMedicine, May 2025; 15 studies, 30,121 fetuses) reported a pooled sensitivity of 0.89 (95% CI 0.83-0.93) and specificity of 0.91 (95% CI 0.84-0.95) for distinguishing normal from abnormal hearts. These figures, however, must be read carefully: most studies were retrospective, carried a moderate-to-high risk of bias on QUADAS-2 assessment, and showed low reporting quality (median TRIPOD+AI adherence of 53%). Heterogeneity was also high (I²≈78%). In short, a promising overall picture, but an evidence base that demands methodological caution.
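To give a sense of what an I² near 78% means, the sketch below uses the common Higgins formula, which expresses the share of variability across studies that exceeds what chance alone would produce. It assumes I² is derived from Cochran's Q in the usual way; the review may have used a different model, and the Q value here is hypothetical, picked only so that the result lands near the reported figure.

```python
def i_squared(q: float, n_studies: int) -> float:
    """Higgins' I-squared in percent, from Cochran's Q and the study count."""
    df = n_studies - 1  # degrees of freedom
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100.0

# Hypothetical Q for 15 studies (not taken from the review).
example_i2 = i_squared(q=63.6, n_studies=15)
print(f"I-squared = {example_i2:.0f}%")
```

By the usual rule of thumb, values above roughly 75% indicate considerable heterogeneity, which is why the pooled estimates deserve cautious reading.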

CTG / Electronic Fetal Monitoring: High Accuracy, Unproven Benefit

Here the evidence landscape is fundamentally different, and an honest framing is essential. The field's key reference, the INFANT randomized trial (Lancet, 2017; 47,062 women), showed that computerised CTG interpretation did not reduce poor neonatal outcomes: the adverse-outcome composite was 0.7% in both arms (adjusted RR 1.01; 95% CI 0.82-1.25), with no difference in neurodevelopmental scores at two years either. This landmark negative result remains valid despite the intervening years.
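A relative risk close to 1 with a confidence interval that spans 1 is what "no detectable difference" looks like numerically. The sketch below computes an unadjusted relative risk with a 95% interval using the standard log method. The counts are hypothetical (the trial reported an adjusted estimate, which this does not reproduce), and the function name is an illustrative choice.

```python
import math

def relative_risk(events_a: int, n_a: int, events_b: int, n_b: int,
                  z: float = 1.96):
    """Unadjusted relative risk of arm A versus arm B with a Wald CI on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical arms: 0.7% adverse outcomes in each (not the trial's counts).
rr, lower, upper = relative_risk(70, 10_000, 70, 10_000)
print(f"RR = {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

The hypothetical arms here are far smaller than the real trial, so the interval is wider than the one reported; larger samples narrow the interval without moving the point estimate.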

By contrast, newer deep-learning models report high technical accuracy. A nationwide study covering 14 hospitals in Korea (Scientific Reports, 2025; 22,522 deliveries) reached an internal AUC of 0.880 and external-validation values of 0.862-0.895 for distinguishing normal from abnormal CTG. A dual-modal model from China (Front Physiol, 2025) combined fetal heart rate with uterine contractions to push the AUC to 0.944. Yet even these authors explicitly state that clinical applicability remains to be evaluated in future prospective studies.
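It helps to recall what an AUC actually measures: the probability that a randomly chosen abnormal trace receives a higher model score than a randomly chosen normal one. The sketch below computes it by direct pairwise comparison on hypothetical scores; it is a teaching illustration, not the method used in either study.

```python
def auc(scores_abnormal, scores_normal) -> float:
    """AUC as the fraction of abnormal/normal pairs ranked correctly (ties count half)."""
    wins = 0.0
    for a in scores_abnormal:
        for n in scores_normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(scores_abnormal) * len(scores_normal))

# Hypothetical model scores (illustrative only).
abnormal_scores = [0.9, 0.8, 0.6, 0.4]
normal_scores = [0.7, 0.5, 0.3, 0.2, 0.1]

example_auc = auc(abnormal_scores, normal_scores)
print(f"AUC = {example_auc:.2f}")
```

Nothing in this calculation refers to what happened to the mother or baby afterwards, which is why a high AUC describes ranking ability and not clinical benefit.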

The conflicting evidence is best set side by side rather than reduced to a single source:

Source	Design	Key finding	What it shows
INFANT (Lancet, 2017)	Randomized, 47,062 women	Poor neonatal outcome 0.7% in both arms (RR 1.01)	No improvement in patient outcome
Nationwide DL (Sci Rep, 2025)	Multicentre cohort, 22,522 births	AUC 0.862-0.895	High diagnostic accuracy (not outcome)
Dual-modal (Front Physiol, 2025)	Single centre, n=326	AUC 0.944	Technical promise, validation lacking

The difference is essentially conceptual: diagnostic accuracy (how well a model recognises pathological traces) and patient-outcome evidence (whether that recognition actually yields less asphyxia, fewer caesareans or better neurodevelopment) are not the same thing. An expert review published in BJOG in 2025 makes this self-criticism explicit: after 50 years, evidence that intrapartum CTG prevents morbidity and mortality remains uncertain; its high false-positive rate and low specificity contribute to unnecessary caesareans; and many obstacles must be overcome before deep-learning methods enter clinical practice.

The Access Dimension: "Blind Sweep" and Low-Resource Settings

Perhaps the most exciting new direction is AI tools that untrained users can operate. Butterfly Network's "blind-sweep" gestational-age tool received FDA 510(k) clearance in March 2026, the first blind-sweep FDA clearance in the field. An untrained health worker performs six guided abdominal sweeps with a handheld device without even looking at the screen; in under two minutes, and across 16-37 weeks of gestation, the system produces a result equivalent to that of an experienced sonographer. In the underlying NEJM Evidence validation study, the mean absolute error in novice hands was 4.9 days with the model method versus 5.4 days with biometry. Similarly, a deep-learning model that requires no image interpretation (npj Digital Medicine, 2025) estimated gestational age with errors of 1.7 days at 14-18 weeks and 2.8 days at 18-24 weeks. This approach could genuinely open a door to antenatal care in regions where access to sonographers is limited.

The Core Constraint: Domain Shift and External Validation

The shared, critical limitation of all these results is how models behave in a setting different from the one in which they were trained. A multicentre benchmark study from a UCL team (Scientific Reports, 2026; 4 centres, 7 devices, 4,513 images) demonstrated that a model trained and tested at a single centre appears overly optimistic compared with multicentre testing — concrete evidence of the "domain shift" phenomenon. The Intrapartum Ultrasound Grand Challenge (Med Image Anal, 2026) likewise stresses that the results are promising but the research is at an early stage and requires in-depth scrutiny before clinical use.
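One common way to expose domain shift is to hold out an entire centre for testing, so that the model is never evaluated on data from a site it was trained on. The sketch below shows only the splitting step, under the assumption that each record carries a centre label. The record layout and names are hypothetical, and this is not the benchmark's own protocol.

```python
def leave_one_centre_out(records):
    """Yield (held_out_centre, train, test) with each centre held out once.

    `records` is a list of (centre, image_id) tuples; layout is an assumption.
    """
    centres = sorted({centre for centre, _ in records})
    for held_out in centres:
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        yield held_out, train, test

# Hypothetical records from three centres (illustrative only).
records = [
    ("A", "img001"), ("A", "img002"),
    ("B", "img003"),
    ("C", "img004"), ("C", "img005"),
]

splits = list(leave_one_centre_out(records))
for held_out, train, test in splits:
    print(f"test on {held_out}: {len(train)} train, {len(test)} test")
```

Comparing performance on these held-out centres with performance on a random split from a single centre gives a direct, if rough, estimate of how optimistic the single-centre figure is.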

The Authority Framework: The ISUOG Position Statement

The International Society of Ultrasound in Obstetrics and Gynecology (ISUOG) issued an official artificial intelligence position statement in November 2025. Before any AI system is deployed, the statement calls for: demonstration of diagnostic performance, assessment of workflow impact, clinical safety, population- and user-specific testing, transparency regarding the demographic representativeness of the training dataset, and post-deployment surveillance. ISUOG also proposes a risk stratification: low-risk administrative tools, medium-risk human-supervised decision support, and high-risk autonomous systems. This framework shows that the field has moved from "does it work?" to "how is it integrated safely and equitably?"

Conclusion

The picture in obstetric AI is clear enough to leave room for neither hype nor dismissal. Automated biometry and standard-plane detection now deliver expert-level accuracy and reproducibility that surpasses experts, measurably reducing scan time and clinician workload; AI-assisted anomaly screening can meaningfully improve reader performance; and blind-sweep tools carry real access potential for low-resource settings. By contrast, there is still no prospective randomized evidence that AI improves neonatal outcomes in CTG/electronic fetal monitoring; high AUC values demonstrate diagnostic accuracy but do not prove patient benefit. Most biometry studies are single-centre with selected cohorts, domain shift weakens external validity, and the cardiac-anomaly evidence carries a moderate-to-high risk of bias. In clinical practice, the sensible approach is to position AI alongside the experienced clinician rather than in their place — as a quality and efficiency tool — and to adopt it cautiously with the population-specific validation and post-deployment monitoring that ISUOG recommends. The technology is maturing rapidly; yet critical vigilance against automation bias, inequity, and the "clearance ≠ benefit" trap is at least as important as the technology itself.

References

Day TG, Matthew J, et al. AI-assisted fetal anomaly screening (PROMETHEUS RCT). NEJM AI. 2025.
INFANT Collaborative Group. Computerised interpretation of fetal heart rate during labour (INFANT). The Lancet. 2017.
Lovers AAA, et al. Advancements in fetal heart rate monitoring. BJOG. 2025.
AI-enabled prenatal ultrasound for fetal cardiac abnormalities: systematic review & meta-analysis. eClinicalMedicine. 2025.
Park J, et al. Automated interpretation of CTG using deep learning (nationwide multicenter). Scientific Reports. 2025.
Goetz-Fu A, et al. Deep learning for automatic fetal biometry from ultrasound videos. J Gynecol Obstet Hum Reprod. 2025.
Atia H, et al. Computer vision AI for second-trimester anomaly plane classification (BioticsAI). Int J Gynaecol Obstet. 2025.
Zhang Y, et al. 2D vs 3D AI-enhanced ultrasound for crown-rump length (3DCRL-Net). BMC Pregnancy Childbirth. 2025.
Di Vece C, et al. Multicentre benchmark dataset for landmark-based fetal biometry (domain shift). Scientific Reports. 2026.
ISUOG. Artificial Intelligence Positioning Statement. ISUOG. 2025.
Butterfly Network. Blind-sweep gestational age tool FDA 510(k) clearance. BusinessWire. 2026.
Sonio. Suspect FDA 510(k) clearance for fetal anomaly detection. Sonio. 2023.

Disclaimer: This content is for general informational and educational purposes only and does not substitute for medical advice, diagnosis, or treatment. The AI tools mentioned are for research and/or support purposes; final diagnostic and treatment responsibility lies with the physician. Consult your physician for decisions regarding your pregnancy.