AI for Embryo Selection in IVF: What It Has Achieved, and What It Has Not
AI makes embryo grading roughly 10 times faster and more consistent; yet two independent randomized trials failed to show superiority over morphology for live birth or clinical pregnancy. The evidence says "faster and more standardized," not "more babies."
One of the most consequential decisions in in vitro fertilization (IVF) is choosing which embryo to transfer. For decades this choice has rested on morphological assessment by an embryologist at the microscope: blastocyst expansion, inner cell mass quality, and trophectoderm grade. In recent years, deep-learning artificial intelligence (AI) models entered this process, automatically scoring embryos from time-lapse sequences or static blastocyst images. The promise was bold: a selection that is more objective, more consistent, and perhaps more successful than the human eye. But does the evidence live up to it? The data from 2024-2026 offer a clear and honest answer — one that diverges sharply from the marketing narrative.
The landmark trial: AI did not beat morphology
The turning point for the field is a multicenter, double-blind, randomized controlled trial published in Nature Medicine in 2024 (Illingworth et al.). Across 14 centers in Australia and Europe, 1,066 patients were randomized to embryo selection by the deep-learning model iDAScore or by standard manual morphology. Designed to test a noninferiority hypothesis, this remains the strongest prospective evidence in the field.
The results were unambiguous. The clinical pregnancy rate was 46.5% in the AI arm versus 48.2% with morphology, a risk difference of −1.7 percentage points (95% CI −7.7 to +4.3). Because the lower bound of the confidence interval (−7.7) fell below the prespecified noninferiority margin of −5 percentage points, AI could not even be shown to be noninferior to morphology. For the secondary endpoint, the live birth rate was 39.8% with AI versus 43.5% with morphology (risk difference −3.9 points, 95% CI −9.9 to +2.2; p=0.24): no statistically significant difference, and certainly no trend favoring AI.
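The noninferiority logic can be checked with a few lines of arithmetic. The sketch below assumes equal allocation (533 patients per arm, half of 1,066) and a simple unadjusted Wald interval; the trial's own analysis may have used a different method, so this is an illustration rather than a reproduction. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def risk_difference_ci(p_ai, p_ctrl, n_ai, n_ctrl, level=0.95):
    """Unadjusted risk difference (AI minus control) with a Wald interval."""
    diff = p_ai - p_ctrl
    se = math.sqrt(p_ai * (1 - p_ai) / n_ai + p_ctrl * (1 - p_ctrl) / n_ctrl)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


def is_noninferior(ci_lower, margin):
    """Noninferiority holds only if the whole interval lies above the margin."""
    return ci_lower > margin


# Clinical pregnancy: 46.5% (AI) vs 48.2% (morphology), as reported.
# ASSUMPTION: 533 patients per arm (equal split of the 1,066 randomized).
diff, lower, upper = risk_difference_ci(0.465, 0.482, 533, 533)
print(f"risk difference: {diff * 100:+.1f} points "
      f"(95% CI {lower * 100:+.1f} to {upper * 100:+.1f})")
print("noninferior at -5 points:", is_noninferior(lower, -0.05))
```

Under these assumptions the script gives −1.7 points with an interval of about −7.7 to +4.3, in line with the reported figures, and the lower bound sits below the −5-point margin, so noninferiority is not shown.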
A notable subgroup signal emerged: while AI performed marginally better in fresh transfers (relative risk 1.08), it performed distinctly worse in freeze-all cycles (clinical pregnancy 49.5% vs 61.3%; interaction p=0.032). Far from showing superiority, the trial raised a possible signal of harm in frozen embryo transfers, although a subgroup finding of this kind is hypothesis-generating rather than conclusive. Furthermore, the embryologist and the AI chose the same embryo in only 65.8% of cases. A different embryo was therefore prioritized in roughly one-third of cases, yet pregnancy rates remained similar, which suggests that the choice between the top-ranked candidates made little difference to the outcome.
The one proven advantage: speed
In the same trial, the only domain where AI clearly won was assessment time: 21.3 seconds per embryo (AI) versus 208.3 seconds (manual) — an approximately 10-fold reduction (p<0.001). AI offers "faster and more consistent assessment," not "more babies."
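To put the speed figures in workflow terms, here is a minimal sketch of the arithmetic. The per-embryo times are those reported above; the daily workload of 40 embryos is a hypothetical number chosen only for illustration.

```python
AI_SECONDS = 21.3       # per embryo, as reported in the trial
MANUAL_SECONDS = 208.3  # per embryo, as reported in the trial


def time_saved_minutes(n_embryos):
    """Minutes of assessment time saved when n_embryos are scored by AI."""
    return n_embryos * (MANUAL_SECONDS - AI_SECONDS) / 60


speedup = MANUAL_SECONDS / AI_SECONDS
print(f"speed-up: {speedup:.1f}x")

# HYPOTHETICAL workload: 40 embryos assessed in one day.
print(f"time saved for 40 embryos: {time_saved_minutes(40):.0f} min")
```

The ratio comes out at about 9.8, which is where the "roughly 10-fold" figure comes from.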
A second independent trial points the same way
These findings are not unique to a single center or product. The Alife Health "LOTUS" randomized trial (440 patients across 7 US centers) tested AI-based embryo prioritization from static blastocyst images, with ongoing pregnancy rate as the outcome. In a published interim analysis (first 100 patients), the ongoing pregnancy rate was 69% in the AI arm versus 64% in the control arm (+5 points, p=0.5; not statistically significant). The full peer-reviewed results are not yet published, and an interim analysis of 100 patients carries limited weight on its own. Still, two independent randomized trials have now failed to demonstrate superiority, which makes it unlikely that the landmark result was a fluke.
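A rough sample-size calculation shows why a 5-point difference cannot reach significance in 100 patients. The sketch below uses the standard normal-approximation formula for comparing two proportions and assumes, purely for illustration, that the interim rates (64% and 69%) are the true rates, with a two-sided alpha of 0.05 and 80% power. The function name is hypothetical and this is not the trial's own power calculation.

```python
import math
from statistics import NormalDist


def patients_per_arm(p_ctrl, p_ai, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_ctrl * (1 - p_ctrl) + p_ai * (1 - p_ai)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_ai - p_ctrl) ** 2)


# ASSUMPTION: the interim point estimates are the true underlying rates.
n = patients_per_arm(0.64, 0.69)
print(f"patients needed per arm: {n}")
```

Under these assumptions the requirement is roughly 1,400 patients per arm, far more than the interim sample, so a non-significant interim result is exactly what one would expect whether or not a small true benefit exists.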
Regulatory clearances are fast, but clearance ≠ proof
While the clinical evidence has remained neutral, the commercial market has moved quickly. Fairtility CHLOE Blast received FDA 510(k) clearance in 2025, positioned as the "first FDA-cleared machine-learning-based embryo evaluation clinical decision-support software." Alife Health Embryo Predict obtained FDA clearance on May 28, 2026, and, with a CE Mark as well, is now on the market in the US, EU, and UK. A crucial nuance applies here: these clearances cover "embryo assessment / decision support"; they do not establish, in regulatory terms, that the tools increase live birth. Clearance shows that a device works safely and as intended; it does not demonstrate clinical superiority.
Ploidy (chromosome) prediction: moderate, and no substitute for PGT-A
Another hope for AI is predicting an embryo's chromosomal status (euploid versus aneuploid) from images — raising the question of whether it could replace invasive PGT-A biopsy. A 2024 systematic review and meta-analysis in eClinicalMedicine (20 studies, 6,879 embryos) found a pooled AUC of 0.80 (sensitivity 0.71; specificity 0.75). But there is a critical detail: image-only models reached an AUC of just 0.62, whereas adding maternal age and clinical data to the image raised the AUC to 0.71 — meaning much of the discriminative power comes not from the AI itself but from clinical context. Heterogeneity was very high (I²≈97%), and only 7 of 20 studies had external validation. The authors' clear conclusion: AI cannot replace invasive methods (PGT-A) and serves only as a decision-support tool.
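To see what a sensitivity of 0.71 and a specificity of 0.75 mean for an individual embryo, they can be converted into predictive values with Bayes' rule. In the sketch below, the pooled sensitivity and specificity are taken from the meta-analysis, while the prevalence values are hypothetical, and "positive" stands for whichever class the classifier is trying to detect. The function name is an illustrative choice.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled estimates reported in the meta-analysis.
SENSITIVITY, SPECIFICITY = 0.71, 0.75

# HYPOTHETICAL prevalences of the positive class.
for prevalence in (0.3, 0.5, 0.7):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

At a hypothetical prevalence of 50%, about a quarter of positive calls and more than a quarter of negative calls would be wrong, which illustrates why a tool at this level of accuracy is positioned as decision support and not as a replacement for PGT-A.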
| Task / endpoint | Performance | Study / evidence |
|---|---|---|
| Clinical pregnancy (AI vs morphology) | 46.5% vs 48.2% (diff −1.7; noninferiority not shown) | Illingworth 2024, RCT |
| Live birth (AI vs morphology) | 39.8% vs 43.5% (diff −3.9; p=0.24) | Illingworth 2024, RCT |
| Assessment time | 21 s vs 208 s (~10× faster) | Illingworth 2024, RCT |
| Ongoing pregnancy (AI vs control) | 69% vs 64% (p=0.5) | Alife LOTUS, interim |
| Ploidy prediction | AUC 0.80 (image-only 0.62) | Xin 2024, meta-analysis |
| General embryo selection (diagnostic) | AUC 0.70 (sensitivity 0.69) | Diagnostic meta-analysis 2025 |
What has been proven, and what has not
Proven: AI makes embryo assessment much faster (~10-fold) and produces a consistent, reproducible score that is independent of the assessor; by reducing inter-observer variability, this is a genuine gain for workflow and standardization. It can also rank embryos with an accuracy comparable to morphology (implantation AUC ~0.70; ploidy AUC ~0.80).
Not proven: No superiority over morphology in live birth or clinical pregnancy has been demonstrated — indeed, the landmark trial could not even show noninferiority. AI cannot replace PGT-A. Most importantly, no effect on cumulative live birth has been shown: since all of a patient's usable embryos are eventually transferred, "choosing the best first" may shorten time-to-pregnancy but is not expected to increase the total number of babies. This endpoint has not yet been tested in a randomized trial.
Ranking ≠ creating a better embryo
AI re-ranks the existing pool of embryos; it does not change the biological quality of that pool. The birth potential per patient is ultimately constrained by embryo biology. This conceptual distinction explains why the expectation that "AI increases pregnancy" is so often unmet.
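The distinction between ranking and creating can be made concrete with a toy model. The sketch below is a deliberate simplification: each embryo is assumed to have a fixed, deterministic outcome, all embryos are transferred one at a time until a live birth occurs, and no patient drops out. The pool and both rankings are hypothetical.

```python
def transfer_until_birth(pool, order):
    """Transfer embryos one at a time in the given order.

    pool  : list of booleans, True if that embryo would lead to a live birth
    order : indices into pool, in the order the embryos are transferred
    Returns (live_birth_achieved, transfers_used).
    """
    for transfers, idx in enumerate(order, start=1):
        if pool[idx]:
            return True, transfers
    return False, len(order)


# HYPOTHETICAL patient with four usable embryos, one of them viable.
pool = [False, False, True, False]

perfect_ranking = [2, 0, 1, 3]  # viable embryo transferred first
poor_ranking = [0, 1, 3, 2]     # viable embryo transferred last

print(transfer_until_birth(pool, perfect_ranking))      # (True, 1)
print(transfer_until_birth(pool, poor_ranking))         # (True, 4)
print(transfer_until_birth([False] * 4, [0, 1, 2, 3]))  # (False, 4)
```

In this model the ranking changes only the number of transfers needed, not whether the patient eventually has a live birth. A pool with no viable embryo ends without one however well it is ranked.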
Methodological and ethical risks to keep in mind
- Data shift and lack of external validation: nearly all studies rely on proprietary datasets, with AUC fluctuating between 0.60 and 0.75 across centers and subgroups. Models trained on high-income-country data may not generalize to different laboratories and populations.
- Publication bias: the abundance of impressive retrospective results alongside neutral or negative prospective trials is a classic positive-study bias pattern.
- Automation bias and deskilling: there is a real risk that embryologists over-trust the AI score and abandon independent assessment.
- The "black box" problem: although heat-map studies show the model focuses on morphology-like features, its clinical value remains unproven, which is precisely why new randomized trials using interpretable AI have been launched.
- Conflicting evidence on sex-ratio bias: some retrospective studies reported a skew toward males among high-AI-score embryos; however, this was not confirmed in Illingworth's prospective randomized trial, which found no difference in the male/female ratio. Conflicting findings of this kind deserve an honest, side-by-side presentation.
- Conflict of interest: the landmark trial was funded by a manufacturer (Vitrolife), and commercial clearances rely largely on manufacturer data, which only heightens the value of independent verification.
Where is the field heading?
The encouraging news is that the field has now turned toward prospective evidence. A randomized trial protocol in China comparing interpretable AI with conventional morphology (n=1,100, primary endpoint ongoing pregnancy) exemplifies this shift. Such trials aim to provide the embryologist with a justified, explainable decision aid rather than a "black-box score," and to measure real clinical endpoints. In the coming years the key question will no longer be "is AI better than morphology?" but rather "in which patient subgroup, for which endpoint, does it genuinely help?"
Conclusion
At the current level of evidence, AI in embryo selection is a valuable tool for efficiency and standardization — it speeds assessment roughly 10-fold, improves inter-observer consistency, and ranks embryos with accuracy comparable to morphology. Yet, as two independent randomized trials have shown, there is no evidence that it increases live birth or pregnancy rates; the landmark trial failed to establish even noninferiority, with a possible harm signal in freeze-all cycles. AI does not replace PGT-A, and its effect on cumulative live birth remains untested. The honest framing is clear: this technology is a powerful assistant that accelerates and standardizes the embryologist's work — not a replacement for it, nor a means of magically improving the chance of pregnancy. In clinical use, manufacturer-independent prospective evidence, transparent subgroup analyses, and balanced patient counseling are indispensable. Good blastocyst morphology remains, for now, "the standard to beat."
References
- Illingworth PJ et al. Deep learning versus manual morphology-based embryo selection in IVF: a randomized, double-blind noninferiority trial. Nature Medicine. 2024.
- Sakkas D. The 'golden fleece of embryology' eludes us once again: a recent RCT using artificial intelligence reveals again that blastocyst morphology remains the standard to beat. Human Reproduction. 2025.
- Xin X et al. Non-invasive prediction of human embryonic ploidy using artificial intelligence: a systematic review and meta-analysis. eClinicalMedicine. 2024.
- Predicting pregnancy outcomes in IVF cycles: a systematic review and diagnostic meta-analysis of artificial intelligence in embryo assessment. Contraception and Reproductive Medicine. 2025.
- Mrugacz G et al. Noninvasive Preimplantation Genetic Testing in Recurrent Pregnancy Loss and Implantation Failure: Breakthrough or Overpromise? Cells. 2025.
- Wang S et al. Blastocyst selection through an interpretable artificial intelligence method versus traditional morphology grading: study protocol for a randomised controlled trial. BMJ Open. 2025.
- Alife Health LOTUS interim analysis. Evaluation of the effect on ongoing pregnancy rate of using artificial intelligence for embryo prioritization: an interim analysis of a prospective RCT. Fertility and Sterility. 2024.
- Alife Health. Receives FDA Clearance for AI-Powered Embryo Assessment (Embryo Predict). PR Newswire. 2026.
- Fairtility. CHLOE Blast Achieves U.S. FDA 510(k) Clearance. Femtech Insider. 2025.
- AI-based live birth prediction in IVF cycles: a systematic review without meta-analysis of model performance and validation. Middle East Fertility Society Journal. 2026.
- Current progress and open challenges for applying artificial intelligence across the in vitro fertilization cycle. Patterns (Cell Press). 2025.