AI in Genomics and Personalized Medicine: What It Has Achieved and What It Hasn't by 2026
At the laboratory level, AI has largely delivered on its promise to read disease risk and variant effects from the genome; prospective clinical outcomes, equity, and the "black box" problem, however, remain unsolved.
As the cost of genome sequencing has fallen, the real bottleneck is no longer the sequence itself but its interpretation. A single human exome harbors thousands of variants, and a large share of the rare, potentially relevant ones end up classified as "variants of uncertain significance" (VUS). Artificial intelligence (AI) entered precisely this gap: it promises to predict which variants are harmful, who is genetically predisposed to which disease, and how a given patient will respond to a given drug. In 2025 and 2026, part of this promise became concrete at the laboratory level, while another part still awaits evidence. This article examines the field without hype, through an honest "what it has achieved / what it has not" lens.
Variant interpretation: foundation models take the stage
Predicting whether a missense variant (a mutation that changes a single amino acid in a protein) is harmful is one of the oldest problems in clinical genetics. The effort began with classical tools such as SIFT and PolyPhen and advanced markedly with machine learning and, later, deep learning. A comprehensive 2025 review lays out a clear performance hierarchy: SIFT and PolyPhen reach ROC-AUC values of roughly 0.80–0.83, tools such as CADD and REVEL climb to ~0.90–0.91, and DeepMind's AlphaMissense reaches ~0.94.
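ROC-AUC, the metric behind these figures, equals the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one. The minimal Python sketch below computes it straight from that pairwise definition; the scores and labels are invented for illustration and are not data from the review.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise ROC-AUC: the share of (pathogenic, benign) pairs in which the
    pathogenic variant (label 1) scores higher; ties count as half a win."""
    pathogenic = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not pathogenic or not benign:
        raise ValueError("need at least one pathogenic and one benign variant")
    wins = 0.0
    for p in pathogenic:
        for b in benign:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic) * len(benign))


# Hypothetical predictor scores (higher = more likely pathogenic) and labels.
scores = [0.95, 0.80, 0.70, 0.40, 0.60, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(roc_auc(scores, labels))  # 0.9375: 15 of 16 pairs ranked correctly
```

The quadratic pairwise loop is chosen for transparency; on large benchmarks a rank-based (Mann-Whitney U) computation, or a library routine such as scikit-learn's `roc_auc_score`, gives the same value far faster.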
AlphaMissense has indeed shown strong performance in clinical validation. Tested against expert ACMG/AMP curation of 5,845 missense variants in 59 genes, it achieved 92% sensitivity and 78% specificity and outperformed a competing model, EVE, in 26 of the 34 ACMG genes examined (77%). Among VUS that previously lacked any computational evidence, it changed the evidence weight of 878 variants.
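Sensitivity and specificity alone do not say how often a "pathogenic" call is correct; that positive predictive value (PPV) also depends on what fraction of the tested variants are truly pathogenic. A short Python sketch using Bayes' rule, plugging in the reported 92% and 78% figures; the confusion-matrix counts and the pathogenic fractions looped over are hypothetical.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly pathogenic variants that the model calls pathogenic."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of truly benign variants that the model calls benign."""
    return tn / (tn + fp)


def ppv(sens: float, spec: float, pathogenic_fraction: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sens * pathogenic_fraction
    false_pos = (1.0 - spec) * (1.0 - pathogenic_fraction)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts chosen to reproduce the reported rates.
sens = sensitivity(tp=92, fn=8)   # 0.92
spec = specificity(tn=78, fp=22)  # 0.78

for fraction in (0.5, 0.2, 0.05):
    print(f"pathogenic fraction={fraction:.2f}  PPV={ppv(sens, spec, fraction):.2f}")
# pathogenic fraction=0.50  PPV=0.81
# pathogenic fraction=0.20  PPV=0.51
# pathogenic fraction=0.05  PPV=0.18
```

The same 92%/78% model is right about four times in five when half the variants are pathogenic, but fewer than one time in five when only 5% are, which is one reason the output is treated as evidence rather than a verdict.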
But there is another side to the coin. In 2025, an independent assessment in a neuromuscular diagnostic setting found that AlphaMissense correctly classified only 69% of 45 known pathogenic or likely pathogenic variants, left 22% as "uncertain," and incorrectly labeled 9% as benign. This underscores a critical point: these models are decision-support tools, not definitive classifiers that replace ACMG guidelines. Their proper role is to supply evidence under the ACMG computational criteria (PP3/BP4), not to deliver the final word.
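Part of the neuromuscular result reflects the fact that AlphaMissense emits a continuous score that is then binned into three classes, so a pathogenic variant can land in the middle "ambiguous" band rather than being called benign. A Python sketch of that binning; the 0.34 and 0.564 cutoffs are the ones commonly cited for AlphaMissense and should be checked against the release in use, and the 45 scores are placeholders whose 31/10/4 split was back-calculated from the reported percentages.

```python
from collections import Counter
from typing import Dict, Sequence

# Cutoffs commonly cited for AlphaMissense (assumption: verify for your release).
# Mapping classes to ACMG PP3/BP4 evidence strength needs separately calibrated
# thresholds; these bins are not that calibration.
LIKELY_BENIGN_BELOW = 0.34
LIKELY_PATHOGENIC_ABOVE = 0.564


def am_class(score: float) -> str:
    """Bin a continuous pathogenicity score into three classes."""
    if score < LIKELY_BENIGN_BELOW:
        return "likely_benign"
    if score > LIKELY_PATHOGENIC_ABOVE:
        return "likely_pathogenic"
    return "ambiguous"


def class_fractions(scores: Sequence[float]) -> Dict[str, float]:
    """Fraction of variants that fall into each class."""
    counts = Counter(am_class(s) for s in scores)
    order = ("likely_pathogenic", "ambiguous", "likely_benign")
    return {c: counts[c] / len(scores) for c in order}


# Placeholder scores for 45 known (likely) pathogenic variants.
scores = [0.90] * 31 + [0.45] * 10 + [0.20] * 4
for cls, frac in class_fractions(scores).items():
    print(f"{cls}: {frac:.0%}")
# likely_pathogenic: 69%
# ambiguous: 22%
# likely_benign: 9%
```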
Decision support, not a verdict
Even the best variant predictors can misclassify roughly 5–10% of variants. Whether a variant is reported clinically depends not on a model's standalone output but on expert review that integrates segregation data, functional data, and family history. The AI output is one link in the evidence chain, not the whole of it.
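One way to make the "one link in the evidence chain" point concrete is the Bayesian adaptation of the ACMG/AMP framework described by Tavtigian and colleagues (2018), used here purely as an illustration and not drawn from this article's sources. In that model the prior probability of pathogenicity is 0.10, one "very strong" criterion carries odds of 350, and each lower strength level is the square root of the one above. The Python sketch below treats benign criteria as subtracting from the exponent, which is a simplifying assumption.

```python
from typing import Dict, Optional

PRIOR = 0.10              # prior probability that a candidate variant is pathogenic
ODDS_VERY_STRONG = 350.0  # odds of pathogenicity for one "very strong" criterion

# Exponent on ODDS_VERY_STRONG contributed by one criterion of each strength:
# supporting ~2.08, moderate ~4.33, strong ~18.7, very strong = 350.
EXPONENT = {"supporting": 1 / 8, "moderate": 1 / 4, "strong": 1 / 2, "very_strong": 1.0}


def posterior(pathogenic: Dict[str, int], benign: Optional[Dict[str, int]] = None) -> float:
    """Posterior probability of pathogenicity from counts of criteria by strength."""
    benign = benign or {}
    exponent = sum(EXPONENT[level] * n for level, n in pathogenic.items())
    exponent -= sum(EXPONENT[level] * n for level, n in benign.items())
    odds = ODDS_VERY_STRONG ** exponent
    return odds * PRIOR / ((odds - 1.0) * PRIOR + 1.0)


# A computational call alone (PP3, supporting) barely moves the needle...
print(f"{posterior({'supporting': 1}):.2f}")  # 0.19
# ...while PP3 + segregation (PP1) + absence from controls (PM2) + functional
# data (PS3) together reach the likely-pathogenic range (>= 0.90).
print(f"{posterior({'supporting': 2, 'moderate': 1, 'strong': 1}):.2f}")  # 0.97
```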
Genomic foundation models: the 2025-2026 leap
Perhaps the most striking development of this period was the maturation of "genomic foundation models." Evo 2, developed in a collaboration between Arc Institute, NVIDIA, and Stanford, has 40 billion parameters and a 1-megabase context window and was trained on more than 9 trillion nucleotides spanning all three domains of life. Its most remarkable feature is that, having learned from DNA sequence alone, it can predict the functional impact of variants without task-specific fine-tuning, that is, zero-shot. It was reported to yield results consistent with experimental deep mutational scanning data for BRCA1 variants, and even to separate pathogenic from benign variants among clinically unclassified BRCA1 variants. The Evo 2 preprint appeared in early 2025, and the peer-reviewed version was published in Nature in 2026; its predecessor, Evo 1, was published in Science in 2024.
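Zero-shot variant scoring with a DNA language model is usually described as a likelihood comparison: score the reference sequence and the variant-carrying sequence under the model, and read a drop in log-likelihood as a sign of functional disruption. The Python sketch below shows only that scoring logic, with a toy order-k Markov model standing in for the neural network; `KmerMarkovModel` and `variant_score` are hypothetical names, and nothing here reproduces Evo 2's architecture, training data, or API.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable

ALPHABET = "ACGT"


class KmerMarkovModel:
    """Toy order-k Markov model of DNA; a stand-in for a genomic language model."""

    def __init__(self, k: int = 3, pseudocount: float = 1.0) -> None:
        self.k = k
        self.pseudocount = pseudocount
        self.counts: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))

    def fit(self, sequences: Iterable[str]) -> "KmerMarkovModel":
        # Count which base follows each k-base context in the training data.
        for seq in sequences:
            for i in range(self.k, len(seq)):
                self.counts[seq[i - self.k:i]][seq[i]] += 1.0
        return self

    def log_prob(self, seq: str) -> float:
        # Sum of log P(base | previous k bases) with add-pseudocount smoothing.
        total = 0.0
        for i in range(self.k, len(seq)):
            context = self.counts.get(seq[i - self.k:i], {})
            numerator = context.get(seq[i], 0.0) + self.pseudocount
            denominator = sum(context.values()) + self.pseudocount * len(ALPHABET)
            total += math.log(numerator / denominator)
        return total


def variant_score(model: KmerMarkovModel, ref: str, pos: int, alt: str) -> float:
    """log P(variant sequence) - log P(reference); negative = variant less likely."""
    variant = ref[:pos] + alt + ref[pos + 1:]
    return model.log_prob(variant) - model.log_prob(ref)


model = KmerMarkovModel(k=3).fit(["ATGC" * 50])
ref = "ATGCATGCATGC"
print(variant_score(model, ref, pos=5, alt="G"))  # strongly negative: breaks the learned motif
print(variant_score(model, ref, pos=5, alt="T"))  # 0.0: identical to the reference
```

The appeal of this design is that no labeled variants are needed: whatever the model learned about "normal" sequence during pretraining is reused directly as a variant-effect signal.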
Another important model, popEVE, is a generative approach calibrated across the proteome that combines evolutionary data with human population data. In a study published in Nature Genetics (2025), it identified variants in 442 genes (123 of them new candidates) in a cohort with severe developmental disorders. Its most clinically valuable feature is that it can prioritize a likely causal variant using only the child's exome, without parental sequencing, a genuine gain for "singleton" cases in which the parents cannot be sequenced. Relatedly, a large machine-learning study published in Science in 2025 re-estimated the penetrance (how often carriers actually develop the disease) of 1,648 rare variants in 31 autosomal dominant genes, using electronic health record (EHR) data from more than 1.3 million participants.
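In its classical form, penetrance is a simple ratio: affected carriers divided by all carriers. The Python sketch below sets that beside a score-based variant in which each carrier contributes a continuous disease probability (for example, from a model trained on EHR data) instead of a yes/no diagnosis. This is a simplified illustration of the idea under that assumption, not a reproduction of the Science study's method; the counts and scores are hypothetical.

```python
from typing import Sequence


def classical_penetrance(affected_carriers: int, total_carriers: int) -> float:
    """Fraction of carriers with a recorded diagnosis."""
    return affected_carriers / total_carriers


def score_based_penetrance(carrier_scores: Sequence[float]) -> float:
    """Mean per-carrier disease probability (each score assumed to lie in [0, 1]).

    Unlike a binary diagnosis, a continuous score can give partial credit to
    carriers with subclinical or undiagnosed disease (assumption for illustration).
    """
    if not carrier_scores:
        raise ValueError("no carriers")
    return sum(carrier_scores) / len(carrier_scores)


print(classical_penetrance(affected_carriers=20, total_carriers=80))  # 0.25
print(round(score_based_penetrance([0.10, 0.20, 0.90, 0.40]), 2))    # 0.4
```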
The honest limit here must be stated clearly: however powerful these models are, their performance has largely been measured on experimental and benchmark data. Models like Evo 2 have no prospective external clinical validation yet; there is no accumulated evidence that they improve outcomes for real patients at real decision points.
Polygenic risk scores: the gap between promise and proof
A polygenic risk score (PRS) summarizes an individual's genetic predisposition to a common disease (heart disease, diabetes, certain cancers) by aggregating thousands of small-effect variants into a single number. The 2025-2026 period brought the first prospective steps for PRS beyond observational data. Among the first 204 participants of the PROACT study (mean age 56; 69% women), half (102/204) had subclinical plaque despite low clinical risk and good cardiovascular health scores: 76% of men and 38% of women. This points to a large "silent but genetically high-risk" group that standard clinical assessment misses. PROACT-2 is testing high-dose statin plus low-dose colchicine in these individuals in a double-blind randomized design.
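In its simplest form, that single number is a weighted sum: each variant's effect-allele dosage (0, 1 or 2) multiplied by its GWAS effect size, then placed against a reference population. A minimal Python sketch under those assumptions; the variant IDs, weights, and reference scores are hypothetical, and real pipelines add steps such as LD-aware weight estimation and ancestry-matched reference distributions.

```python
from statistics import NormalDist, mean, stdev
from typing import Dict, Sequence, Tuple


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of effect size x effect-allele dosage over the scored variants.

    Variants missing from `dosages` are treated as dosage 0 here, a
    simplification; real pipelines usually impute them.
    """
    return sum(beta * dosages.get(variant, 0.0) for variant, beta in weights.items())


def z_and_percentile(score: float, reference: Sequence[float]) -> Tuple[float, float]:
    """Position of a score relative to a reference population, assuming the
    reference scores are roughly normally distributed."""
    z = (score - mean(reference)) / stdev(reference)
    return z, NormalDist().cdf(z)


weights = {"var1": 0.12, "var2": -0.05, "var3": 0.30}  # hypothetical log-odds weights
person = {"var1": 2, "var2": 1, "var3": 1}             # effect-allele counts
reference = [0.10, 0.20, 0.30, 0.40, 0.50]             # hypothetical population scores

raw = polygenic_score(person, weights)
z, pct = z_and_percentile(raw, reference)
print(f"raw={raw:.2f}  z={z:.2f}  percentile={pct:.0%}")
```

Reporting the score as a percentile of a reference population is what makes "top 5% of genetic risk" statements possible, and it is also why the choice of reference population matters so much for non-European individuals.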
Even so, the evidence must be read in balance. The ESC consensus statement published in the European Heart Journal in 2025, the first formal clinical consensus on PRS in cardiovascular disease, explicitly frames PRS as "promising, but more randomized trials are needed for routine use." The critical point is that large randomized trials have not yet shown that PRS-guided care reduces hard clinical endpoints (heart attack, death). Moreover, a 2025 randomized trial found that giving patients PRS information had only a limited and mixed effect on behaviors such as physical activity and diet: "knowing the risk" does not automatically translate into "changing behavior."
| Domain | Evidence level (2026) | Key data |
|---|---|---|
| Pharmacogenomics (PGx) | RCT — most robust | PREPARE: 30% reduction in adverse drug reactions |
| Variant interpretation | Strong, but decision support only | AlphaMissense AUC ~0.94; 9% of known pathogenic variants called benign (neuromuscular) |
| Polygenic risk score | Prospective start, no outcomes yet | PROACT: 50% subclinical plaque; ESC "more RCTs needed" |
| Genomic foundation models | Strong benchmark, no clinical validation | Evo 2: 40B parameters, BRCA1 zero-shot |
| Cancer genomics (CDx) | Regulator-approved, clinically integrated | Tempus xT CDx: 648 genes, tumor-only FDA approval |
Pharmacogenomics: the field with the most robust evidence
Somewhat surprisingly, the clearest clinical benefit across genomic AI and personalized medicine comes not from the flashiest models but from pharmacogenomics (PGx). The landmark PREPARE study tested a 12-gene panel covering 39 drugs in seven European countries and reported that genotype-guided drug and dose selection reduced clinically relevant adverse drug reactions by 30% (Lancet, 2023). Its value lies in being prospective, randomized evidence. In fairness, the same journal published critiques arguing that the benefit was unclear, along with the authors' replies; the debate noted that the effect came largely from a reduction in moderate-severity reactions. As of 2026, the core finding is reinforced by sustainable real-world implementations integrated into the electronic health record at academic hospitals (Clinical Pharmacology & Therapeutics, 2026).
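Genotype-guided prescribing of the kind PREPARE tested rests on a lookup chain: alleles map to a function, the pair of alleles (the diplotype) maps to a predicted metabolizer phenotype, and the phenotype maps to a drug-specific action. A Python sketch for CYP2C19 and clopidogrel, a prodrug that needs CYP2C19 for activation; the allele table is a small illustrative subset using the usual CPIC-style function categories, not the PREPARE panel's actual logic or a complete clinical table.

```python
from typing import Dict

# Illustrative subset of CYP2C19 allele functions (real tables list many more alleles).
ALLELE_FUNCTION: Dict[str, str] = {
    "*1": "normal",
    "*2": "no_function",
    "*3": "no_function",
    "*17": "increased",
}


def cyp2c19_phenotype(allele_a: str, allele_b: str) -> str:
    """Map a CYP2C19 diplotype to a predicted metabolizer phenotype."""
    functions = [ALLELE_FUNCTION[allele_a], ALLELE_FUNCTION[allele_b]]
    no_function = functions.count("no_function")
    increased = functions.count("increased")
    if no_function == 2:
        return "poor"
    if no_function == 1:
        return "intermediate"  # includes *2/*17
    if increased == 2:
        return "ultrarapid"
    if increased == 1:
        return "rapid"
    return "normal"


def clopidogrel_flag(phenotype: str) -> str:
    """Poor and intermediate metabolizers activate the prodrug less efficiently."""
    if phenotype in ("poor", "intermediate"):
        return "reduced activation expected: consider an alternative antiplatelet"
    return "standard dosing"


for diplotype in [("*1", "*1"), ("*1", "*17"), ("*2", "*17"), ("*2", "*3")]:
    phenotype = cyp2c19_phenotype(*diplotype)
    print("/".join(diplotype), phenotype, "->", clopidogrel_flag(phenotype))
```

Because every step is an explicit, auditable rule, this kind of logic fits naturally into EHR decision support, which may help explain why PGx reached randomized evidence earlier than model-based approaches.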
Cancer genomics and the regulatory landscape
Oncology is the field where concrete clinical integration of genomic AI is most advanced. In May 2026, the FDA approved a tumor-only indication for the Tempus xT CDx test, a 648-gene tissue-based sequencing assay; with it, Tempus became the first laboratory to hold companion diagnostic approval for both tumor-only and tumor-normal comprehensive genomic profiling. In practice, a colorectal cancer patient's eligibility for targeted therapy can now be determined even when matched normal tissue is unavailable. Multimodal approaches are also maturing: MAIGGT, a Transformer model that combines histopathology images with EHR phenotype data, reached AUCs of 0.83–0.93 across three independent cohorts for pre-screening germline BRCA1/2 mutations in breast cancer, suggesting it could widen access to genetic testing (Advanced Science, 2025).
On the regulatory side, the momentum is real: the FDA has authorized more than 1,300 AI-based devices since 1995, including a record 258 in 2025 alone. However, a taxonomy of 1,016 authorizations in npj Digital Medicine (2025) shows that the overwhelming majority use traditional machine learning, mostly in radiology, while generative AI remains marginal. The FDA's guidance on Predetermined Change Control Plans (PCCPs) opened a critical door for genomic AI: manufacturers can implement model updates that were specified and validated in advance in an authorized plan, without a new authorization for each update.
Systematic risks: inequity and the black box
The most persistent shadow over all this progress is ancestry bias. In genome-wide association studies (GWAS), individuals of European ancestry are represented at roughly 4.5 times their share of the world population, while individuals of African ancestry are represented at about one-fifth of their share. This imbalance reduces the transferability of PRS across ancestries, because linkage disequilibrium (LD) structure and allele frequencies differ between populations. The All of Us program, with more than 245,000 whole-genome participants, and multi-ancestry methods such as PRS-CSx aim to close this gap; tellingly, when training includes diverse data, the largest gains appear precisely in underrepresented populations.
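Why LD differences erode transferability can be seen in a toy setting: a score built on a tag variant captures the causal variant's effect only to the extent that the two are correlated, so a tag with LD correlation r captures a fraction r² of that variant's genetic variance. The Python simulation below assumes a single causal variant, continuous standardized genotypes, and hypothetical LD correlations of 0.9 in the discovery population and 0.5 in a target population; real PRS involve thousands of variants and allele-frequency differences as well.

```python
import random
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def tag_score_accuracy(ld_r: float, n: int = 20000, seed: int = 1) -> float:
    """Correlation between a tag-variant score and the causal genetic value.

    The tag's weight is a positive constant estimated in the discovery
    population; scaling does not change a correlation, so the score is
    represented by the tag genotype itself.
    """
    rng = random.Random(seed)
    causal, tag = [], []
    for _ in range(n):
        c = rng.gauss(0.0, 1.0)
        # Tag genotype correlated with the causal genotype at ld_r.
        t = ld_r * c + (1.0 - ld_r ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        causal.append(c)
        tag.append(t)
    return pearson(tag, causal)


discovery = tag_score_accuracy(0.9)  # hypothetical LD in the discovery population
target = tag_score_accuracy(0.5)     # hypothetical LD in the target population
print(f"variance captured: discovery {discovery ** 2:.2f}, target {target ** 2:.2f}")
print(f"relative accuracy in target: {(target / discovery) ** 2:.2f}")
```

Under these assumptions the same weight captures roughly a third as much genetic variance in the target population, even though the causal biology is identical; the loss comes entirely from the tag-causal correlation.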
Other systematic risks should not be overlooked: models can degrade on data from different institutions, devices, or populations (distribution shift); the "black box" nature of advanced models makes them hard to align with rule-based guidelines such as ACMG; and the literature suffers from publication bias, with positive or high-AUC results overrepresented. As the 2025 review puts it plainly, experimental validation lags behind computational prediction.
Conclusion
As of 2026, the balance sheet for AI in genomics and personalized medicine shows neither a utopian revolution nor an empty promise, but an uneven, layered advance. AI is helping to reduce the VUS burden in variant interpretation, and genomic foundation models have made a genuine leap in reading functional effects from DNA sequence, but they remain decision-support tools without prospective clinical validation. The most robust clinical benefit comes not from the flashiest models but from pharmacogenomics (PREPARE's 30% reduction in adverse drug reactions); polygenic risk scores are generating promising prospective signals but still await large randomized trials on hard endpoints. The real test ahead is not these models' AUC values but whether they improve outcomes for real patients, at real decision points, and equitably across all populations. In this equation, the clinician's judgment is not an "old method"; it is the indispensable link that anchors AI output in clinical context.
References
- Comprehensive evaluation of AlphaMissense predictions by evidence quantification for VUS. Frontiers in Genetics. 2024.
- AlphaMissense prediction for missense variants in neuromuscular disorders. Journal of Neuromuscular Diseases (SAGE). 2025.
- Harnessing artificial intelligence for genomic variant prediction: advances, challenges, and future directions (review). PMC. 2025.
- Brixi G, et al. Genome modelling and design across all domains of life with Evo 2. Nature. 2026.
- popEVE: Proteome-wide model for human disease genetics. Nature Genetics. 2025.
- Machine learning-based penetrance of genetic variants. Science. 2025.
- PROACT: Polygenic Risk Based Detection and Treatment of Subclinical Coronary Atherosclerosis. JACC. 2025–2026.
- ESC. Clinical utility and implementation of PRS for CVD: ESC clinical consensus statement. European Heart Journal. 2025.
- Swen JJ, et al. A 12-gene pharmacogenetic panel to prevent adverse drug reactions (PREPARE). The Lancet. 2023.
- Tempus receives FDA approval for tumor-only xT CDx (648 genes). BusinessWire. 2026.
- MAIGGT: Explainable multimodal AI for germline genetic testing in breast cancer. Advanced Science. 2025.
- How AI is used in FDA-authorized medical devices: a taxonomy across 1,016 authorizations. npj Digital Medicine. 2025.