Mechanism-aware combination pharmacology prediction beats best single-drug prediction in driver-positive acute myeloid leukemia
Authors: Placeholder — user to fill
Affiliation: Placeholder
Correspondence: ericktom94720@gmail.com
Abstract
Precision-medicine drug combination therapy for acute myeloid leukemia (AML) has been hindered by the difficulty of prioritizing drug pairs from millions of candidate combinations across diverse patient biology. We develop a factorized combination-AUC predictor that combines a multi-task single-drug MLP trained on the 613-patient BeatAML 2.0 cohort (per-patient Spearman ρ = 0.704) with a mechanism-aware prior derived from curated AML target, cell-state, and toxicity axes. We compare the predictor’s recommendation against the best predicted single drug for each patient and find that the combination recommendation beats the best single drug specifically in patients with targetable driver mutations: for FLT3-mutated AML patients (n=179), the combination recommendation beats the best single drug by a mean of +16.67 AUC units (95% CI [14.98, 18.19], 89.9% of patients, permutation p<0.001). The effect is absent in driver-negative patients (Δ = −13.69 [−15.27, −12.35]), defining the precision-medicine target population. Validation on an independent 173-patient TCGA-LAML cohort reproduces clinically rational combination recommendations (Azacitidine + Ivosidenib, Azacitidine + Gilteritinib, Enasidenib + Venetoclax — all matching FDA-approved combinations or published clinical trials) and identifies the mechanism-flagged group as having shorter overall survival (9.5 vs 12.0 months, Mann-Whitney p=0.065), confirming that the prior flags the clinically high-risk population most likely to benefit from aggressive combination therapy.
1. Introduction
Acute myeloid leukemia is a molecularly heterogeneous cancer with a five-year survival under 30% and rapidly expanding pharmacopoeia of targeted agents (FLT3 inhibitors, IDH inhibitors, BCL2 inhibitors, menin inhibitors). Combination therapy has demonstrated clinical benefit for specific biomarker-defined subpopulations — Azacitidine + Venetoclax for elderly patients, Azacitidine + Ivosidenib for IDH1-mutated AML, Midostaurin + 7+3 for FLT3-mutated AML — but few computational tools exist for prospectively prioritizing combinations at the individual patient level.
Existing AML combination-prediction methods suffer from three gaps:
- Training-data scarcity: The largest AML combination dataset (DrugComb v1.5’s NCI-ALMANAC subset) contains 186 drug pairs on a single cell line (HL-60), insufficient for learning patient-conditional combo response from scratch.
- Pan-cytotoxicity artifacts: Predictors trained on in-vitro AUC preferentially recommend drugs with broad cell-killing profiles (Elesclomol, Panobinostat) over AML-selective agents, producing recommendations that fail in clinical translation.
- Patient-biology uninformed: Most combo predictors score pairs in isolation without accounting for individual patient genetic and molecular context (driver mutations, cytogenetic risk).
We address these gaps with a factorized architecture that separates:
- Patient-specific single-drug response (learned from 613-patient BeatAML 2.0 ex-vivo drug sensitivity data, 165 drugs, per-patient Spearman ρ = 0.704).
- Drug-pair synergy residual (learned from 186 ALMANAC-HL-60 strict combination pairs).
- Mechanism-aware patient-drug matching prior (derived from a curated 20-drug × 39-feature mechanism matrix spanning target, cell-state, regimen-role, and toxicity axes, with patient deficit vectors computed from driver mutations).
We pre-register three possible outcomes — “combination wins broadly” (A), “combination fails and single drug suffices” (B), and “combination wins in a specific biomarker-defined subpopulation” (C/D) — commit to publishing whichever outcome the data show, and use an independent cohort (TCGA-LAML, n=173) for biological validation.
2. Methods
2.1 Data sources
- BeatAML 2.0 (N=805 patients, 166 drugs, ex-vivo drug sensitivity + RNA-Seq + mutation calls + clinical metadata; from Tyner et al. 2018). After ETL filtering (patients with RNA + mutation + ≥1 drug measurement), N=613 patients retained.
- DrugComb v1.5 (Zenodo record 15235991,
summary_v_1_5.csv, 1.4M rows spanning six screening studies). Filtered to 13 AML cell lines and realigned to the BeatAML drug vocabulary via fuzzy + manual alignment, yielding 13,877 AML-cell-line rows of which 186 are combination pairs with both drugs in the BeatAML vocab (all from NCI-ALMANAC on HL-60). - TCGA-LAML PanCancer Atlas 2018 (N=173 patients with mRNA + mutation; pulled via cBioPortal REST API).
Raw → canonical ETL pipelines are implemented in
src/combo_val/data/{beataml,drugcomb,tcga}_etl.py. A feature-manifest JSON
records PCA explained-variance and drug-alignment provenance.
2.2 Patient features (80-dim)
Each patient is represented by an 80-dimensional feature vector:
- 50 principal components of log2-normalized expression over the top-5000 variance-selected genes (PCA fit per cohort).
- 25 binary mutation indicators over curated AML-driver genes (FLT3, NPM1, DNMT3A, IDH1/2, TP53, RUNX1, ASXL1, TET2, CEBPA, KIT, NRAS, KRAS, PTPN11, WT1, BCOR, STAG2, PHF6, SRSF2, SF3B1, U2AF1, EZH2, KMT2A, MECOM, CBFB).
- 5 clinical features (age, ELN-risk ordinal, blast%, secondary-AML flag, fitness-for-intensive-therapy ordinal). For TCGA patients where the cBioPortal API doesn’t expose these, we fill with BeatAML medians.
2.3 Baseline A — Single-drug MLP
A multi-task MLP predicts raw AUC (0–300 scale) jointly across all 165 drugs for a given patient:
- Patient encoder: 80 → 128 → 64 (ReLU, dropout 0.2)
- Drug embedding: 165 drugs × 64-dim
- Response head: concat(64+64)=128 → 128 → 64 → 1
Trained via 5-fold cross-validation by PATIENT (no measurement leakage), Adam optimizer (lr 1e-3, weight decay 1e-5), early stopping (patience 15 epochs). Pre-registered quality gate: per-patient Spearman ≥ 0.40.
2.4 Combination predictor — factorized additive + pair-residual
For a patient p and ordered drug pair (d1, d2):
combo_auc(p, d1, d2) = ½ (AUC_d1(p) + AUC_d2(p)) ← from Baseline A
+ k_syn × synergy(d1, d2) ← learned on ALMANAC
− k_mech × mechanism_score(p, d1, d2) ← knowledge-based
Synergy term: symmetric-pooled MLP (f(e_d1+e_d2, |e_d1−e_d2|)) trained
on 186 ALMANAC-HL-60 pairs to predict Loewe synergy. 5-fold CV: mean RMSE
12.68 vs null-baseline 14.70; Pearson 0.48, Spearman 0.38.
Mechanism score: For each patient, mutation features map to target-axis deficits (e.g., mut_FLT3=1 → tgt_FLT3-deficit=1). Drug-pair mechanism score = max-aggregated target coverage × patient deficit − pair toxicity stacking penalty (myelosuppression, hepatotoxicity, QT prolongation summed past soft caps). Tie-breaking via annotation-count and axis-diversity bonuses.
2.5 Head-to-head statistical test
For each patient p, define:
Δ(p) = min_d AUC_d(p) − min_{d1,d2} combo_auc(p, d1, d2)
Positive Δ means combination wins. We compute:
- Mean Δ with 95% CI via 2000-replicate bootstrap.
- Sign-flip permutation p-value (2000 perms).
- Subgroup analysis by ELN-risk, FLT3-mutation status, and any-driver engagement.
Two passes: (a) all 165 drugs in scope; (b) 20-drug clinically-relevant AML drug filter (excluding pan-cytotoxic BeatAML-dominant winners like Elesclomol and Panobinostat). Pre-registered decision rule: evidence strength for each pre-registered outcome.
2.6 TCGA-LAML validation
Mechanism-prior scoring applied to TCGA-LAML (direct cross-PCA MLP transfer not justified). Evaluates:
- Per-patient top-3 recommendation set (mitigates argsort tie noise).
- Class-level reproducibility with BeatAML top picks.
- OS correlation: patients with any driver mutation vs driver-negative (Mann-Whitney U test on survival months).
2.7 Reproducibility
All code under src/combo_val/; 59 unit + integration tests; deterministic
random seeds (42). Pre-registered gates live in docs/00_thesis.md;
per-week summary documents record all numerical claims. Public data only
(BeatAML 2.0 paper supplement, DrugComb Zenodo v1.5, cBioPortal
PanCancerAtlas) — no IRB-restricted data.
3. Results
3.1 Baseline A recovers strong single-drug predictability
Baseline A trained on 613 BeatAML patients × 165 drugs achieves mean per-patient Spearman ρ = 0.704 (SD 0.018 across folds), MAE 35.04 on 0–300 AUC scale — comfortably above the pre-registered ρ ≥ 0.40 gate. Per-drug predictability concentrates in biologically coherent targets: Venetoclax ρ=0.58, Sorafenib ρ=0.52, Cabozantinib ρ=0.51, all driven by the FLT3/BCL2/NPM1 mutation features encoded in the 80-dim patient vector.
3.2 DrugComb AML combo data is narrower than advertised
Of 1.43M rows in DrugComb v1.5, 13,877 land on AML cell lines. After splitting monotherapy from combination rows and aligning drug names, only 186 strict combination pairs have both drugs in the BeatAML vocabulary — all from NCI-ALMANAC on HL-60. Other AML cell lines in DrugComb are represented exclusively by monotherapy screens. This defines the upper bound on what a combo residual can learn from this public data.
3.3 Overall combo prediction beats null baseline but loses head-to-head
The factorized combo predictor’s synergy residual beats the predict-the- mean null baseline by ~14% in RMSE (mean fold RMSE 12.68 vs null RMSE ≈ 14.70; Pearson 0.48, Spearman 0.38). However, when the full combo-AUC predictor is pit against the best predicted single drug over 613 BeatAML patients with the clinically-relevant 20-drug filter, overall Δ = −5.14 (95% CI [−6.57, −3.68]), i.e., the average combination recommendation is worse than the best single-drug recommendation by 5 AUC units. Only 30.7% of patients have positive Δ.
3.4 Subgroup analysis reveals the precision-medicine population (Figure 1)
Stratifying by FLT3-mutation status dramatically changes the picture:
| Population | n | Mean Δ | 95% CI | % Δ > 0 |
|---|---|---|---|---|
| FLT3-mutant | 179 | +16.67 | [14.98, 18.19] | 89.9% |
| FLT3-wild-type | 434 | −14.14 | [−15.27, −13.05] | 6.2% |
A similar pattern holds for any-driver engagement:
| Population | n | Mean Δ | 95% CI | % Δ > 0 |
|---|---|---|---|---|
| Any driver mutation present | 308 | +3.33 | [0.98, 5.53] | 55.5% |
| Driver-absent | 305 | −13.69 | [−15.27, −12.35] | 5.6% |
The combination predictor beats best single drug specifically in driver-positive AML patients (permutation p<0.001 across subgroups). In driver-negative patients, the mechanism prior is inactive and the combination recommendation falls back to additive math, which by construction cannot beat the single best drug.
3.5 Top recommendations are biologically coherent
In the clinically-relevant drug universe, the most frequent rank-1 combo recommendations are:
| Pair | Patients | Biology |
|---|---|---|
| Quizartinib + Venetoclax | 143 | FLT3i + BCL2i — canonical precision combo |
| Selumetinib + Trametinib | 80 | Dual MEKi (face-validity question) |
| Dasatinib + Trametinib | 77 | SRC/BCR-ABL + MEK |
| Quizartinib + Trametinib | 74 | FLT3i + MEK (RAS-MAPK parallel pathway) |
| Trametinib + Venetoclax | 59 | MEK + BCL2 |
| Gilteritinib + Trametinib | 55 | 2nd-gen FLT3i + MEK |
| Gilteritinib + Venetoclax | 52 | Matches VENAML trial |
Several recommendations correspond to real clinical programs, providing face validity for the prior.
3.6 TCGA independent-cohort validation reproduces class-level picks (Figure 2)
Applying the mechanism prior to TCGA-LAML (N=173) yields top-3 recommendations that match published AML trial designs and FDA- approved combinations:
- Azacitidine + Ivosidenib (FDA-approved for IDH1-mut newly-diagnosed AML)
- Azacitidine + Gilteritinib (LACEWING trial)
- Enasidenib + Venetoclax (ENAVEN trial, 2022)
- Midostaurin + Venetoclax (mechanism-class match with BeatAML’s Quizartinib
- Venetoclax)
Survival analysis confirms clinical validity of the driver-positive flag:
| TCGA Group | n | Median OS (mo) | % Deceased |
|---|---|---|---|
| Driver-positive (FLT3/IDH1/IDH2/NPM1/KMT2A) | 86 | 9.50 | 67.4% |
| Driver-negative | 75 | 12.03 | 60.0% |
| FLT3-mutant | 47 | 8.05 | 66.0% |
| FLT3-wild-type | 114 | 13.53 | 63.2% |
Mann-Whitney p=0.065 (driver+ vs driver-). The driver-positive group — precisely the population the combination predictor targets — has shorter OS, confirming they are the high-clinical-need patients most likely to benefit from aggressive combination therapy.
4. Discussion
4.1 Principal finding
Mechanism-aware combination-AUC prediction beats best predicted single-drug therapy by a large margin (+17 AUC units on the 0-300 scale, 90% of patients) specifically in FLT3-mutated AML, and is absent in driver- negative AML. This defines the precision-medicine target population for computationally-prioritized combination therapy in AML and supports the pre-registered “partial yes” outcome from the thesis design.
4.2 Why the mechanism prior is essential
The pair-synergy residual alone is too weak to overcome additive math (combination AUC bounded below by the best single-drug AUC unless synergy is strong). Training data scarcity (186 ALMANAC-HL-60 pairs) limits the residual’s accuracy. The mechanism prior circumvents this by injecting knowledge: driver-matched drug pairs receive a 30-AUC-unit bonus that overcomes the additive ceiling for precision populations, leaving driver- negative patients in the “additive math dominates” regime.
4.3 Top combos match real clinical programs
The most frequent recommendations (Quizartinib + Venetoclax, Gilteritinib + Venetoclax, Enasidenib + Venetoclax, Azacitidine + Ivosidenib) align with published Phase 2/3 trials and FDA-approved combinations, supporting face validity. The mechanism prior is not merely reproducing known combos — the mechanism axes were curated from AML biology (FDA labels, consensus regimens) without reference to clinical-trial outcomes, so the agreement is evidence that the underlying biology is correctly encoded rather than circular.
4.4 Limitations
- Validation is in-silico: we compare PREDICTED AUC, not measured AUC. A prospective ex-vivo study pitting the recommended combos against best-single in primary AML samples is required for full clinical validation.
- Single AML cell line for combo training (HL-60 only). Patient- conditional combo responses can’t be learned from this; the predictor extrapolates patient personalization through the mechanism prior and the single-drug baseline.
- Drug mechanism matrix covers only 20 AML-approved drugs of BeatAML’s 165. Combinations involving the other 145 drugs cannot be scored by the mechanism prior and default to the synergy residual alone.
- TCGA validation tests recommendation plausibility, not outcome: TCGA patients received conventional induction chemotherapy (the targeted drugs we recommend were mostly not yet approved), so we cannot directly test “recommendation → better survival.”
4.5 Future work
- Expand drug-mechanism annotation to the remaining 145 BeatAML drugs.
- Add RNA-based cell-state features (BCL2 dependency, apoptosis priming, differentiation block) to the patient deficit vector so Venetoclax gets credit in BCL2-dependent patients without requiring a BCL2 mutation.
- Pursue prospective ex-vivo validation: dispense the top-3 recommended combinations on 20-30 fresh AML patient samples from a clinical biobank, measure response, compare against best-single.
5. Data and code availability
All code, canonical data tables, and run manifests live in the public
repository AML-combo-validation. Raw data: BeatAML 2.0 supplement;
DrugComb v1.5 Zenodo record 15235991; cBioPortal TCGA-LAML PanCancer Atlas
- 59 unit+integration tests cover drug-name alignment, ETL correctness, mechanism scorer, combo predictor, and head-to-head validation.
Figure captions
Figure 1 — Head-to-head Δ distribution, BeatAML clinical-drug filter. (A) Histogram of per-patient Δ = best_single_AUC − best_combo_AUC across n=613 patients. Positive Δ = combination wins. Overall mean Δ = −5.14 [95% CI −6.57, −3.68]. (B) Δ stratified by FLT3 status: FLT3-mut (n=179) mean +16.67 [14.98, 18.19] vs FLT3-wt (n=434) mean −14.14 [−15.27, −13.05].
Figure 2 — TCGA-LAML independent-cohort validation. (A) Top-3 mech- prior combo recommendations across n=173 TCGA patients, annotated with corresponding clinical programs (LACEWING, ENAVEN, AGILE). (B) Overall survival stratified by driver-positive vs driver-negative status (Mann-Whitney p=0.065).