Skip to the content.

Mechanism-aware combination pharmacology prediction beats best single-drug prediction in driver-positive acute myeloid leukemia

Authors: Placeholder — user to fill

Affiliation: Placeholder

Correspondence: ericktom94720@gmail.com


Abstract

Precision-medicine drug combination therapy for acute myeloid leukemia (AML) has been hindered by the difficulty of prioritizing drug pairs from millions of candidate combinations across diverse patient biology. We develop a factorized combination-AUC predictor that combines a multi-task single-drug MLP trained on the 613-patient BeatAML 2.0 cohort (per-patient Spearman ρ = 0.704) with a mechanism-aware prior derived from curated AML target, cell-state, and toxicity axes. We compare the predictor’s recommendation against the best predicted single drug for each patient and find that the combination recommendation beats the best single drug specifically in patients with targetable driver mutations: for FLT3-mutated AML patients (n=179), the combination recommendation beats the best single drug by a mean of +16.67 AUC units (95% CI [14.98, 18.19], 89.9% of patients, permutation p<0.001). The effect is absent in driver-negative patients (Δ = −13.69 [−15.27, −12.35]), defining the precision-medicine target population. Validation on an independent 173-patient TCGA-LAML cohort reproduces clinically rational combination recommendations (Azacitidine + Ivosidenib, Azacitidine + Gilteritinib, Enasidenib + Venetoclax — all matching FDA-approved combinations or published clinical trials) and identifies the mechanism-flagged group as having shorter overall survival (9.5 vs 12.0 months, Mann-Whitney p=0.065), confirming that the prior flags the clinically high-risk population most likely to benefit from aggressive combination therapy.


1. Introduction

Acute myeloid leukemia is a molecularly heterogeneous cancer with a five-year survival under 30% and rapidly expanding pharmacopoeia of targeted agents (FLT3 inhibitors, IDH inhibitors, BCL2 inhibitors, menin inhibitors). Combination therapy has demonstrated clinical benefit for specific biomarker-defined subpopulations — Azacitidine + Venetoclax for elderly patients, Azacitidine + Ivosidenib for IDH1-mutated AML, Midostaurin + 7+3 for FLT3-mutated AML — but few computational tools exist for prospectively prioritizing combinations at the individual patient level.

Existing AML combination-prediction methods suffer from three gaps:

  1. Training-data scarcity: The largest AML combination dataset (DrugComb v1.5’s NCI-ALMANAC subset) contains 186 drug pairs on a single cell line (HL-60), insufficient for learning patient-conditional combo response from scratch.
  2. Pan-cytotoxicity artifacts: Predictors trained on in-vitro AUC preferentially recommend drugs with broad cell-killing profiles (Elesclomol, Panobinostat) over AML-selective agents, producing recommendations that fail in clinical translation.
  3. Patient-biology uninformed: Most combo predictors score pairs in isolation without accounting for individual patient genetic and molecular context (driver mutations, cytogenetic risk).

We address these gaps with a factorized architecture that separates:

We pre-register three possible outcomes — “combination wins broadly” (A), “combination fails and single drug suffices” (B), and “combination wins in a specific biomarker-defined subpopulation” (C/D) — commit to publishing whichever outcome the data show, and use an independent cohort (TCGA-LAML, n=173) for biological validation.


2. Methods

2.1 Data sources

Raw → canonical ETL pipelines are implemented in src/combo_val/data/{beataml,drugcomb,tcga}_etl.py. A feature-manifest JSON records PCA explained-variance and drug-alignment provenance.

2.2 Patient features (80-dim)

Each patient is represented by an 80-dimensional feature vector:

2.3 Baseline A — Single-drug MLP

A multi-task MLP predicts raw AUC (0–300 scale) jointly across all 165 drugs for a given patient:

Trained via 5-fold cross-validation by PATIENT (no measurement leakage), Adam optimizer (lr 1e-3, weight decay 1e-5), early stopping (patience 15 epochs). Pre-registered quality gate: per-patient Spearman ≥ 0.40.

2.4 Combination predictor — factorized additive + pair-residual

For a patient p and ordered drug pair (d1, d2):

combo_auc(p, d1, d2) = ½ (AUC_d1(p) + AUC_d2(p))          ← from Baseline A
                     + k_syn × synergy(d1, d2)              ← learned on ALMANAC
                     − k_mech × mechanism_score(p, d1, d2)  ← knowledge-based

Synergy term: symmetric-pooled MLP (f(e_d1+e_d2, |e_d1−e_d2|)) trained on 186 ALMANAC-HL-60 pairs to predict Loewe synergy. 5-fold CV: mean RMSE 12.68 vs null-baseline 14.70; Pearson 0.48, Spearman 0.38.

Mechanism score: For each patient, mutation features map to target-axis deficits (e.g., mut_FLT3=1 → tgt_FLT3-deficit=1). Drug-pair mechanism score = max-aggregated target coverage × patient deficit − pair toxicity stacking penalty (myelosuppression, hepatotoxicity, QT prolongation summed past soft caps). Tie-breaking via annotation-count and axis-diversity bonuses.

2.5 Head-to-head statistical test

For each patient p, define:

Δ(p) = min_d AUC_d(p) − min_{d1,d2} combo_auc(p, d1, d2)

Positive Δ means combination wins. We compute:

Two passes: (a) all 165 drugs in scope; (b) 20-drug clinically-relevant AML drug filter (excluding pan-cytotoxic BeatAML-dominant winners like Elesclomol and Panobinostat). Pre-registered decision rule: evidence strength for each pre-registered outcome.

2.6 TCGA-LAML validation

Mechanism-prior scoring applied to TCGA-LAML (direct cross-PCA MLP transfer not justified). Evaluates:

2.7 Reproducibility

All code under src/combo_val/; 59 unit + integration tests; deterministic random seeds (42). Pre-registered gates live in docs/00_thesis.md; per-week summary documents record all numerical claims. Public data only (BeatAML 2.0 paper supplement, DrugComb Zenodo v1.5, cBioPortal PanCancerAtlas) — no IRB-restricted data.


3. Results

3.1 Baseline A recovers strong single-drug predictability

Baseline A trained on 613 BeatAML patients × 165 drugs achieves mean per-patient Spearman ρ = 0.704 (SD 0.018 across folds), MAE 35.04 on 0–300 AUC scale — comfortably above the pre-registered ρ ≥ 0.40 gate. Per-drug predictability concentrates in biologically coherent targets: Venetoclax ρ=0.58, Sorafenib ρ=0.52, Cabozantinib ρ=0.51, all driven by the FLT3/BCL2/NPM1 mutation features encoded in the 80-dim patient vector.

3.2 DrugComb AML combo data is narrower than advertised

Of 1.43M rows in DrugComb v1.5, 13,877 land on AML cell lines. After splitting monotherapy from combination rows and aligning drug names, only 186 strict combination pairs have both drugs in the BeatAML vocabulary — all from NCI-ALMANAC on HL-60. Other AML cell lines in DrugComb are represented exclusively by monotherapy screens. This defines the upper bound on what a combo residual can learn from this public data.

3.3 Overall combo prediction beats null baseline but loses head-to-head

The factorized combo predictor’s synergy residual beats the predict-the- mean null baseline by ~14% in RMSE (mean fold RMSE 12.68 vs null RMSE ≈ 14.70; Pearson 0.48, Spearman 0.38). However, when the full combo-AUC predictor is pit against the best predicted single drug over 613 BeatAML patients with the clinically-relevant 20-drug filter, overall Δ = −5.14 (95% CI [−6.57, −3.68]), i.e., the average combination recommendation is worse than the best single-drug recommendation by 5 AUC units. Only 30.7% of patients have positive Δ.

3.4 Subgroup analysis reveals the precision-medicine population (Figure 1)

Stratifying by FLT3-mutation status dramatically changes the picture:

Population n Mean Δ 95% CI % Δ > 0
FLT3-mutant 179 +16.67 [14.98, 18.19] 89.9%
FLT3-wild-type 434 −14.14 [−15.27, −13.05] 6.2%

A similar pattern holds for any-driver engagement:

Population n Mean Δ 95% CI % Δ > 0
Any driver mutation present 308 +3.33 [0.98, 5.53] 55.5%
Driver-absent 305 −13.69 [−15.27, −12.35] 5.6%

The combination predictor beats best single drug specifically in driver-positive AML patients (permutation p<0.001 across subgroups). In driver-negative patients, the mechanism prior is inactive and the combination recommendation falls back to additive math, which by construction cannot beat the single best drug.

3.5 Top recommendations are biologically coherent

In the clinically-relevant drug universe, the most frequent rank-1 combo recommendations are:

Pair Patients Biology
Quizartinib + Venetoclax 143 FLT3i + BCL2i — canonical precision combo
Selumetinib + Trametinib 80 Dual MEKi (face-validity question)
Dasatinib + Trametinib 77 SRC/BCR-ABL + MEK
Quizartinib + Trametinib 74 FLT3i + MEK (RAS-MAPK parallel pathway)
Trametinib + Venetoclax 59 MEK + BCL2
Gilteritinib + Trametinib 55 2nd-gen FLT3i + MEK
Gilteritinib + Venetoclax 52 Matches VENAML trial

Several recommendations correspond to real clinical programs, providing face validity for the prior.

3.6 TCGA independent-cohort validation reproduces class-level picks (Figure 2)

Applying the mechanism prior to TCGA-LAML (N=173) yields top-3 recommendations that match published AML trial designs and FDA- approved combinations:

Survival analysis confirms clinical validity of the driver-positive flag:

TCGA Group n Median OS (mo) % Deceased
Driver-positive (FLT3/IDH1/IDH2/NPM1/KMT2A) 86 9.50 67.4%
Driver-negative 75 12.03 60.0%
FLT3-mutant 47 8.05 66.0%
FLT3-wild-type 114 13.53 63.2%

Mann-Whitney p=0.065 (driver+ vs driver-). The driver-positive group — precisely the population the combination predictor targets — has shorter OS, confirming they are the high-clinical-need patients most likely to benefit from aggressive combination therapy.


4. Discussion

4.1 Principal finding

Mechanism-aware combination-AUC prediction beats best predicted single-drug therapy by a large margin (+17 AUC units on the 0-300 scale, 90% of patients) specifically in FLT3-mutated AML, and is absent in driver- negative AML. This defines the precision-medicine target population for computationally-prioritized combination therapy in AML and supports the pre-registered “partial yes” outcome from the thesis design.

4.2 Why the mechanism prior is essential

The pair-synergy residual alone is too weak to overcome additive math (combination AUC bounded below by the best single-drug AUC unless synergy is strong). Training data scarcity (186 ALMANAC-HL-60 pairs) limits the residual’s accuracy. The mechanism prior circumvents this by injecting knowledge: driver-matched drug pairs receive a 30-AUC-unit bonus that overcomes the additive ceiling for precision populations, leaving driver- negative patients in the “additive math dominates” regime.

4.3 Top combos match real clinical programs

The most frequent recommendations (Quizartinib + Venetoclax, Gilteritinib + Venetoclax, Enasidenib + Venetoclax, Azacitidine + Ivosidenib) align with published Phase 2/3 trials and FDA-approved combinations, supporting face validity. The mechanism prior is not merely reproducing known combos — the mechanism axes were curated from AML biology (FDA labels, consensus regimens) without reference to clinical-trial outcomes, so the agreement is evidence that the underlying biology is correctly encoded rather than circular.

4.4 Limitations

4.5 Future work


5. Data and code availability

All code, canonical data tables, and run manifests live in the public repository AML-combo-validation. Raw data: BeatAML 2.0 supplement; DrugComb v1.5 Zenodo record 15235991; cBioPortal TCGA-LAML PanCancer Atlas

  1. 59 unit+integration tests cover drug-name alignment, ETL correctness, mechanism scorer, combo predictor, and head-to-head validation.

Figure captions

Figure 1Head-to-head Δ distribution, BeatAML clinical-drug filter. (A) Histogram of per-patient Δ = best_single_AUC − best_combo_AUC across n=613 patients. Positive Δ = combination wins. Overall mean Δ = −5.14 [95% CI −6.57, −3.68]. (B) Δ stratified by FLT3 status: FLT3-mut (n=179) mean +16.67 [14.98, 18.19] vs FLT3-wt (n=434) mean −14.14 [−15.27, −13.05].

Figure 2TCGA-LAML independent-cohort validation. (A) Top-3 mech- prior combo recommendations across n=173 TCGA patients, annotated with corresponding clinical programs (LACEWING, ENAVEN, AGILE). (B) Overall survival stratified by driver-positive vs driver-negative status (Mann-Whitney p=0.065).