Thesis

What we are testing

For AML patients, does a mechanism-aware combination recommendation beat the predicted best single drug?

Formally, for each patient p in a held-out validation set (lower AUC means a better predicted response, hence the min):

best_single(p) = min over drugs d of:   predicted AUC_single(p, d)
best_combo(p)  = min over pairs (d1,d2) of: predicted AUC_combo(p, d1, d2)
Δ(p) = best_single(p) − best_combo(p)     # >0 means combo is better
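As a minimal sketch of the definitions above, the per-patient Δ can be computed from two tables of predictions. The function name `delta_per_patient`, the nested-dict layout, and the toy numbers are illustrative assumptions, not the project's actual data structures.

```python
def delta_per_patient(single_auc, combo_auc):
    """Return {patient: best_single - best_combo}; positive means the combo is predicted better.

    single_auc: {patient: {drug: predicted AUC}}            (hypothetical layout)
    combo_auc:  {patient: {(drug1, drug2): predicted AUC}}  (hypothetical layout)
    Lower AUC is treated as better, matching the min() in the definitions above.
    """
    deltas = {}
    for patient, by_drug in single_auc.items():
        best_single = min(by_drug.values())
        best_combo = min(combo_auc[patient].values())
        deltas[patient] = best_single - best_combo
    return deltas


# Toy predictions (made-up numbers, for illustration only).
single_auc = {
    "p1": {"drugA": 0.62, "drugB": 0.48},
    "p2": {"drugA": 0.35, "drugB": 0.51},
}
combo_auc = {
    "p1": {("drugA", "drugB"): 0.40},
    "p2": {("drugA", "drugB"): 0.44},
}
deltas = delta_per_patient(single_auc, combo_auc)
```

In this toy case the combination is predicted to help p1 (Δ > 0) and to hurt p2 (Δ < 0).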

Primary test: H0: E[Δ] ≤ 0 versus H1: E[Δ] > 0, assessed with a bootstrap confidence interval on mean(Δ) over held-out patients.
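One way the primary test could be sketched is a percentile bootstrap over patients. The function name `bootstrap_mean_ci`, the number of resamples, the seed, and the toy Δ values are assumptions for illustration. Rejecting H0 when the lower bound of a two-sided 95% interval exceeds 0 is also an assumed decision rule; it is conservative for a one-sided hypothesis.

```python
import random
import statistics


def bootstrap_mean_ci(deltas, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(deltas), resampling patients with replacement."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(deltas)
    boot_means = sorted(
        statistics.fmean(rng.choices(deltas, k=n)) for _ in range(n_boot)
    )
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return statistics.fmean(deltas), boot_means[lo_idx], boot_means[hi_idx]


# Toy per-patient Δ values (made-up numbers, for illustration only).
toy_deltas = [0.08, -0.09, 0.05, 0.12, 0.02, 0.07, -0.01, 0.04]
mean_delta, ci_low, ci_high = bootstrap_mean_ci(toy_deltas)
reject_h0 = ci_low > 0  # assumed decision rule: whole interval above zero
```

Resampling whole patients, rather than individual drug predictions, keeps each patient's Δ intact and treats the patient as the unit of inference.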

Why this is the right question

Our earlier AML-CRAFT work showed:

AML patient embeddings have ~10 intrinsic dimensions, i.e. the biology is low-dimensional.
KMeans forces discrete partitions onto data that mix tight cores with a continuous MDS-spectrum region, so a hybrid (discrete plus continuous) representation is needed.
Single-drug AUC does not vary systematically with biology-axis position on the MDS-spectrum (0 of 144 drugs significant after Benjamini-Hochberg correction). This negative finding directly motivates this project: if single-drug matching can’t personalize, combinations are the only remaining lever.

This thesis tests whether that last claim holds up under a proper mechanism-aware combination predictor.

Four acceptable outcomes

Outcome	`mean(Δ)`	Paper framing
A. Combo clearly wins	≥ 20% AUC reduction	New personalization framework
B. Combo marginally wins	5-20% AUC reduction	Applies to specific subgroups
C. Combo equals single	≈ 0	SOC is optimal; personalization must come from elsewhere
D. Combo loses	< 0	Model design flaw; iterate or publish methodological lesson

All four are publishable with honest framing.
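The outcome table could be turned into an explicit decision rule along the following lines. This is a sketch under two assumptions that the table does not state: the percentages are taken relative to the mean best-single AUC, and "≈ 0" is read as a reduction in [0, 5%). The function name `classify_outcome` is hypothetical.

```python
def classify_outcome(mean_delta, mean_best_single):
    """Map mean(Δ) to outcome A-D using relative AUC reduction.

    Assumptions (not stated in the table): percentages are relative to the
    mean best-single AUC, and "≈ 0" is read as a reduction in [0, 5%).
    """
    reduction = mean_delta / mean_best_single
    if reduction >= 0.20:
        return "A"  # combo clearly wins
    if reduction >= 0.05:
        return "B"  # combo marginally wins
    if reduction >= 0.0:
        return "C"  # combo equals single
    return "D"      # combo loses


# Toy inputs (made-up numbers): mean(Δ) = 0.10 on a mean best-single AUC of 0.40.
outcome = classify_outcome(0.10, 0.40)
```

A stricter variant might classify on the bootstrap interval rather than the point estimate, so that a small negative mean(Δ) whose interval spans zero lands in C rather than D.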

Scope (deliberately narrow)

In scope: training and evaluating the two predictors (single-drug and combination), a head-to-head comparison between them, and TCGA-LAML replication.
Out of scope: organoid validation (future wet-lab work), PK/PD modeling, dose optimization, multi-drug (≥4) combinations.

Success criteria (pre-registered)

We commit to these metrics and decision rules BEFORE running the validation, so that knowledge of the outcome cannot bias the interpretation:

Data quality gate: ≥ 450 BeatAML patients with full feature coverage (currently 613 ✅).
Baseline quality gate: single-drug predictor achieves Spearman ≥ 0.4 against held-out AUC (typical DeepSynergy-level baseline).
Combo predictor sanity gate: the prior-only score correlates with DrugComb Loewe synergy on held-out pairs (Spearman ≥ 0.3).
Primary outcome: bootstrap 95% CI of mean(Δ) on held-out BeatAML patients.
Replication outcome: sign-consistency of Δ in TCGA-LAML.

Failure of any gate triggers model-level iteration, not metric-shopping.
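The three gates above can be checked mechanically before the primary outcome is examined. The sketch below uses only the standard library and implements Spearman correlation as the Pearson correlation of average ranks; in practice `scipy.stats.spearmanr` would be the usual choice. The names `check_gates`, `spearman`, and `average_ranks`, and all toy values except the thresholds and the patient count, are illustrative assumptions.

```python
import statistics


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def check_gates(n_patients, single_pred, single_obs, prior_score, loewe):
    """Return {gate name: passed?} using the thresholds listed above."""
    return {
        "data_quality": n_patients >= 450,
        "baseline_quality": spearman(single_pred, single_obs) >= 0.4,
        "combo_sanity": spearman(prior_score, loewe) >= 0.3,
    }


# Toy inputs (made-up numbers apart from the 613-patient count).
gates = check_gates(
    n_patients=613,
    single_pred=[0.2, 0.4, 0.5, 0.7, 0.9],
    single_obs=[0.25, 0.35, 0.6, 0.65, 0.8],
    prior_score=[1, 2, 3, 4, 5],
    loewe=[5, 4, 3, 2, 1],
)
proceed = all(gates.values())  # any failed gate means iterate on the model first
```

In this toy case the combo sanity gate fails because the prior score is perfectly anti-correlated with synergy, so `proceed` is False and the primary outcome would not yet be evaluated.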