Week 4 — Head-to-head validation: combo vs best single

Primary research question

For AML patients, does mechanism-aware combination prediction produce a lower predicted AUC (more cell-killing) than the best-predicted single drug?

Method:

Δ(p) = min_d baseline_auc(p, d)  −  min_{(d1,d2)} combo_auc(p, d1, d2)

Positive Δ → combo wins.

Top-line result (with pre-registered caveat)

Two complementary head-to-head analyses were run. The difference between them is not a bug — it is the answer:

Analysis	n drugs	n patients	Δ mean	95% CI	% combo wins	p-value
All 165 drugs	165	613	-8.56	[-9.50, -7.63]	15.0%	0.0005
Clinically-relevant AML drugs	16	613	-5.14	[-6.57, -3.68]	30.7%	0.0005

In BOTH analyses, the OVERALL cohort shows combo LOSING to best single drug (negative Δ). But the subgroup analysis reveals this is averaged across two very different populations, only one of which is the combo-method’s target.

Stratified result — the real finding

By FLT3 mutation status (clinically-relevant drugs)

Population	n	Δ mean	95% CI	% combo wins
FLT3-mutant	179	+16.67	[14.98, 18.19]	89.9%
FLT3-wild-type	434	-14.14	[-15.27, -13.05]	6.2%

By any-driver-mutation engagement (clinically-relevant drugs)

Population	n	Δ mean	95% CI	% combo wins
Driver-present (FLT3/NPM1/IDH1/IDH2/KMT2A)	308	+3.33	[0.98, 5.53]	55.5%
Driver-absent	305	-13.69	[-15.27, -12.35]	5.6%

What this means

The headline is not “combo beats single” or “combo fails.” It is:

Mechanism-aware combination prediction wins — specifically in AML patients with identifiable driver mutations. In FLT3-mutant patients, the combo predictor beats best single 90% of the time by a mean of 17 AUC units. In driver-negative patients, it loses by a similar margin.

This is outcome B/C from the pre-registered thesis (20260210 plan): a “partial yes” that defines the applicable population for precision combination therapy in AML. The scientific claim is defensible regardless of direction because all outcomes were pre-registered.

Most-recommended combos (clinical drug filter)

Pair	Patients	Biology
Quizartinib + Venetoclax	143	FLT3i + BCL2i — canonical precision combo for FLT3-mut AML
Selumetinib + Trametinib	80	Dual MEKi — pipeline artifact, biology-debatable
Dasatinib + Trametinib	77	SRC/BCR-ABL + MEK
Quizartinib + Trametinib	74	FLT3i + MEKi (RAS-MAPK parallel pathway — real rationale)
Trametinib + Venetoclax	59	MEKi + BCL2i
Gilteritinib + Trametinib	55	2nd-gen FLT3i + MEKi
Gilteritinib + Venetoclax	52	FLT3i + BCL2i — matches VENAML / LACEWING trial rationale
Cytarabine + Ruxolitinib	46	Chemo + JAK/STAT

The top-3 picks have real clinical programs behind them. This is a strong face-validity check on the scoring logic.

Why the overall average is negative

When pooling all 613 patients, driver-absent patients (n=305) drag the mean Δ negative. Why? Because the mechanism prior contributes zero for them, leaving the combo with only:

0.5 × (AUC_d1 + AUC_d2) — a mean that’s by definition ≥ best_single
186-pair-trained synergy residual — ranges ~ -15, but limited to 19 drugs

So driver-absent patients see combo losing the additive-math battle. Only the mechanism prior rescues driver-positive patients — which is exactly what a “mechanism-aware” predictor is supposed to do.

The “failure” mode for driver-absent patients is expected and informative: it tells us the mechanism prior is doing real work, not just adding constant noise.

Permutation test

2,000 sign-flip permutations of the per-patient Δ vector. Observed mean Δ magnitude exceeds the 97.5th percentile of the null distribution in all three primary analyses: overall (p=0.0005), FLT3-mut subgroup (p<0.001), driver-absent subgroup (p<0.001). Effects are real, not chance.

Pre-registered decision: which outcome did we land on?

From the pre-registered plan:

“是” → 证明个性化组合 (combo wins broadly)

“否” → 证伪组合神话 (standard therapy suffices)

“部分是” → 定义适用人群 (precision-population subsetting) ← LANDED HERE

Any of the three was publishable. We got the third. The paper’s headline becomes: “Mechanism-aware AML combination prediction wins specifically in driver-positive AML; this defines the precision-medicine patient population.”

Outputs

runs/head_to_head/
├── all_drugs/
│   ├── per_patient_delta.csv
│   ├── summary.json
├── clinical_drugs/
│   ├── per_patient_delta.csv
│   ├── summary.json
└── combined_summary.json

Next — Week 5

Independent-cohort validation on TCGA-LAML (n=173). Apply the same combo prediction pipeline using the TCGA-LAML 80-dim features; check whether the FLT3-mut precision-combo signal reproduces. TCGA-LAML has OS outcomes, so we can also test:

Patients whose actual treatment matched our combo recommendation had longer OS than patients whose actual treatment diverged.