Week 1 Day 2 — Baseline A: single-drug MLP

Summary

Baseline A trained successfully and clears the pre-registered quality gate (per-patient Spearman ≥ 0.40) by a large margin, with a mean ρ of 0.704 across folds. This is the strong baseline the combo predictor (Week 3) will have to beat.

Architecture

Multi-task MLP, ~50K parameters:

Patient features (80) → [Patient MLP 128 → 64] → patient_emb ──┐
Drug ID (0..164)      → [Drug Embedding 64]    → drug_emb ─────┤
                                                               │
                                                          concat (128)
                                                               │
                                                    [Response Head 128→64→1]
                                                               │
                                                         predicted AUC
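
As a rough cross-check on the size estimate, the sketch below counts weights and biases for the layers exactly as drawn. It assumes plain fully connected layers with biases and a 165-row embedding table (drug IDs 0..164); the helper name `linear_params` is made up for illustration. Under those assumptions the count comes to about 37.5K, so the ~50K figure presumably includes components the diagram omits (normalization layers, for example). That last point is a guess, not something these notes state.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

# Layer sizes read off the diagram; a bias on every layer is an assumption.
patient_mlp = linear_params(80, 128) + linear_params(128, 64)
drug_embedding = 165 * 64  # one 64-d vector per drug ID 0..164
response_head = linear_params(128, 64) + linear_params(64, 1)

total = patient_mlp + drug_embedding + response_head
print(patient_mlp, drug_embedding, response_head, total)
```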

5-fold CV by patient (no leakage)

Fold	best epoch	MAE	per-patient ρ	per-drug median ρ	wall time
1	23	34.15	0.698	0.278	137 s
2	29	35.75	0.709	0.226	149 s
3	20	36.39	0.681	0.286	110 s
4	29	35.69	0.700	0.369	138 s
5	28	33.21	0.730	0.354	133 s
mean	26	35.04	0.704	0.303	134 s
std	4	1.27	0.018	0.061
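
Splitting "by patient" means every measurement from a given patient lands in exactly one fold, so no patient contributes to both training and validation. A minimal stdlib sketch of that idea follows; the function name, the seed, and the toy patient IDs are illustrative assumptions, not the project's actual splitter.

```python
import random
from collections import defaultdict

def patient_kfold(patient_ids, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs with each patient confined to one fold."""
    unique = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique)
    fold_of = {pid: i % k for i, pid in enumerate(unique)}  # round-robin assignment
    rows_by_fold = defaultdict(list)
    for row, pid in enumerate(patient_ids):
        rows_by_fold[fold_of[pid]].append(row)
    for fold in range(k):
        val = rows_by_fold[fold]
        train = [r for f in range(k) if f != fold for r in rows_by_fold[f]]
        yield train, val

# Toy data: 10 patients with 3 drug measurements each (one row per measurement).
rows = [f"P{p:02d}" for p in range(10) for _ in range(3)]
splits = list(patient_kfold(rows, k=5))
for train, val in splits:
    # No patient appears on both sides of a split.
    assert not {rows[i] for i in train} & {rows[i] for i in val}
```

Assigning folds at the patient level, then expanding to rows, is what rules out leakage; shuffling rows directly would scatter one patient's drugs across folds.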

Pre-registered gates

Gate	Threshold	Actual	Verdict
Per-patient Spearman	≥ 0.40	0.704	✅ PASS (+76%)
MAE (AUC 0-300 scale)	< 50	35.04	✅ PASS
Std of per-patient ρ across folds	< 0.05	0.018	✅ PASS (extremely stable)

Per-drug predictability (most-predictable drugs)

Drugs with per-drug Spearman > 0.40:

Drug	n_patients	Spearman	Biology
Venetoclax	382	0.580	BCL2i, AML SOC anchor
Sunitinib	513	0.537	Multi-kinase inhibitor
Sorafenib	518	0.520	FLT3i / RAF
Cabozantinib	469	0.512	VEGFR / FLT3
KW-2449	468	0.506	FLT3 / Aurora B
Tivozanib	467	0.500	VEGFR
Dasatinib	518	0.500	BCR-ABL / SRC
Dovitinib	474	0.494	RTK
Foretinib	472	0.487	MET / VEGFR
Selumetinib	475	0.482	MEK

Biological pattern: the most-predictable drugs are kinase inhibitors (FLT3, multi-RTK) and the BCL2 inhibitor venetoclax, i.e. the drugs whose response is tightly coupled to the FLT3/NPM1/BCL2-status features we encoded. That is a useful sanity check: it is what we would expect if the model is learning real biology rather than memorizing noise.

Per-patient predictability distribution

485 patients have ≥ 5 drug measurements in hold-out across the 5 folds
304/485 (63%) have Spearman > 0.7 (excellent)
16/485 (3.3%) have Spearman < 0.3 (hard-to-predict outliers — likely those with sparse or atypical drug panels)
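
Per-patient ρ here is presumably the Spearman correlation between predicted and measured AUC across one patient's held-out drugs, computed only for patients with at least five measurements. The sketch below shows one way to compute it from rows shaped like those in cv_held_out_predictions.csv. The tuple layout `(patient_id, pred, true)` and all function names are assumptions, and the Spearman routine is the textbook "Pearson on average ranks" form rather than the project's code.

```python
from collections import defaultdict
from math import sqrt

def average_ranks(values):
    """1-based ranks, with ties sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation of the rank vectors; None if either side is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else None

def per_patient_spearman(rows, min_drugs=5):
    """rows: iterable of (patient_id, predicted_auc, true_auc)."""
    by_patient = defaultdict(list)
    for pid, pred, true in rows:
        by_patient[pid].append((pred, true))
    result = {}
    for pid, pairs in by_patient.items():
        if len(pairs) >= min_drugs:  # skip patients with too few held-out drugs
            preds, trues = zip(*pairs)
            result[pid] = spearman(preds, trues)
    return result

# Toy rows: P1 is ranked perfectly, P2 has too few drugs and is skipped.
toy = [("P1", p, t) for p, t in [(10, 12), (50, 40), (90, 95), (120, 130), (200, 180)]]
toy += [("P2", 30, 60), ("P2", 70, 20)]
rho = per_patient_spearman(toy)
```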

Outputs

runs/baseline_single_drug_mlp/
├── final_model.pt                                  # checkpoint + scaler + drug vocab
├── cv_metrics.json                                 # fold-level + overall
├── cv_held_out_predictions.csv                     # held-out pred + true, all folds
├── per_patient_spearman.csv                        # per-patient ρ
├── per_drug_spearman.csv                           # per-drug ρ
└── predictions_all_patients_all_drugs.csv          # 613 × 165, used in Week 4

Implications for Week 4

The head-to-head comparison becomes:

for each patient p in held-out:
    best_single(p) = min over drugs of predictions_all_patients_all_drugs[p, :]
    best_combo(p)  = min over legal pairs of combo_predictor(p, d1, d2)
    Δ(p) = best_single(p) - best_combo(p)

Taking lower AUC as a stronger response (hence the min), Δ(p) > 0 means the best predicted combo beats the best predicted single drug for patient p. A baseline with ρ = 0.70 makes best_single a genuinely strong opponent: beating it requires the combo predictor to carry information beyond single-drug matching, not just relearn the same patterns. That raises the scientific value of whatever the head-to-head shows.
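
To make the comparison concrete, here is a minimal sketch of Δ(p) for a single patient. It assumes lower predicted AUC means a stronger response (as the min in the pseudocode implies) and that both predictors output AUC on the same scale. The names `single_pred`, `combo_pred`, and `legal_pairs` are hypothetical stand-ins for one row of the Week 1 prediction matrix, the Week 3 combo predictor, and the list of allowed pairs; the numbers are toy values.

```python
from typing import Callable, Dict, Iterable, Tuple

def head_to_head_delta(
    single_pred: Dict[str, float],
    combo_pred: Callable[[str, str], float],
    legal_pairs: Iterable[Tuple[str, str]],
) -> float:
    """Delta(p) = best_single(p) - best_combo(p); positive favours the combo."""
    best_single = min(single_pred.values())
    best_combo = min(combo_pred(d1, d2) for d1, d2 in legal_pairs)
    return best_single - best_combo

# Toy numbers for one patient, purely illustrative.
single = {"Venetoclax": 90.0, "Sorafenib": 120.0, "Dasatinib": 150.0}
toy_combo_table = {
    ("Venetoclax", "Sorafenib"): 75.0,
    ("Venetoclax", "Dasatinib"): 95.0,
}
delta = head_to_head_delta(single, lambda a, b: toy_combo_table[(a, b)], toy_combo_table)
print(delta)  # 15.0
```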

Next (Day 3) — DrugComb AML subset

User started the download. Once summary_v_1_5.csv is in place:

python -m combo_val.data.drugcomb_etl \
  --input data/raw/drugcomb/summary_v_1_5.csv \
  --out data/canonical/drugcomb_aml_pairs.csv

I’ll write the ETL to filter DrugComb down to AML cell lines and map its drug names onto the BeatAML drug vocabulary. The mapping is needed so that the DrugComb-trained residual can be applied in the BeatAML patients’ drug space.
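
The name-mapping step is mostly string normalization plus a fallback alias table. The sketch below is one possible shape for it. The normalization rule, the `ALIASES` entries, and the function names are assumptions for illustration; they do not reflect the actual drugcomb_etl module or the real contents of either vocabulary.

```python
import re
from typing import Dict, Iterable, Optional

def normalize(name: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

# Hypothetical alias table: normalized synonym -> normalized BeatAML-style name.
ALIASES = {"abt199": "venetoclax", "azd6244": "selumetinib"}

def build_mapper(beataml_vocab: Iterable[str]):
    """Return a function mapping a DrugComb name to a BeatAML name, or None."""
    canonical: Dict[str, str] = {normalize(d): d for d in beataml_vocab}

    def to_beataml(drugcomb_name: str) -> Optional[str]:
        key = normalize(drugcomb_name)
        key = ALIASES.get(key, key)
        return canonical.get(key)  # None means no match: drop or review by hand

    return to_beataml

to_beataml = build_mapper(["Venetoclax", "Selumetinib", "KW-2449"])
print(to_beataml("ABT-199"))   # Venetoclax
print(to_beataml("kw 2449"))   # KW-2449
print(to_beataml("Imatinib"))  # None
```

Returning None for unmatched names keeps the failures visible, so unmapped DrugComb drugs can be counted and reviewed instead of silently disappearing from the pair table.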