Week 1 Day 5 — Integration Report

Generated by combo_val.integration_check. Exit code 0 = all green.
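The report does not show how combo_val.integration_check turns individual checks into its exit code. Below is a minimal sketch that assumes each check produces a pass/fail result. CheckResult, report_lines and exit_code are hypothetical names, not the module's actual API:

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CheckResult:
    # Hypothetical container for one check's outcome.
    name: str
    passed: bool
    detail: str = ""


def report_lines(results: Sequence[CheckResult]) -> List[str]:
    """Render one ✓/✗ line per check, in the style of this report."""
    return [f"{'✓' if r.passed else '✗'} {r.name}: {r.detail}" for r in results]


def exit_code(results: Sequence[CheckResult]) -> int:
    """Return 0 when every check passed ("all green"), 1 otherwise."""
    return 0 if all(r.passed for r in results) else 1


# In a CLI entry point one would typically finish with: sys.exit(exit_code(results))
```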

1. Canonical files exist

✓ BeatAML patient features: data/canonical/beataml_patient_features.csv
✓ BeatAML drug response long: data/canonical/beataml_drug_response_long.csv
✓ BeatAML feature manifest: data/canonical/beataml_feature_manifest.json
✓ DrugComb strict pairs: data/canonical/drugcomb_aml_pairs.csv
✓ DrugComb loose pairs: data/canonical/drugcomb_aml_pairs_any_match.csv
✓ DrugComb monotherapy: data/canonical/drugcomb_aml_monotherapy.csv
✓ DrugComb alignment: data/canonical/drugcomb_drug_alignment.csv
✓ DrugComb manifest: data/canonical/drugcomb_filter_manifest.json
✓ TCGA patient features: data/canonical/tcga_laml_patient_features.csv
✓ TCGA clinical: data/canonical/tcga_laml_clinical.csv
✓ TCGA manifest: data/canonical/tcga_laml_manifest.json
✓ Baseline A predictions: runs/baseline_single_drug_mlp/predictions_all_patients_all_drugs.csv
✓ Baseline A metrics: runs/baseline_single_drug_mlp/cv_metrics.json
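A file-existence check like the one above can be sketched with pathlib. This assumes "exists" means "is a regular file". missing_files and the EXPECTED mapping are illustrative names only, and EXPECTED lists just a subset of the artifacts above:

```python
from pathlib import Path
from typing import Dict, List


def missing_files(expected: Dict[str, str]) -> List[str]:
    """Return the labels of expected artifacts that are not regular files on disk."""
    return [label for label, path in expected.items() if not Path(path).is_file()]


# Hypothetical subset of the artifacts listed above, keyed by their report label.
EXPECTED = {
    "BeatAML patient features": "data/canonical/beataml_patient_features.csv",
    "TCGA manifest": "data/canonical/tcga_laml_manifest.json",
    "Baseline A metrics": "runs/baseline_single_drug_mlp/cv_metrics.json",
}
```

An empty return value corresponds to a fully green section 1.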

2. Schema compatibility

✓ BeatAML & TCGA feature schemas match exactly (80 cols)
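One way to express "schemas match exactly" is to compare the two column lists, reporting names present on only one side and whether the order is identical. Treating "exactly" as "same names in the same order" is an assumption about the real check. schema_diff is a hypothetical helper; with pandas it would be called on list(df.columns):

```python
from typing import List, Tuple


def schema_diff(left: List[str], right: List[str]) -> Tuple[List[str], List[str], bool]:
    """Compare two column lists.

    Returns (only_in_left, only_in_right, same_order). An exact match is
    ([], [], True): same names in the same order.
    """
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    return only_left, only_right, left == right
```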

3. Drug vocabulary consistency

BeatAML drug vocab: 165 drugs
✓ All 37 drug mentions in strict pairs exist in BeatAML vocab
✓ Baseline A predictions cover all 165 BeatAML drugs
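Both vocabulary checks reduce to set containment: strict-pair drugs must be a subset of the BeatAML vocabulary, and the vocabulary must be a subset of the drugs in the Baseline A predictions. The sketch below assumes names are matched after trimming whitespace and lower-casing. The real check may instead go through drugcomb_drug_alignment.csv, and unknown_drugs is a hypothetical helper:

```python
from typing import Iterable, Set


def _norm(name: str) -> str:
    # Assumed normalization: trim whitespace and compare case-insensitively.
    return name.strip().lower()


def unknown_drugs(mentions: Iterable[str], vocab: Iterable[str]) -> Set[str]:
    """Return normalized drug names in `mentions` that are absent from `vocab`."""
    vocab_set = {_norm(d) for d in vocab}
    return {_norm(m) for m in mentions} - vocab_set
```

The coverage check is the same call with the arguments swapped: unknown_drugs(beataml_vocab, predicted_drugs) should be empty.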

4. Patient uniqueness

✓ BeatAML: 613 unique patient IDs
✓ TCGA: 173 unique patient IDs
✓ Baseline predictions: 613 unique patient IDs
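"Unique" means something different for per-patient tables than for long-format tables. In a per-patient table each ID should appear once. The Baseline A predictions file repeats each patient once per drug by design, so there the meaningful check is that its set of IDs equals the BeatAML set. A sketch of both, with hypothetical helper names:

```python
from collections import Counter
from typing import Iterable, List


def duplicate_ids(ids: Iterable[str]) -> List[str]:
    """Return IDs occurring more than once (sorted); empty means one row per patient."""
    counts = Counter(ids)
    return sorted(pid for pid, n in counts.items() if n > 1)


def same_patient_set(a: Iterable[str], b: Iterable[str]) -> bool:
    """True when two tables cover exactly the same patients, ignoring repeats."""
    return set(a) == set(b)
```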

5. Distribution sanity

✓ BeatAML AUC in [0.0, 286.3] (median 207.5); within 0-300 raw scale
✓ BeatAML mutations binary across 25 gene cols
✓ ELN ordinal values [0.0, 0.5, 1.0, 1.5, 2.0] in expected set
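The three distribution checks can be sketched as simple predicates. This assumes the AUC bound is the 0-300 raw scale quoted above and that the expected ELN set is exactly the five levels observed. All function names are hypothetical. Casting to float() before reporting is one way to avoid NumPy reprs such as np.float64(0.5) leaking into the output:

```python
from typing import Iterable, List, Sequence


def auc_in_range(values: Iterable[float], lo: float = 0.0, hi: float = 300.0) -> bool:
    """True when every AUC lies within the raw [lo, hi] scale."""
    return all(lo <= v <= hi for v in values)


def is_binary(values: Iterable[float]) -> bool:
    """True when a mutation column contains only 0/1 (0.0 and 1.0 also count)."""
    return set(values) <= {0, 1}


def eln_levels(
    values: Iterable[float],
    allowed: Sequence[float] = (0.0, 0.5, 1.0, 1.5, 2.0),  # assumed expected set
) -> List[float]:
    """Return the sorted distinct ELN levels as plain floats.

    Raises ValueError if any level falls outside `allowed`.
    """
    levels = sorted({float(v) for v in values})
    unexpected = set(levels) - set(allowed)
    if unexpected:
        raise ValueError(f"unexpected ELN values: {sorted(unexpected)}")
    return levels
```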

6. Week-3 readiness: combo predictor

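✓ 186 strict pairs on 19 drugs: enough for a small MLP
✓ Loewe synergy distribution: mean=-12.95 std=14.70 (negative mean; under DrugComb's convention, where positive Loewe scores indicate synergy, this skews toward antagonism rather than synergy, though the spread still gives the combo model a usable target)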
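Reporting the share of positive scores alongside mean and std makes the skew direction explicit. The sketch below assumes the DrugComb/SynergyFinder sign convention (positive = synergy, negative = antagonism) and a sample standard deviation (ddof=1, as pandas Series.std() uses by default). Whether the real check uses ddof=1 is not stated. summarize_loewe is a hypothetical helper:

```python
import statistics
from typing import Dict, Sequence


def summarize_loewe(scores: Sequence[float]) -> Dict[str, float]:
    """Summarize Loewe synergy scores: mean, sample std, and share of positive scores.

    With positive = synergy, frac_positive < 0.5 means most pairs lean antagonistic.
    """
    return {
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),  # sample std (ddof=1)
        "frac_positive": sum(s > 0 for s in scores) / len(scores),
    }
```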

7. Week-5 readiness: TCGA independent cohort

TCGA OS: 173 patients, deceased rate = 65.9%, median OS = 11.0 months
TCGA mutation coverage: 145/173 patients have ≥1 driver mutation
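Both cohort figures are simple counts over the canonical tables. As a consistency check, 65.9% of 173 patients corresponds to 114 deceased (114/173 ≈ 0.659), and 145/173 ≈ 83.8% carry at least one driver mutation. The sketch below assumes binary gene columns and a TCGA-style vital-status field coded "Dead"/"Alive". Both the coding and the helper names are assumptions, not taken from the canonical files:

```python
from typing import Dict, Sequence


def driver_coverage(mutations: Sequence[Dict[str, int]]) -> int:
    """Count patients with at least one binary driver-gene column set to 1."""
    return sum(1 for row in mutations if any(v == 1 for v in row.values()))


def deceased_rate(vital_status: Sequence[str]) -> float:
    """Fraction of patients whose vital status is 'Dead' (assumed TCGA coding)."""
    return sum(s == "Dead" for s in vital_status) / len(vital_status)
```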

Verdict

All critical checks PASS — Week 1 data layer is ready for Week 2+ modeling.