Week 1 Day 5 — Integration Report
Generated by combo_val.integration_check. Exit code 0 = all green.
1. Canonical files exist
✓ BeatAML patient features: data/canonical/beataml_patient_features.csv
✓ BeatAML drug response long: data/canonical/beataml_drug_response_long.csv
✓ BeatAML feature manifest: data/canonical/beataml_feature_manifest.json
✓ DrugComb strict pairs: data/canonical/drugcomb_aml_pairs.csv
✓ DrugComb loose pairs: data/canonical/drugcomb_aml_pairs_any_match.csv
✓ DrugComb monotherapy: data/canonical/drugcomb_aml_monotherapy.csv
✓ DrugComb alignment: data/canonical/drugcomb_drug_alignment.csv
✓ DrugComb manifest: data/canonical/drugcomb_filter_manifest.json
✓ TCGA patient features: data/canonical/tcga_laml_patient_features.csv
✓ TCGA clinical: data/canonical/tcga_laml_clinical.csv
✓ TCGA manifest: data/canonical/tcga_laml_manifest.json
✓ Baseline A predictions: runs/baseline_single_drug_mlp/predictions_all_patients_all_drugs.csv
✓ Baseline A metrics: runs/baseline_single_drug_mlp/cv_metrics.json
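A file-existence check of this kind can be sketched as below. This is a minimal illustration, not the code in combo_val.integration_check: the helper name missing_files is hypothetical, and the demo runs in a temporary directory rather than the real project tree.

```python
from pathlib import Path
import tempfile


def missing_files(root: Path, relative_paths: list[str]) -> list[str]:
    """Return the relative paths that are not regular files under root."""
    return [p for p in relative_paths if not (root / p).is_file()]


# Demo in a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    present = root / "data" / "canonical" / "beataml_patient_features.csv"
    present.parent.mkdir(parents=True)
    present.write_text("patient_id\n")

    expected = [
        "data/canonical/beataml_patient_features.csv",
        "data/canonical/tcga_laml_clinical.csv",  # deliberately not created
    ]
    missing = missing_files(root, expected)

print(missing)  # ['data/canonical/tcga_laml_clinical.csv']
```

An empty result would correspond to every line in this section showing ✓.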
2. Schema compatibility
✓ BeatAML & TCGA feature schemas match exactly (80 cols)
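One way to test for an exact schema match is to compare the CSV header rows. The sketch below assumes "match exactly" means the same column names in the same order; the report does not say whether dtypes are also compared. The column names and the helper names read_header and schema_diff are hypothetical.

```python
import csv
import io


def read_header(csv_text: str) -> list[str]:
    """Return the column names from the first row of CSV text."""
    return next(csv.reader(io.StringIO(csv_text)))


def schema_diff(cols_a: list[str], cols_b: list[str]) -> dict:
    """Describe how two column lists differ in membership and order."""
    set_a, set_b = set(cols_a), set(cols_b)
    return {
        "only_in_a": [c for c in cols_a if c not in set_b],
        "only_in_b": [c for c in cols_b if c not in set_a],
        "same_order": cols_a == cols_b,
    }


# Toy headers standing in for the two 80-column feature files.
beataml_cols = read_header("patient_id,age,mut_FLT3\nP1,54,1\n")
tcga_cols = read_header("patient_id,age,mut_FLT3\nT1,61,0\n")

diff = schema_diff(beataml_cols, tcga_cols)
exact_match = (
    not diff["only_in_a"] and not diff["only_in_b"] and diff["same_order"]
)
print(exact_match)  # True
```

Reporting the two "only_in" lists separately from the order flag makes a failure easier to diagnose than a single boolean.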
3. Drug vocabulary consistency
BeatAML drug vocab: 165 drugs
✓ All 37 drug mentions in strict pairs exist in BeatAML vocab
✓ Baseline A predictions cover all 165 BeatAML drugs
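The vocabulary check amounts to a set difference between the drugs mentioned in the pairs and the BeatAML vocabulary. A minimal sketch follows, assuming drug names have already been normalized to a common spelling; the helper name drugs_missing_from_vocab and the toy data are hypothetical.

```python
def drugs_missing_from_vocab(
    pairs: list[tuple[str, str]], vocab: set[str]
) -> list[str]:
    """Return drugs that appear in any pair but not in the vocabulary."""
    mentioned = {drug for pair in pairs for drug in pair}
    return sorted(mentioned - vocab)


# Toy data: the last pair contains a drug outside the vocabulary.
vocab = {"venetoclax", "cytarabine", "sorafenib"}
pairs = [("venetoclax", "cytarabine"), ("sorafenib", "quizartinib")]

print(drugs_missing_from_vocab(pairs, vocab))  # ['quizartinib']
```

The same function covers the prediction-coverage check if the roles are swapped: pass the vocabulary as single-element pairs and the predicted drug set as the vocabulary.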
4. Patient uniqueness
✓ BeatAML: 613 unique patient IDs
✓ TCGA: 173 unique patient IDs
✓ Baseline predictions: 613 unique patient IDs
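For a patient-level table, uniqueness can be verified by counting how often each ID occurs. The sketch below is illustrative only; the helper name duplicated_ids and the toy IDs are hypothetical. For the long-format predictions file, the analogous check is presumably on the number of distinct IDs rather than on the absence of repeats.

```python
from collections import Counter


def duplicated_ids(ids: list[str]) -> list[str]:
    """Return the IDs that occur more than once, sorted."""
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n > 1)


patient_ids = ["P1", "P2", "P3", "P2"]  # toy IDs with one repeat
print(duplicated_ids(patient_ids))   # ['P2']
print(len(set(patient_ids)))         # 3 distinct IDs
```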
5. Distribution sanity
✓ BeatAML AUC in [0.0, 286.3] (median 207.5); within 0-300 raw scale
✓ BeatAML mutations binary across 25 gene cols
✓ ELN ordinal values [0.0, 0.5, 1.0, 1.5, 2.0] in expected set
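The three sanity checks above reduce to a range test, a binary-value test, and a set-membership test. A minimal sketch, with hypothetical helper names and toy values (the AUC values reuse the minimum, median, and maximum reported above):

```python
from statistics import median


def within_range(values: list[float], lo: float, hi: float) -> bool:
    """True if every value lies in [lo, hi]; a NaN makes this False."""
    return all(lo <= v <= hi for v in values)


def is_binary(values: list[float]) -> bool:
    """True if the column contains only 0 and 1."""
    return set(values) <= {0, 1}


def in_expected_set(values: list[float], expected: set[float]) -> bool:
    """True if every observed value belongs to the expected set."""
    return set(values) <= expected


auc = [0.0, 207.5, 286.3]          # toy stand-in for the AUC column
mutation_col = [0, 1, 1, 0]        # toy stand-in for one gene column
eln = [0.0, 0.5, 1.0, 1.5, 2.0]    # ELN values listed in the report

print(within_range(auc, 0.0, 300.0), median(auc))               # True 207.5
print(is_binary(mutation_col))                                  # True
print(in_expected_set(eln, {0.0, 0.5, 1.0, 1.5, 2.0}))          # True
```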
6. Week-3 readiness: combo predictor
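✓ 186 strict pairs on 19 drugs: enough for a small MLP
✓ Synergy Loewe distribution: mean=-12.95 std=14.70 (negative mean; under the usual convention that positive Loewe scores indicate synergy and negative scores antagonism, this is skewed antagonistic, so confirm the sign convention before treating it as signal for the combo model)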
7. Week-5 readiness: TCGA independent cohort
TCGA OS: 173 patients, deceased rate = 65.9%, median OS = 11.0 months
TCGA mutation coverage: 145/173 patients have ≥1 driver mutation
Verdict
All critical checks PASS — Week 1 data layer is ready for Week 2+ modeling.