Real-world data (RWD) analysis
One-line summary
Real-world data analysis examines data from non-trial sources such as EHRs, claims, cancer registries, and patient-reported outcomes to estimate treatment effectiveness, sequencing patterns, and comparative outcomes. In settings where RCTs are under-enrolled or ethically difficult (rare drivers, elderly patients, multi-line sequencing, post-marketing safety), it applies causal inference frameworks (target trial emulation, propensity scores, IPTW) to generate real-world evidence (RWE). FDA and PMDA have begun to accept RWE in regulatory submissions, and applications to post-marketing supplements are increasing, for example through the Flatiron-FDA pilot.
Principles
Data sources
| Source | Characteristics |
|---|---|
| EHR (Flatiron, ConcertAI) | structured + NLP-derived clinical data, detailed treatment / outcome information |
| Claims (Medicare, MDV, JMDC) | broad population coverage, rich cost / utilization data, limited clinical detail |
| Registry (SEER, NCDB, CGTNet) | population-based, long-term outcomes, limited driver mutation information |
| Hybrid (AACR GENIE) | NGS panel data linked to clinical outcomes, large multi-center cohort |
| PRO (PRO-CTCAE, EQ-5D) | longitudinal patient-reported symptoms / QOL |
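Causal inference methods
- Target trial emulation: specify a hypothetical RCT (the target trial), then emulate it with RWD; following the Hernán framework avoids biases such as immortal time bias
- Propensity score matching / IPTW: adjust for baseline imbalance in measured covariates between treatment groups
- Difference-in-differences: compare outcome changes before and after a policy or approval change, relative to an unaffected comparison group
- Instrumental variable: use physician preference or regional variation as the instrument (IV)
- G-methods (g-formula, marginal structural models (MSM), g-estimation): adjust for time-varying confounding
- Negative control: use negative control outcomes / exposures to detect and gauge unmeasured confounding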
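The IPTW idea in the list above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy cohort counts, the single binary confounder `poor_ps` (poor performance status), and the function names are hypothetical, and the propensity score is estimated as the observed treated fraction per stratum rather than with a regression model, as a real analysis with many covariates would require.

```python
from collections import defaultdict


def make_toy_cohort():
    """Build a hypothetical cohort of (poor_ps, treated, responded) tuples.

    Counts are invented: fitter patients (poor_ps=0) are treated more often
    and respond more often, so poor_ps confounds the naive comparison.
    """
    cells = [
        # poor_ps, treated, n_patients, n_responders
        (0, 1, 80, 40),
        (0, 0, 20, 8),
        (1, 1, 20, 4),
        (1, 0, 80, 8),
    ]
    cohort = []
    for poor_ps, treated, n_patients, n_responders in cells:
        for i in range(n_patients):
            cohort.append((poor_ps, treated, 1 if i < n_responders else 0))
    return cohort


def naive_risk_difference(cohort):
    """Unadjusted difference in response rate, treated minus untreated."""
    def risk(arm):
        outcomes = [y for _, t, y in cohort if t == arm]
        return sum(outcomes) / len(outcomes)
    return risk(1) - risk(0)


def iptw_risk_difference(cohort):
    """Inverse-probability-of-treatment-weighted risk difference.

    The propensity score is the treated fraction within each confounder
    stratum. This assumes positivity (every stratum contains both treated
    and untreated patients); otherwise a weight would divide by zero.
    """
    n_total = defaultdict(int)
    n_treated = defaultdict(int)
    for x, t, _ in cohort:
        n_total[x] += 1
        n_treated[x] += t
    propensity = {x: n_treated[x] / n_total[x] for x in n_total}

    weighted_sum = {0: 0.0, 1: 0.0}
    weight_total = {0: 0.0, 1: 0.0}
    for x, t, y in cohort:
        p = propensity[x]
        # Treated patients get weight 1/p, untreated patients 1/(1 - p).
        w = 1.0 / p if t == 1 else 1.0 / (1.0 - p)
        weighted_sum[t] += w * y
        weight_total[t] += w
    # Normalized (Hajek-style) weighted mean in each arm.
    return (weighted_sum[1] / weight_total[1]
            - weighted_sum[0] / weight_total[0])


if __name__ == "__main__":
    cohort = make_toy_cohort()
    print("naive risk difference:", round(naive_risk_difference(cohort), 3))
    print("IPTW risk difference: ", round(iptw_risk_difference(cohort), 3))
```

In this toy cohort the within-stratum risk difference is 0.10 in both strata, while the unadjusted comparison gives about 0.28 because treated patients are mostly the fitter ones. Weighting recovers roughly 0.10. The sketch addresses only the measured confounder it is given; unmeasured confounding is untouched.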
Key evidence / application areas
- NSCLC IO post-approval: real-world OS after KEYNOTE-024; comparative effectiveness of first-line chemo + IO sequencing
- Rare drivers (ROS1, RET, NTRK): RWD supplements trials with insufficient sample size and tracks post-approval outcomes
- Elderly NSCLC: RWD compensates for the under-representation of patients aged ≥75 in RCTs
- Treatment sequencing: sequencing patterns and outcomes after resistance to first-line (1L) osimertinib
- Regulatory RWE: used for post-marketing supplements via FDA Project Pragmatica and the PMDA RWD pilot
- Health equity / disparity: association of race / ethnicity / SES with outcomes (trial cohorts are biased in this respect)
Applications and limitations
- Strengths: large samples, generalizability, long-term outcomes, applicability to rare diseases, construction of external cohorts for trials
- Limitations: measured / unmeasured confounding, missing data (performance status (PS) / outcomes), variable data quality (EHR vs claims), incomplete driver / biomarker information; regulatory acceptance is context-dependent (accepted for post-marketing use, limited for first approval)
Open Questions
- AI-assisted EHR phenotyping: accuracy of LLM-based abstraction and its regulatory pathway
- Multi-source linkage: harmonization of claims + EHR + genomic + PRO data
- Synthetic control arm: a regulatory framework for replacing an RCT control arm with RWD
- Causal inference reproducibility: standardization of target trial emulation