Methods & Research Background
Study design, model selection rationale, validation, and limitations.
Derivation Cohort
Çam ve Sakura
n=707 patients · 27 MACE events
External Validation Cohort
Siyami Ersek
n=378 patients · 38 MACE events
Endpoint: Major Adverse Cardiovascular Events (MACE) within 30 days of non-cardiac surgery, including cardiac death, non-fatal MI, and non-fatal stroke.
This is a retrospective, observational, two-center Turkish cohort study. ML models were trained on the derivation cohort and performance was assessed on the completely independent Siyami Ersek external validation cohort.
Ten ML classifiers were trained and externally validated. Two are presented as patient-facing probabilities based on the following hierarchy:
| Model / Score | Role | AUROC | Brier score | Calibration slope | Net benefit @ 10% |
|---|---|---|---|---|---|
| HistGradientBoosting (displayed) | Study ML (default) | 0.694 | 0.087 | 0.813 | 0.027 |
| GradientBoosting (displayed) | Sensitivity ML | 0.707 | 0.088 | 0.991 | 0.015 |
| NaiveBayes | Highest AUROC, not deployed | 0.738 | 0.090 | 2.139 | 0.000 |
| AUB-HAS2 (displayed) | Best clinical benchmark | 0.690 | 0.092 | 0.557 | — |
| RCRI (displayed) | Legacy comparator | 0.583 | 0.094 | 0.879 | — |
HistGradientBoosting (default ML): Best Brier score (0.087) and strongest net benefit at 10% and 15% decision thresholds — the most clinically actionable model for perioperative decisions.
GradientBoosting (sensitivity ML): Higher AUROC (0.707) and near-ideal calibration slope (0.991) — provides a second independent probability estimate for sensitivity analysis.
We do not claim ML superiority over AUB-HAS2. Both approaches are presented together for comparison.
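As a minimal sketch of what the Brier column measures: the Brier score is the mean squared difference between each predicted probability and the 0/1 outcome, so lower is better. The function name and toy data below are illustrative and are not taken from the study pipeline. The reference value assumes all 378 validation patients have a recorded outcome.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data (hypothetical, four patients, one event).
y = [0, 0, 1, 0]
p = [0.1, 0.2, 0.7, 0.1]
toy_brier = brier_score(y, p)

# Reference point: predicting the validation-cohort prevalence (38/378)
# for every patient gives a Brier score of prevalence * (1 - prevalence).
prevalence = 38 / 378
reference_brier = prevalence * (1 - prevalence)

print(round(toy_brier, 4), round(reference_brier, 4))
```

A constant prediction at the cohort prevalence scores roughly 0.090, which helps explain why the Brier values in the table sit in a narrow band around that figure.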
NaiveBayes achieved the highest AUROC in external validation (0.738), which might suggest it is the best model. However, AUROC alone is insufficient to judge clinical utility.
Calibration slope: 2.139
Ideal = 1.0. A slope of 2.1 means the predicted risks are far too compressed: risk is underestimated for high-risk patients and overestimated for low-risk patients, so the predicted probabilities cannot be trusted as absolute risks.
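A calibration slope is conventionally estimated by regressing the observed outcome on the logit of the predicted probability and reading off the coefficient. The sketch below does this with a small Newton-Raphson fit in plain Python. It is a stand-in for whatever software the study used (the text does not say), and the function names and toy data are hypothetical.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of p, clipped away from 0 and 1 to stay finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def calibration_fit(y_true, y_prob, n_iter=50, tol=1e-10):
    """Fit outcome ~ intercept + slope * logit(prediction); return (intercept, slope)."""
    x = [logit(p) for p in y_prob]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu          # score for the intercept
            g1 += (yi - mu) * xi   # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0:
            raise ValueError("singular information matrix; predictions may be constant")
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Toy cohort A: predictions 0.2 and 0.6 match the observed event rates.
y_a = [1] * 2 + [0] * 8 + [1] * 6 + [0] * 4
p_a = [0.2] * 10 + [0.6] * 10
# Toy cohort B: predictions 0.3 and 0.5, but observed rates are 0.1 and 0.7,
# i.e. the predictions are too compressed.
y_b = [1] * 1 + [0] * 9 + [1] * 7 + [0] * 3
p_b = [0.3] * 10 + [0.5] * 10

intercept_a, slope_a = calibration_fit(y_a, p_a)
intercept_b, slope_b = calibration_fit(y_b, p_b)
print(round(slope_a, 3), round(slope_b, 3))
```

In the toy data, cohort A recovers a slope of 1, while cohort B, whose predictions are squeezed toward the middle, yields a slope well above 1.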
Net benefit @ 10–15%: ≈0
Decision curve analysis shows approximately zero net benefit across the clinically relevant 10–15% threshold range. Using this model would provide no clinical benefit over the default strategies of treating all patients or treating none.
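For reference, net benefit at a threshold probability pt is usually computed as TP/n minus FP/n weighted by pt/(1 - pt). The following is a minimal sketch under that standard definition; the function names and toy data are hypothetical and not from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data (hypothetical): 10 patients, 2 events.
y = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
p = [0.30, 0.20, 0.05, 0.02, 0.08, 0.01, 0.03, 0.04, 0.02, 0.01]

nb_model = net_benefit(y, p, 0.10)
nb_all = net_benefit_treat_all(y, 0.10)
# A model that never exceeds the threshold flags no one: same as "treat none".
nb_none = net_benefit(y, [0.0] * len(y), 0.10)
print(round(nb_model, 4), round(nb_all, 4), nb_none)
```

A net benefit of zero is exactly what the "treat none" strategy achieves, which is why a model sitting at zero adds nothing at that threshold.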
O:E ratio: 2.033
An observed-to-expected ratio of 2.0 indicates severe miscalibration: about twice as many events occurred as the model predicted.
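Under the usual convention, the O:E ratio divides the number of observed events by the sum of predicted probabilities, so values above 1 mean more events occurred than the model expected. A minimal sketch, with a hypothetical function name and toy data:

```python
def observed_expected_ratio(y_true, y_prob):
    """Observed event count divided by the sum of predicted probabilities."""
    expected = sum(y_prob)
    if expected <= 0:
        raise ValueError("sum of predicted probabilities must be positive")
    return sum(y_true) / expected


# Toy data (hypothetical): 20 patients each given a 10% predicted risk
# (2 expected events), but 4 events actually occur.
y = [1] * 4 + [0] * 16
p = [0.10] * 20
oe = observed_expected_ratio(y, p)
print(round(oe, 3))
```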
Lesson: A high AUROC only means the model distinguishes high-risk from low-risk patients in rank order. It says nothing about whether the absolute probabilities are trustworthy. For shared decision-making and risk communication, calibration and net benefit matter more.
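The point about rank order can be illustrated with a small sketch. AUROC is the probability that a randomly chosen patient with an event receives a higher score than one without, so any order-preserving distortion of the probabilities leaves it unchanged even though the absolute risks change. The function name and toy data are hypothetical.

```python
def auroc(y_true, y_score):
    """Probability that a random event case outranks a random non-event case."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical).
y = [0, 0, 1, 0, 1]
p = [0.10, 0.40, 0.35, 0.20, 0.80]
# Cubing preserves the ordering but shrinks every probability.
p_distorted = [v ** 3 for v in p]

auc_original = auroc(y, p)
auc_distorted = auroc(y, p_distorted)
mean_original = sum(p) / len(p)
mean_distorted = sum(p_distorted) / len(p_distorted)
print(auc_original == auc_distorted, round(mean_original, 3), round(mean_distorted, 3))
```

Both score vectors have the same AUROC, yet their average predicted risk differs substantially, which is the gap that calibration and net benefit are meant to expose.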
PreOpNet is NOT included in this calculator.
ECG upload and PreOpNet predictions are explicitly excluded from patient-facing risk estimation in this tool.
Our study also investigated a digitized printed-ECG AI model (PreOpNet). Key findings:
- ECG-AI probabilities derived from digitized printed ECGs were not calibrated absolute risk estimates.
- External MACE discrimination was weak.
- Clinical + ECG-AI models showed no incremental value over clinical variables alone.
These findings may be summarized on a separate research information page if needed, but ECG-AI predictions must not be presented as patient-facing MACE probabilities.
AUB-HAS2 Score (0–6)
0–1: Low risk
2–3: Intermediate risk
>3: High risk
RCRI Score (0–6)
0: Low risk
1: Low risk
2: Elevated risk
≥3: High risk
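The two category tables above can be expressed as simple lookups. This is a sketch that assumes integer scores in the 0 to 6 range stated in the headings; the function names are hypothetical.

```python
def aub_has2_category(score):
    """Map an AUB-HAS2 score (0-6) to the risk category listed above."""
    if not isinstance(score, int) or not 0 <= score <= 6:
        raise ValueError("AUB-HAS2 score must be an integer from 0 to 6")
    if score <= 1:
        return "Low risk"
    if score <= 3:
        return "Intermediate risk"
    return "High risk"


def rcri_category(score):
    """Map an RCRI score (0-6) to the risk category listed above."""
    if not isinstance(score, int) or not 0 <= score <= 6:
        raise ValueError("RCRI score must be an integer from 0 to 6")
    if score <= 1:
        return "Low risk"
    if score == 2:
        return "Elevated risk"
    return "High risk"


print(aub_has2_category(3), "/", rcri_category(3))
```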
Probabilities shown in the calculator use local derivation-cohort logistic calibration mappings when available, rather than originally published estimates. References: AUB-HAS2 (PMC7660845), RCRI (PubMed 10477528).
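A logistic calibration mapping of this kind typically turns an integer score into a probability through an intercept and a slope on the log-odds scale. The sketch below shows the general shape only: the function name is hypothetical and the coefficients are placeholders, not the values fitted on the derivation cohort.

```python
import math


def score_to_probability(score, intercept, slope):
    """Logistic mapping from a clinical score to a predicted probability."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


# Placeholder coefficients for illustration only (NOT the study's fitted values).
EXAMPLE_INTERCEPT = -4.0
EXAMPLE_SLOPE = 0.5

probabilities = [
    score_to_probability(s, EXAMPLE_INTERCEPT, EXAMPLE_SLOPE) for s in range(7)
]
for s, prob in enumerate(probabilities):
    print(s, round(prob, 3))
```

With a positive slope the mapping is strictly increasing in the score, so the ordering of patients is unchanged and only the absolute probabilities are recalibrated to the local cohort.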
- Two-center, retrospective observational study: findings may not generalize across all surgical populations or healthcare systems.
- Limited event count in the derivation cohort (27 MACE events in 707 patients), which constrains model stability and generalizability.
- External calibration drift observed: absolute probabilities were systematically higher than actual event rates in the validation cohort.
- Several important variables (troponin, pro-BNP) had high missingness and were excluded as mandatory inputs.
- Research prototype only: this calculator is intended for research and educational purposes and does not replace guideline-based perioperative risk assessment.
- The ML pipeline imputes missing values using training-cohort medians; predictions with many missing variables may be unstable.
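The median-imputation step in the last point can be sketched as follows. This is a plain-Python illustration, not the study's pipeline: the function names, feature names, and values are hypothetical. Medians are learned from the training cohort only and then reused for any new patient, and the count of imputed fields is returned so that heavily imputed predictions can be flagged.

```python
from statistics import median


def fit_medians(training_rows, features):
    """Compute per-feature medians from observed (non-missing) training values."""
    medians = {}
    for name in features:
        observed = [row[name] for row in training_rows if row.get(name) is not None]
        medians[name] = median(observed)  # raises StatisticsError if nothing observed
    return medians


def impute(row, medians):
    """Fill missing features with training medians; also return how many were filled."""
    filled = dict(row)
    n_imputed = 0
    for name, value in medians.items():
        if filled.get(name) is None:
            filled[name] = value
            n_imputed += 1
    return filled, n_imputed


# Toy training cohort (hypothetical values).
train = [
    {"age": 60, "creatinine": 1.0},
    {"age": 70, "creatinine": None},
    {"age": 80, "creatinine": 2.0},
]
medians = fit_medians(train, ["age", "creatinine"])

new_patient = {"age": None, "creatinine": 1.1}
filled_patient, n_imputed = impute(new_patient, medians)
print(filled_patient, n_imputed)
```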