Exploring Conformal Forecasting of Financial Time Series
In markets where distributional assumptions routinely fail, conformal forecasting offers a principled way to quantify uncertainty and control risk. MAPIE (Model-Agnostic Prediction Interval Estimator), an open-source Python library in the scikit-learn-contrib ecosystem, brings these guarantees to everyday ML workflows. Its value for finance is simple and powerful: instead of point predictions that hide uncertainty, MAPIE produces prediction intervals (regression) or prediction sets (classification) with finite-sample coverage guarantees—no strong distributional assumptions required.
Why conformal prediction matters in finance
Conformal prediction (CP) constructs sets or intervals that contain the true outcome with a user-chosen probability (for example, 90%). Crucially, CP is “distribution-free”: it assumes only exchangeability between training and test data, a weaker condition than i.i.d. That’s a compelling fit for financial time series, which are heavy-tailed, heteroscedastic, and often misspecified by classical models. While traditional prediction intervals rely on assumptions like normal residuals, CP offers finite-sample guarantees that hold even when those assumptions break.
In trading terms, CP transforms a single Buy/Sell label into a prediction set—{Buy}, {Sell}, or {Buy, Sell}. The smaller the set, the more informative the forecast. A single-class set ({Buy}) signals confidence; a multi-class set ({Buy, Sell}) flags ambiguity. This explicit uncertainty is actionable: take larger positions on singletons, downsize or abstain on ambiguous calls, and calibrate overall risk to market conditions.
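The set-valued logic above can be sketched with a minimal split-conformal construction using an LAC-style score on toy two-class probabilities. Everything here (the Beta-distributed probabilities, the labels, the helper name `prediction_set`) is illustrative, not MAPIE's API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy calibration data: model probabilities for two classes (Sell=0, Buy=1)
# and true labels. In practice these come from a held-out calibration window.
n_cal = 500
p_buy = rng.beta(2, 2, n_cal)                    # model's P(Buy) per sample
probs_cal = np.column_stack([1 - p_buy, p_buy])  # columns: [P(Sell), P(Buy)]
y_cal = (rng.random(n_cal) < p_buy).astype(int)  # labels consistent with probs

# LAC-style conformity score: 1 minus the probability of the true class
scores = 1 - probs_cal[np.arange(n_cal), y_cal]

# Conformal quantile for a 90% coverage target
alpha = 0.1
level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
qhat = np.quantile(scores, level, method="higher")

def prediction_set(probs_new):
    """Every class whose conformity score falls below the threshold."""
    return {name for k, name in enumerate(["Sell", "Buy"])
            if 1 - probs_new[k] <= qhat}

print(prediction_set(np.array([0.05, 0.95])))  # confident: typically {'Buy'}
print(prediction_set(np.array([0.55, 0.45])))  # ambiguous: typically both classes
```

Singletons here are the high-confidence trades; the two-class set is the explicit "I don't know" signal the text describes.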
Marginal vs conditional coverage—and why Mondrian helps
CP guarantees marginal coverage on average over all data. But financial labels are typically imbalanced (e.g., fewer Buy opportunities). You might hit 90% marginal coverage while systematically under-covering the rare, high-impact class. Mondrian Conformal Prediction addresses this by computing class-conditional quantiles, delivering targeted coverage per class. For trading, that means fairer confidence across Buy and Sell—reducing the chance that rare but critical signals are under-protected.
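The class-conditional idea can be sketched in a few lines of numpy on a deliberately imbalanced toy calibration set (the class mix and score distributions below are illustrative assumptions, not MAPIE internals):

```python
import numpy as np

rng = np.random.default_rng(1)

# Imbalanced toy calibration set: ~10% Buy (class 1), ~90% Sell (class 0),
# with the model deliberately less confident on the rare Buy class.
n_cal = 1000
y_cal = (rng.random(n_cal) < 0.1).astype(int)
p_true = np.where(y_cal == 1,
                  rng.uniform(0.4, 0.9, n_cal),   # P(true class) on Buy rows
                  rng.uniform(0.7, 1.0, n_cal))   # P(true class) on Sell rows
scores = 1 - p_true                               # LAC-style conformity score

alpha = 0.1

def conformal_quantile(s, a):
    n = len(s)
    return np.quantile(s, min(np.ceil((n + 1) * (1 - a)) / n, 1.0),
                       method="higher")

q_marg = conformal_quantile(scores, alpha)                 # one global threshold
q_class = {k: conformal_quantile(scores[y_cal == k], alpha) for k in (0, 1)}

cov_buy = np.mean(scores[y_cal == 1] <= q_marg)  # Buy coverage, marginal CP
print(f"marginal qhat={q_marg:.3f}, "
      f"per-class qhat: Sell={q_class[0]:.3f}, Buy={q_class[1]:.3f}")
print(f"Buy coverage with the marginal threshold: {cov_buy:.0%} (target 90%)")
```

The marginal threshold is dominated by the abundant Sell class, so the rare Buy class falls well short of 90% coverage; the per-class (Mondrian) threshold for Buy is larger and restores it.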
From conformity scores to operational robustness
Conformity scores quantify how “atypical” a sample is relative to calibration data. MAPIE implements a range of scores for both regression and classification; on the classification side these include LAC, APS, and RAPS, each with different behavioral trade-offs:
- LAC is simple but can produce empty sets under high uncertainty—unhelpful when you must make a decision.
- APS and RAPS avoid empty sets, ensuring the system always outputs a plausible set, enabling fallback behaviors like “hold” or “reassess.”
High conformity scores can also highlight market anomalies—regime shifts, liquidity shocks, or news events—serving as an early warning signal beyond raw model accuracy.
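The operational difference between LAC and an APS-style construction shows up clearly on a maximally ambiguous forecast. The thresholds below are illustrative stand-ins for calibrated values:

```python
import numpy as np

# Illustrative thresholds, standing in for values produced by calibration.
qhat_lac = 0.20   # LAC: include class k iff 1 - p_k <= qhat
qhat_aps = 0.80   # APS: include top classes until cumulative prob >= qhat

probs = np.array([0.5, 0.5])  # maximally ambiguous Sell/Buy forecast

# LAC membership test per class -- can leave nothing in the set.
lac_set = {k for k in range(2) if 1 - probs[k] <= qhat_lac}

# APS: sort classes by probability, accumulate mass, always keep >= 1 class.
order = np.argsort(probs)[::-1]
cum = np.cumsum(probs[order])
n_keep = int(np.searchsorted(cum, qhat_aps) + 1)
aps_set = set(order[:n_keep].tolist())

print(lac_set)  # set() -- empty: no actionable output
print(aps_set)  # {0, 1} -- ambiguous but never empty
```

An empty LAC set leaves the trading system with nothing to act on; the APS set is wide but still supports a defined fallback such as "hold."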
MAPIE in practice: model-agnostic, pipeline-friendly
MAPIE is compatible with any estimator following the scikit-learn API, including TensorFlow and PyTorch via wrappers. That makes it easy to drop into existing research and production stacks. Key features include:
- Calibration set: a held-out dataset used to compute conformity scores and quantile thresholds.
- cv options: split, cross-validation, or prefit. The prefit path is ideal for real-time systems where base models are trained elsewhere and quickly “conformalized” without retraining.
- conformity_score: a modular interface (superseding the older method parameter) that lets you tailor how mismatch is measured—useful when misclassifying Buy vs Sell carries asymmetric costs.
Time-series caveat: while CP assumes exchangeability, practitioners can approximate it via careful temporal splitting (e.g., rolling or blocked schemes) and frequent recalibration to track drift. The result is a pragmatic balance between theoretical guarantees and market reality.
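One way to sketch the rolling-recalibration idea, using synthetic residuals whose volatility drifts over time (all parameters and the absolute-residual score are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic out-of-sample residuals with drifting volatility
# (a stylized violation of exchangeability).
T = 2000
vol = np.linspace(0.5, 2.0, T)
resid = rng.normal(0, vol)
scores = np.abs(resid)              # absolute-residual conformity score

alpha, window = 0.1, 250            # recalibrate on the most recent 250 points
halfwidth = np.full(T, np.nan)
for t in range(window, T):
    recent = scores[t - window:t]   # blocked, strictly past-only calibration
    level = min(np.ceil((window + 1) * (1 - alpha)) / window, 1.0)
    halfwidth[t] = np.quantile(recent, level, method="higher")

# Interval halfwidths track the volatility drift instead of staying fixed.
print(f"early halfwidth ~ {np.nanmean(halfwidth[window:500]):.2f}")
print(f"late  halfwidth ~ {np.nanmean(halfwidth[-250:]):.2f}")
```

Because each threshold uses only recent past data, the intervals widen as volatility rises, which is the pragmatic drift-tracking behavior the caveat calls for.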
Decision engineering: from point forecasts to gated actions
A common workflow is to pair a base model with a “meta” layer that gates trades using conformal prediction sets:
- Train a baseline classifier (e.g., Random Forest) on historical features.
- Reserve a calibration window, compute conformity scores, and derive class-specific thresholds.
- Generate prediction sets for new samples. Treat singletons as high-confidence trades and multi-class sets as low-confidence.
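The three steps above might look roughly like the following sketch, using scikit-learn's RandomForestClassifier and a hand-rolled Mondrian-style gate rather than MAPIE's own interface (the features, labels, and window sizes are synthetic placeholders):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)

# Synthetic features and Buy/Sell labels, stand-ins for market features.
X = rng.normal(size=(3000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=3000) > 0).astype(int)  # 1=Buy, 0=Sell

# Temporal-style split: train window, later calibration window, "live" data.
X_tr, y_tr = X[:2000], y[:2000]
X_cal, y_cal = X[2000:2500], y[2000:2500]
X_new = X[2500:]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Class-specific (Mondrian-style) thresholds from the calibration window.
alpha = 0.1
p_cal = clf.predict_proba(X_cal)
scores = 1 - p_cal[np.arange(len(y_cal)), y_cal]
qhat = {}
for k in (0, 1):
    s_k = scores[y_cal == k]
    n_k = len(s_k)
    qhat[k] = np.quantile(s_k, min(np.ceil((n_k + 1) * (1 - alpha)) / n_k, 1.0),
                          method="higher")

# Gate trades: act only on singleton prediction sets.
p_new = clf.predict_proba(X_new)
in_sell = 1 - p_new[:, 0] <= qhat[0]
in_buy = 1 - p_new[:, 1] <= qhat[1]
trades = ["Buy" if b else "Sell"
          for s, b in zip(in_sell, in_buy) if bool(s) != bool(b)]
print(f"{len(trades)} high-confidence trades out of {len(X_new)} signals")
```

In a production version the calibration and gating would be handled by MAPIE with `cv="prefit"`, so the base model never needs retraining to be conformalized.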
Teams often go a step further: they relabel or filter training data for the base model using MAPIE’s outputs, keeping only samples where the base prediction and label align within high-confidence sets. This “self-cleaning” loop reduces label noise, improves generalization in non-stationary regimes, and yields more stable downstream performance. The meta learner then focuses on reliably greenlighting trades, while the base model benefits from a higher-quality training subset.
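The filtering criterion behind this self-cleaning loop can be sketched as a small helper (`clean_mask` is a hypothetical name, not part of MAPIE):

```python
import numpy as np

def clean_mask(pred_sets, labels):
    """Keep only rows whose prediction set is a singleton agreeing with
    the observed label -- a hedged sketch of the self-cleaning filter."""
    return np.array([len(s) == 1 and next(iter(s)) == y
                     for s, y in zip(pred_sets, labels)])

# Toy example: 1=Buy, 0=Sell.
pred_sets = [{1}, {0, 1}, {0}, {1}, set()]
labels    = [1,   1,      1,   0,   0]
mask = clean_mask(pred_sets, labels)
print(mask)  # only row 0 survives: singleton {1} matching label 1
```

Ambiguous sets, disagreements, and empty sets are all dropped, so the base model's next training round sees only samples the conformal layer considered both confident and correct.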
The ultimate risk knob: confidence level
Coverage is tunable. A 0.9 confidence level yields fewer, more reliable singletons and more abstentions; a 0.6 level opens the aperture—more trades, more risk, and typically deeper drawdowns. Adjusting this threshold lets you align trading frequency and tail-risk tolerance with mandate and market conditions. In backtests (e.g., EURUSD 2020–2025, plus prior regimes), this calibration visibly shifts the proportion of single-element vs multi-element sets and the shape of the equity curve.
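A stylized illustration of this knob, using APS-style sets where classes are added by descending probability until the target mass is reached (threshold calibration is omitted for brevity, so this shows the qualitative effect rather than a full conformal procedure):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy two-class probabilities for a batch of "live" signals.
p_buy = rng.uniform(0, 1, 5000)
top = np.maximum(p_buy, 1 - p_buy)   # mass of the most likely class

frac_singleton = {}
for conf in (0.9, 0.6):
    singleton = top >= conf          # one class already reaches the target mass
    frac_singleton[conf] = singleton.mean()
    print(f"target {conf:.0%}: singletons {singleton.mean():.0%}, "
          f"two-class sets {1 - singleton.mean():.0%}")
```

Dropping the target from 90% to 60% sharply raises the share of singleton sets, i.e., more greenlit trades, at the cost of weaker coverage on each one.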
What this changes for governance and adoption
Because MAPIE provides distribution-independent, finite-sample guarantees, it moves model risk management from best-effort heuristics to quantified error control. It is not a pattern-finding method; it is an uncertainty lens for any model you already trust. That distinction matters: it helps risk teams set explicit tolerances, auditors verify procedures, and researchers ship models with traceable, defensible confidence bounds.
Takeaways
- Conformal prediction turns black-box scores into verifiable prediction sets, better aligned with financial risk.
- Mondrian CP combats class imbalance by enforcing class-conditional coverage—vital for rare, high-impact trades.
- Choice of conformity score (LAC vs APS/RAPS) determines operational behavior under uncertainty.
- MAPIE’s prefit mode and scikit-learn compatibility ease integration into low-latency and batch pipelines.
- Confidence level is a first-class control for trade frequency and drawdown risk.
Bottom line: conformal forecasting, operationalized via MAPIE, helps transform ML signals into disciplined, coverage-aware trading decisions—bridging the gap between predictive modeling and real-world risk control in financial time series.