Forecasting layer

This page covers the v1 forecasting surface (rolling-ridge). A second-generation forecast layer with tree / NN predictors is planned and not yet shipped.

The v1 layer is a rolling-fit linear predictor (RollingRidge), exposed to .qe configs as predict_return(...). The motivation is to start data-driven strategy research from a known baseline, ridge regression, that any later ML approach has to beat to justify its complexity.

Quick start

backtest(
  data = yahoo("SPY", "1d", "2018-01-01", "2024-12-31"),
  strategy = signal(
    entry = predict_return(
              252, 1.0,
              lag_return(close, 1),
              lag_return(close, 5),
              rolling_zscore(close, 20),
              rsi(close, 14) / 100.0
            ) > 0.001,
    exit  = predict_return(
              252, 1.0,
              lag_return(close, 1),
              lag_return(close, 5),
              rolling_zscore(close, 20),
              rsi(close, 14) / 100.0
            ) < 0,
    symbol = "SPY",
  ),
  execution = execution(capital = 100_000),
  output    = output(results = "out/forecast.json"),
)

What this does, per bar:

  1. Compute the four feature values from the bar's close.
  2. If a prior bar's features are stashed, push (features_{t-1}, return_t) into the ridge — keeping the training pair causal — and refit.
  3. Predict return_{t+1} from this bar's features.
  4. Compare the prediction against the entry / exit thresholds; the result becomes the strategy's signal, filled on the next bar.

The first lookback + feature_warmup bars produce NaN forecasts, where feature_warmup is the longest warm-up among the feature expressions; the strategy stays flat through warm-up.
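The per-bar steps above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the qe RollingRidge: the class name RollingRidgeSketch and its step() method are hypothetical, the intercept is assumed to be unpenalized, features are passed in already computed (so feature warm-up is not modeled), and the normal equations are solved with Gauss-Jordan elimination instead of the LDLT factorization mentioned under Pitfalls.

```cpp
#include <cmath>
#include <cstddef>
#include <deque>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical sketch of a rolling ridge forecaster; not the qe RollingRidge API.
class RollingRidgeSketch {
 public:
  RollingRidgeSketch(std::size_t lookback, double alpha, std::size_t n_features)
      : lookback_(lookback), alpha_(alpha), k_(n_features + 1), w_(k_, 0.0) {}

  // Called once per bar with this bar's features and the return realized on this bar.
  double step(const std::vector<double>& x, double realized_return) {
    // Train on (features_{t-1}, return_t) so the training pair never sees the future.
    if (has_prev_ && std::isfinite(realized_return)) {
      rows_.emplace_back(prev_, realized_return);
      if (rows_.size() > lookback_) rows_.pop_front();
      if (rows_.size() == lookback_) fitted_ = fit();
    }
    has_prev_ = false;
    for (double v : x)
      if (!std::isfinite(v)) return nan_value();  // no stash: next bar pushes no pair
    prev_ = x;
    has_prev_ = true;
    if (!fitted_) return nan_value();  // warm-up (or singular fit)
    double y = w_[0];                   // intercept
    for (std::size_t j = 0; j < x.size(); ++j) y += w_[j + 1] * x[j];
    return y;
  }

 private:
  static double nan_value() { return std::numeric_limits<double>::quiet_NaN(); }

  // Solve (X^T X + alpha * D) w = X^T y, where D is the identity except for the
  // unpenalized intercept slot. Builds an augmented k x (k+1) matrix.
  bool fit() {
    std::vector<std::vector<double>> a(k_, std::vector<double>(k_ + 1, 0.0));
    for (const auto& [feat, y] : rows_) {
      std::vector<double> row(k_, 1.0);  // row[0] is the intercept column
      for (std::size_t j = 0; j + 1 < k_; ++j) row[j + 1] = feat[j];
      for (std::size_t r = 0; r < k_; ++r) {
        for (std::size_t c = 0; c < k_; ++c) a[r][c] += row[r] * row[c];
        a[r][k_] += row[r] * y;
      }
    }
    for (std::size_t j = 1; j < k_; ++j) a[j][j] += alpha_;
    // Gauss-Jordan elimination with partial pivoting.
    for (std::size_t col = 0; col < k_; ++col) {
      std::size_t piv = col;
      for (std::size_t r = col + 1; r < k_; ++r)
        if (std::fabs(a[r][col]) > std::fabs(a[piv][col])) piv = r;
      if (std::fabs(a[piv][col]) < 1e-12) return false;  // crude singularity check
      std::swap(a[col], a[piv]);
      for (std::size_t r = 0; r < k_; ++r) {
        if (r == col) continue;
        double f = a[r][col] / a[col][col];
        for (std::size_t c = col; c <= k_; ++c) a[r][c] -= f * a[col][c];
      }
    }
    for (std::size_t j = 0; j < k_; ++j) w_[j] = a[j][k_] / a[j][j];
    return true;
  }

  std::size_t lookback_;
  double alpha_;
  std::size_t k_;  // n_features + 1 (intercept)
  std::vector<double> w_;
  std::deque<std::pair<std::vector<double>, double>> rows_;
  std::vector<double> prev_;
  bool has_prev_ = false;
  bool fitted_ = false;
};
```

Call step() once per bar with the current feature vector and the return realized on that bar. In this sketch it returns NaN for the first lookback bars and on any bar with a NaN feature; after such a bar, the next bar pushes no training pair, which is the one-bar break in the causal chain described below.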

API

predict_return(lookback, alpha, f1, f2, ..., fN)

| Arg      | Type          | Constraints                   |
|----------|---------------|-------------------------------|
| lookback | int literal   | > 0 and >= n_features + 1     |
| alpha    | double literal| >= 0 (use 0 for plain OLS)    |
| f1..fN   | signal expr   | at least one; up to 64 in v1  |

Returns a double: the next-bar forecast. During warm-up it returns NaN. If any feature is NaN on the current bar, the call also returns NaN and the causal training chain breaks for one bar, which is better than corrupting the ring buffer with garbage. Each call site allocates an independent RollingRidge instance, so in v1 an entry and an exit that both call predict_return(...) do not share state. They train in parallel on the same data; with identical arguments they fit the same coefficients.
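The argument constraints in the API table amount to a few bind-time checks. A minimal sketch, assuming a hypothetical check_predict_return_args() helper that returns an empty string when the call would bind; the real binder's structure and error messages are not documented on this page.

```cpp
#include <cstddef>
#include <string>

// Hypothetical bind-time validation mirroring the predict_return(...) constraints.
std::string check_predict_return_args(long lookback, double alpha, std::size_t n_features) {
  constexpr std::size_t kMaxFeatures = 64;  // v1 limit
  if (n_features == 0) return "predict_return: at least one feature is required";
  if (n_features > kMaxFeatures) return "predict_return: at most 64 features in v1";
  if (lookback <= 0) return "predict_return: lookback must be > 0";
  // One row per coefficient at minimum: n_features slopes plus the intercept.
  if (static_cast<std::size_t>(lookback) < n_features + 1)
    return "predict_return: lookback must be >= n_features + 1";
  if (!(alpha >= 0.0)) return "predict_return: alpha must be >= 0";  // also rejects NaN
  return "";  // empty: arguments are acceptable
}
```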

Feature primitives

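| Function             | Output                          | Warm-up / edge case              |
|----------------------|---------------------------------|----------------------------------|
| lag_return(price, k) | (p_t - p_{t-k}) / p_{t-k}       | NaN for the first k bars         |
| rolling_vol(x, n)    | sample std over an n-bar window | NaN for the first n-1 bars       |
| rolling_zscore(x, n) | (x_t - mean_n) / std_n          | 0 when the window is constant    |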

All three primitives keep per-call-site state. A push costs O(1) for lag_return and O(n) for the rolling statistics, which stays well under the per-bar budget for any reasonable window.
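One way to realize these primitives with the stated warm-up and cost behavior is sketched below. The class names are hypothetical; rolling_zscore is assumed to use the same sample standard deviation as rolling_vol and to return NaN until its window is full (the page does not say), and n >= 2 is assumed so the sample std is defined.

```cpp
#include <cmath>
#include <cstddef>
#include <deque>
#include <limits>

constexpr double kNaN = std::numeric_limits<double>::quiet_NaN();

// lag_return(price, k): (p_t - p_{t-k}) / p_{t-k}; NaN until k prior prices exist. O(1) per push.
class LagReturnSketch {
 public:
  explicit LagReturnSketch(std::size_t k) : k_(k) {}
  double push(double p) {
    buf_.push_back(p);
    if (buf_.size() > k_ + 1) buf_.pop_front();
    if (buf_.size() < k_ + 1) return kNaN;
    double base = buf_.front();  // p_{t-k}
    return (p - base) / base;
  }
 private:
  std::size_t k_;
  std::deque<double> buf_;
};

// n-window mean and sample std, recomputed over the window: O(n) per query. Assumes n >= 2.
class RollingStatsSketch {
 public:
  explicit RollingStatsSketch(std::size_t n) : n_(n) {}
  void push(double x) {
    buf_.push_back(x);
    if (buf_.size() > n_) buf_.pop_front();
  }
  bool ready() const { return buf_.size() == n_; }
  double mean() const {
    double s = 0.0;
    for (double v : buf_) s += v;
    return s / static_cast<double>(buf_.size());
  }
  double sample_std() const {
    double m = mean(), ss = 0.0;
    for (double v : buf_) ss += (v - m) * (v - m);
    return std::sqrt(ss / static_cast<double>(buf_.size() - 1));
  }
  // rolling_vol(x, n): NaN for the first n-1 bars.
  double vol() const { return ready() ? sample_std() : kNaN; }
  // rolling_zscore(x, n): 0 when the window is constant (std == 0).
  double zscore(double x) const {
    if (!ready()) return kNaN;
    double sd = sample_std();
    return sd == 0.0 ? 0.0 : (x - mean()) / sd;
  }
 private:
  std::size_t n_;
  std::deque<double> buf_;
};
```

For rolling_zscore, push the bar's value first and then call zscore() with that same value, so x_t is part of its own window.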

Walk-forward validation

The forecasting layer composes with the walk_forward(...) harness. Wrap a backtest(...) that uses predict_return in walk-forward windows; each test slice then reports out-of-sample (OOS) metrics on bars that were not in the predictor's training set:

walk_forward(
  base = backtest(
    data = yahoo("SPY", "1d", "2018-01-01", "2024-12-31"),
    strategy = signal(
      entry = predict_return(60, 1.0, lag_return(close, 1)) > 0.001,
      exit  = predict_return(60, 1.0, lag_return(close, 1)) < 0.0,
      symbol = "SPY",
    ),
    execution = execution(capital = 100_000),
  ),
  train_window = "365d",
  test_window  = "90d",
  step_window  = "90d",
)

In the resulting results.json, read walk_forward.oos_metrics: if the in-sample Sharpe is high but the OOS Sharpe sits near zero, the predictor is fitting noise. This is the canonical overfitting smell test the forecasting layer is designed to support.
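The smell test can be written down explicitly. A minimal sketch, assuming per-bar strategy returns, a zero risk-free rate, and 252 bars per year for daily data; annualized_sharpe() and looks_overfit() are hypothetical helpers, and their default thresholds are arbitrary illustration values, not anything qe reports.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Annualized Sharpe of per-bar returns (risk-free rate assumed to be 0).
double annualized_sharpe(const std::vector<double>& r, double bars_per_year = 252.0) {
  if (r.size() < 2) return 0.0;
  double mean = 0.0;
  for (double x : r) mean += x;
  mean /= static_cast<double>(r.size());
  double ss = 0.0;
  for (double x : r) ss += (x - mean) * (x - mean);
  double sd = std::sqrt(ss / static_cast<double>(r.size() - 1));  // sample std
  return sd == 0.0 ? 0.0 : mean / sd * std::sqrt(bars_per_year);
}

// Flags a strong in-sample Sharpe paired with a near-zero OOS Sharpe.
bool looks_overfit(double is_sharpe, double oos_sharpe,
                   double is_floor = 1.0, double oos_ceiling = 0.2) {
  return is_sharpe >= is_floor && std::fabs(oos_sharpe) <= oos_ceiling;
}
```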

Inspecting forecasts in the dashboard

Opt in by setting record_forecast = true on execution(...):

execution(
  capital         = 100_000,
  commission_bps  = 1,
  record_forecast = true,
)

qe_run then drains the per-bar (y_hat, y_realized) trace from every predict_return(...) call site after the run and appends a forecasts[] block to results.json:

"forecasts": [
  {
    "id":         "entry#0",
    "lookback":   60,
    "alpha":      1.0,
    "n_features": 4,
    "metrics": {
      "rmse":                 0.0142,
      "directional_accuracy": 0.51,
      "r2_vs_naive":          0.03,
      "n":                    243
    },
    "series": [
      { "ts_ns": ..., "bar_index": ..., "y_hat": ..., "y_realized": ... },
      ...
    ]
  }
]

The F4 BCKT screen's bottom-right slot detects the block and switches into a scatter view: y_hat on x, y_realized on y, with a 45° identity line and a y=0 reference. The header badge restates n / directional accuracy / RMSE / R² vs naïve. Points above the identity line are bars where the model under-predicted the realized return; the top-right and bottom-left quadrants are direction-correct.
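The badge metrics can be recomputed from the series array. This sketch assumes definitions the page does not spell out: NaN points are skipped, a direction counts as correct when y_hat and y_realized share a strict sign (the top-right and bottom-left quadrants), and the naive baseline for R² always predicts a zero return. ForecastPoint, ForecastMetrics and score() are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct ForecastPoint { double y_hat; double y_realized; };

struct ForecastMetrics {
  double rmse = 0.0;
  double directional_accuracy = 0.0;
  double r2_vs_naive = 0.0;
  std::size_t n = 0;
};

ForecastMetrics score(const std::vector<ForecastPoint>& series) {
  ForecastMetrics m;
  double sse = 0.0, sse_naive = 0.0;
  std::size_t hits = 0;
  for (const auto& p : series) {
    if (!std::isfinite(p.y_hat) || !std::isfinite(p.y_realized)) continue;  // skip warm-up NaNs
    double err = p.y_realized - p.y_hat;
    sse += err * err;
    sse_naive += p.y_realized * p.y_realized;  // assumed naive forecast: 0
    bool same_sign = (p.y_hat > 0 && p.y_realized > 0) || (p.y_hat < 0 && p.y_realized < 0);
    if (same_sign) ++hits;
    ++m.n;
  }
  if (m.n == 0) return m;
  m.rmse = std::sqrt(sse / static_cast<double>(m.n));
  m.directional_accuracy = static_cast<double>(hits) / static_cast<double>(m.n);
  m.r2_vs_naive = sse_naive > 0.0 ? 1.0 - sse / sse_naive : 0.0;
  return m;
}
```

Under this definition, a positive r2_vs_naive means the model's squared error is smaller than that of always predicting zero.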

Cost note: each trace point is 24 bytes on disk and a few hundred bytes per render. For 60-bar smoke runs the cost is negligible; a 1M-bar minute run adds about 25 MB (1M × 24 bytes) to results.json per call site. Leave the flag off unless you actually want the overlay.

Limitations of the v1 overlay:

  • Only the first call site is rendered (typically entry#0 and exit#0 are identical predictors, so the picture is the same). A selector across call sites is a follow-up.
  • walk_forward(...) doesn't currently drain predictor traces through its window-stitched runner, so the overlay shows up on single-pass backtests only.

Limitations (v1)

  • Target is hardcoded to next-bar return. To predict other targets (volatility, multi-bar return, direction probability), v2 will introduce a target = ... arg.
  • Method is ridge-only. alpha = 0 recovers OLS via the Tikhonov fallback. Lasso, Bayesian regression, and tree-based models (XGBoost, LightGBM) are planned as separate predict_* builtins.
  • No cross-sectional models. The predictor takes per-bar features from one symbol; cross-asset (e.g. residualize on a market factor) needs a future multi-symbol predict primitive.
  • No shared state across signal positions. An entry and exit expression that both call predict_return(...) instantiate independent ridges. Practical impact: ~2× compute for the same fit; no correctness issue.
  • No live coefficient inspection from the dashboard yet. The F4 BCKT screen renders the OOS curve when walk-forward is used and the forecast-vs-realized scatter when record_forecast = true, but the learned ridge coefficients themselves are not displayed.

Implementation pointers

| Concern                  | File                                                                      |
|--------------------------|---------------------------------------------------------------------------|
| Closed-form ridge solver | include/qe/forecast/rolling_ridge.hpp                                     |
| Feature indicators       | include/qe/indicators/lag_return.hpp, include/qe/indicators/rolling_stats.hpp |
| DSL builtin registration | src/dsl/env.cpp                                                           |
| Per-bar dispatch         | src/dsl/evaluator.cpp (case PredictReturn)                                |
| Warmup math              | src/dsl/analysis.cpp                                                      |
| End-to-end smoke         | tests/fixtures/forecast_smoke.qe                                          |

Pitfalls

  • lookback too small: 5 features plus the intercept give 6 coefficients, so lookback = 6 is the minimum, but the design matrix becomes rank-deficient on any collinear inputs. Use at least 4 × n_features for sane fits. The binder rejects lookback < n_features + 1 to catch the obviously broken case.
  • Highly correlated features: the ridge is robust against collinearity at the LDLT level, but the learned slopes become noisy. Prefer a small set of orthogonal-ish features (lag_return at different horizons, a z-score, RSI normalized).
  • Refit cost on minute bars: at lookback 1000 and 5 features RollingRidge::fit() runs in ~250 µs. For 1M bars that's about 4 minutes wall time of pure refit — acceptable for one-shot research, painful inside a sweep × walk-forward grid. Drop the lookback or thin the features when that bites.