Backtesting through Cross-Validation (CPCV)


This chapter contrasts the three primary methods for backtesting a quantitative strategy. It argues that the most common method (Walk-Forward) is flawed and easily overfit, while standard Cross-Validation has its own drawbacks. It concludes by introducing a new, more robust method called Combinatorial Purged Cross-Validation (CPCV).


The Walk-Forward (WF) Method

  • What It Is: A standard historical simulation. It trains on data from [0, t] and tests on data at t+1, moving forward in time. This is the most common form of "backtesting."
  • Advantages:
    1. Has a clear, intuitive historical interpretation.
    2. Guarantees no information leakage (if purging is used correctly) because the training set always predates the testing set.
  • Disadvantages (Critical):
    1. Tests a Single Path: It only tests the one historical scenario that happened, which is easily overfit.
    2. Path-Dependent Overfitting: The model's performance depends heavily on the particular ordering of historical events (e.g., a backtest run over 2007-2017 gives a different result from one run over the same observations in reverse order, 2017-2007).
    3. Uneven Information: Decisions at the beginning of the backtest are based on much less data than decisions at the end, making results inconsistent.
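
The expanding-window scheme described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the RiskLabAI implementation: the function name `walk_forward_splits` and its block-based layout are assumptions made for the example, and purging is left out for brevity.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_splits: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for an expanding-window walk-forward.

    Hypothetical helper for illustration only.
    """
    if n_splits < 1 or n_splits + 1 > n_samples:
        raise ValueError("need 1 <= n_splits < n_samples")
    # Cut the sample into n_splits + 1 contiguous blocks. The first block is
    # never tested: it is the warm-up period used only for training.
    block = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        test_start = i * block
        # The last test block absorbs any remainder.
        test_end = n_samples if i == n_splits else test_start + block
        train = list(range(0, test_start))  # everything before the test block
        test = list(range(test_start, test_end))
        yield train, test


splits = list(walk_forward_splits(n_samples=10, n_splits=4))
for train, test in splits:
    print(f"train size={len(train):2d}  test={test}")
```

The growing training size from one split to the next is the "uneven information" drawback: early decisions rest on far fewer observations than late ones.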

The Cross-Validation (CV) Method

  • What It Is: This method tests a model's performance on "stress scenarios." It splits data into $k$ sets, then trains on $k-1$ sets and tests on the one held-out set. For example, it might train on 2009-2017 data and then test on the 2008 crisis.
  • Goal: The goal is not historical accuracy, but to see how a model (trained on "normal" data) would perform under an unknown stress event.
  • Advantages:
    1. Tests $k$ different scenarios, not just the single historical path.
    2. Every decision is made using an equal amount of training data.
    3. Uses the entire dataset for testing (no warm-up period).
  • Disadvantages:
    1. Still only produces a single backtest path (by stitching the $k$ tests together).
    2. Leakage is a high risk because the training set can contain future data. Requires purging and embargoing (from Ch. 7).
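
Purging and embargoing can be illustrated with a small, dependency-free sketch. It assumes each sample's label is formed over an integer interval `[event_starts[i], event_ends[i]]`, and that the embargo is given as a number of time steps; the RiskLabAI signatures listed later take the embargo as a float, so the function below (`purged_kfold_splits`, a hypothetical name) should be read as an illustration of the idea rather than the library's behaviour.

```python
from typing import Iterator, List, Tuple


def purged_kfold_splits(
    event_starts: List[int],
    event_ends: List[int],
    n_splits: int,
    embargo: int = 0,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) indices with purging and a step-count embargo.

    Hypothetical helper: assumes samples are ordered in time.
    """
    n = len(event_starts)
    fold = n // n_splits
    for j in range(n_splits):
        lo = j * fold
        hi = n if j == n_splits - 1 else lo + fold  # last fold takes the remainder
        test = list(range(lo, hi))
        # Time span covered by the labels of the test samples.
        t0 = min(event_starts[i] for i in test)
        t1 = max(event_ends[i] for i in test)
        train = []
        for i in range(n):
            if lo <= i < hi:
                continue
            # Purge: the training label interval overlaps the test span.
            overlaps = event_starts[i] <= t1 and event_ends[i] >= t0
            # Embargo: the training sample starts right after the test span.
            embargoed = t1 < event_starts[i] <= t1 + embargo
            if not overlaps and not embargoed:
                train.append(i)
        yield train, test


# Twelve samples whose labels each look two steps ahead.
starts = list(range(12))
ends = [s + 2 for s in starts]
purged = list(purged_kfold_splits(starts, ends, n_splits=3, embargo=1))
for train, test in purged:
    print(f"test={test}  train={train}")
```

In the middle fold, samples on both sides of the test block are dropped, which shows why purging matters more for CV than for walk-forward: the training set surrounds the test set.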

The Combinatorial Purged Cross-Validation (CPCV) Method

This is the author's novel method, designed to fix the flaws of WF and CV by testing multiple paths to generate a distribution of performance metrics, not just a single number.

  • What It Is:

    1. The data is split into $N$ groups.
    2. A test-set size of $k$ groups is chosen (where $k \le N/2$).
    3. The algorithm then generates all possible combinations of training/testing splits. For each split, it trains on $N-k$ groups and tests on $k$ groups.
    4. All training sets are purged (and embargoed) to prevent leakage.
    5. This combinatorial process generates $\varphi$ unique, full-length backtest paths.
  • Number of Paths ($\varphi$): The number of unique backtest paths generated is:

    $$\varphi[N, k] = \frac{k}{N}\binom{N}{N-k} = \frac{\prod_{i=1}^{k-1}(N-i)}{(k-1)!}$$
    • Example: Using $k=2$ (testing on 2 groups at a time) is a powerful "sweet spot." It generates $\varphi[N, 2] = N-1$ paths while keeping the training set size large.
  • How It Solves Overfitting:

    • WF and CV each produce a single Sharpe Ratio (SR), $y_i$. This $y_i$ has a high variance, $\sigma^2(y_i)$, and is easily "cherry-picked" (selection bias).
    • CPCV generates $\varphi$ different SRs for the same strategy. It produces a distribution of performance, allowing us to analyze the mean SR, $\mu_i$.
    • The variance of this mean, $\sigma^2(\mu_i)$, is lower than the variance of the individual path SRs, $\sigma_i^2$, whenever the average correlation between those SRs, $\bar{\rho}_i$, is below 1:
      $$\sigma^{2}(\mu_{i}) = \varphi^{-1} \sigma_{i}^{2}\left(1+(\varphi-1) \bar{\rho}_{i}\right)$$
    • Because the variance is lower, it is harder to find a "false discovery." CPCV counters backtest overfitting by forcing the strategy to prove its profitability across many different scenarios (paths), not just the single historical one.
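
The combinatorics above can be checked with a short sketch. It enumerates the test-group combinations, confirms the path count against the closed-form expression for $\varphi[N, k]$, stitches the tested groups into full-length paths, and evaluates the variance formula for two limiting correlations. All function names here are hypothetical, and purging and embargoing are omitted to keep the example small.

```python
from itertools import combinations
from math import comb
from typing import Dict, List, Tuple


def cpcv_test_groups(n_groups: int, k: int) -> List[Tuple[int, ...]]:
    """All ways to choose k test groups out of n_groups."""
    return list(combinations(range(n_groups), k))


def n_paths(n_groups: int, k: int) -> int:
    """phi[N, k] = (k / N) * C(N, N - k), in exact integer arithmetic."""
    return k * comb(n_groups, k) // n_groups


def assign_paths(n_groups: int, k: int) -> List[Dict[int, int]]:
    """Build phi paths; each maps a group to the split that tests it."""
    test_groups = cpcv_test_groups(n_groups, k)
    paths: List[Dict[int, int]] = [dict() for _ in range(n_paths(n_groups, k))]
    next_free = [0] * n_groups  # first path that still lacks each group
    for split_id, groups in enumerate(test_groups):
        for g in groups:
            paths[next_free[g]][g] = split_id
            next_free[g] += 1
    return paths


def variance_of_mean(sigma2: float, rho_bar: float, phi: int) -> float:
    """sigma^2(mu) = sigma^2 * (1 + (phi - 1) * rho_bar) / phi."""
    return sigma2 * (1 + (phi - 1) * rho_bar) / phi


N, K = 6, 2
test_groups = cpcv_test_groups(N, K)
paths = assign_paths(N, K)
print(len(test_groups), "splits ->", len(paths), "paths")
print("uncorrelated paths:", variance_of_mean(1.0, 0.0, len(paths)))
print("perfectly correlated paths:", variance_of_mean(1.0, 1.0, len(paths)))
```

Every group is tested in exactly $\varphi$ splits, which is why the assignment loop fills each path with one forecast per group. The two variance calls bracket the possible benefit: the full $1/\varphi$ reduction when path SRs are uncorrelated, and none when they are perfectly correlated.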

Cross-Validator Design in RiskLabAI

To make these complex cross-validation strategies easy to use and interchangeable, we implement a Factory and Controller design pattern.

  • CrossValidator (Interface): An abstract base class that defines the common API all validators must implement (split, backtest_paths, backtest_predictions). This ensures that any validator can be used in the same way.
  • CrossValidatorFactory: A simple factory class that constructs the correct validator instance (e.g., 'purgedkfold' or 'combinatorialpurged') based on a string name.
  • CrossValidatorController: A high-level controller that acts as the main user-facing class. It uses the factory to create and hold a validator instance, simplifying the workflow.

This design allows a user to switch from a WalkForward backtest to a CombinatorialPurged backtest by changing only one line of code (the validator_type string), promoting rapid and robust experimentation.
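
A stripped-down sketch of how such a Factory and Controller arrangement might look is given below. The class names `CrossValidator`, `CrossValidatorFactory`, and `CrossValidatorController` and the `validator_type` string come from the description above; everything else (the toy validators, the `"kfold"` and `"walkforward"` keys, the `create` method, the `cross_validator` attribute, and the `split(n_samples)` signature) is assumed for illustration and may differ from the actual RiskLabAI code.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


class CrossValidator(ABC):
    """Common interface; only `split` is sketched here."""

    @abstractmethod
    def split(self, n_samples: int) -> Iterator[Split]:
        ...


class ToyKFold(CrossValidator):
    """Plain contiguous k-fold (no purging), for illustration only."""

    def __init__(self, n_splits: int = 3) -> None:
        self.n_splits = n_splits

    def split(self, n_samples: int) -> Iterator[Split]:
        fold = n_samples // self.n_splits
        for j in range(self.n_splits):
            lo = j * fold
            hi = n_samples if j == self.n_splits - 1 else lo + fold
            test = list(range(lo, hi))
            train = [i for i in range(n_samples) if i < lo or i >= hi]
            yield train, test


class ToyWalkForward(ToyKFold):
    """Keeps only training samples that precede the test block."""

    def split(self, n_samples: int) -> Iterator[Split]:
        for train, test in super().split(n_samples):
            past = [i for i in train if i < test[0]]
            if past:  # skip the first fold, which has no history
                yield past, test


class CrossValidatorFactory:
    # Hypothetical registry keys; the real package uses its own names.
    _registry: Dict[str, type] = {"kfold": ToyKFold, "walkforward": ToyWalkForward}

    @staticmethod
    def create(validator_type: str, **kwargs) -> CrossValidator:
        try:
            cls = CrossValidatorFactory._registry[validator_type]
        except KeyError:
            raise ValueError(f"unknown validator_type: {validator_type!r}") from None
        return cls(**kwargs)


class CrossValidatorController:
    def __init__(self, validator_type: str, **kwargs) -> None:
        self.cross_validator = CrossValidatorFactory.create(validator_type, **kwargs)


# Switching schemes only requires changing the string.
wf = list(CrossValidatorController("walkforward", n_splits=3).cross_validator.split(9))
kf = list(CrossValidatorController("kfold", n_splits=3).cross_validator.split(9))
print(len(wf), "walk-forward splits;", len(kf), "k-fold splits")
```

Keeping construction in the factory means downstream code depends only on the abstract interface, so adding a new scheme is a matter of registering one more class.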

API reference

RiskLabAI implements these in Python and Julia (signatures auto-generated from the package source):

Each entry below lists the Python signature first, followed by its Julia counterpart.
class KFold(CrossValidator):
struct KFoldCV
    n_splits::Int
    shuffle::Bool
    rng::AbstractRNG
end

KFoldCV(n_splits::Integer; shuffle::Bool = false, random_seed = nothing) = KFoldCV(
    n_splits,
    shuffle,
    random_seed === nothing ? default_rng() : MersenneTwister(random_seed),
)
class PurgedKFold(CrossValidator):
struct PurgedKFoldCV
    n_splits::Int
    event_starts::Vector
    event_ends::Vector
    embargo::Float64
end

PurgedKFoldCV(
    n_splits::Integer,
    event_starts::AbstractVector,
    event_ends::AbstractVector;
    embargo::Real = 0.0,
) = PurgedKFoldCV(n_splits, collect(event_starts), collect(event_ends), Float64(embargo))
class CombinatorialPurged(PurgedKFold):
struct CombinatorialPurgedCV
    n_splits::Int
    n_test_groups::Int
    event_starts::Vector
    event_ends::Vector
    embargo::Float64
end

function CombinatorialPurgedCV(
    n_splits::Integer,
    n_test_groups::Integer,
    event_starts::AbstractVector,
    event_ends::AbstractVector;
    embargo::Real = 0.0,
)
class WalkForward(KFold):
struct WalkForwardCV
    n_splits::Int
    max_train_size::Union{Nothing,Int}
    gap::Int
end

WalkForwardCV(n_splits::Integer; max_train_size = nothing, gap::Integer = 0) =
def bagging_classifier_accuracy(N: int, p: float) -> float:
function bagging_classifier_accuracy(N::Integer, p::Real)
class BaggingClassifierAccuracy:
    """
    Evaluates a bagging classifier's accuracy using different
    weighting schemes based on decision tree c_i scores.

    Methods:
    - fit: Fits the bagging classifier.
    - calculate_c_i: Calculates the c_i score for each tree.
    - calculate_weights: Computes weights (uniform, c_i, 1-c_i^2).
    - predict: Predicts class labels using specified weights.
    - evaluate_all_schemes: Gets accuracy for all weighting schemes.
    """

    def __init__(
        self,
        n_estimators: int = 1000,
        max_samples: int = 100,
        max_features: float = 1.0,
        random_state: Optional[int] = None,
    ):
function fit_bagging(
    x::AbstractMatrix{<:Real},
    y::AbstractVector;
    n_estimators::Integer = 1000,
    max_samples::Integer = 100,
    max_features::Integer = 1,
    random_state = nothing,
)
def calculate_bootstrap_accuracy(
    clf: BaggingClassifier, X: pd.DataFrame, y: pd.Series, n_bootstraps: int = 1000
) -> tuple[np.ndarray, float, float]:
function calculate_bootstrap_accuracy(
    trees,
    classes,
    x::AbstractMatrix{<:Real},
    y::AbstractVector;
    weights::AbstractVector{<:Real} = fill(1.0 / length(trees), length(trees)),
    n_bootstraps::Integer = 1000,
    random_state = nothing,
)
    def backtest_predictions(
        self,
        estimator: Union[Estimator, dict[str, Estimator]],
        data: Union[pd.DataFrame, dict[str, pd.DataFrame]],
        labels: Union[pd.Series, dict[str, pd.Series]],
        sample_weights: Optional[Union[np.ndarray, dict[str, np.ndarray]]] = None,
        predict_probability: bool = False,
        n_jobs: int = 1,
    ) -> Union[dict[str, np.ndarray], dict[str, dict[str, np.ndarray]]]:
function cross_val_score(
    cv,
    x::AbstractMatrix{<:Real},
    y::AbstractVector;
    n_trees::Integer = 100,
    n_subfeatures::Integer = -1,
    max_depth::Integer = -1,
    scoring::Symbol = :accuracy,
    random_state::Integer = 0,
)

Full source: Python · Julia