Financial Backtesting and the Curse of Overfitting

Even if you avoid all of the pitfalls above, a backtest can still produce false positives because of multiple testing, selection bias, or overfitting. Overfitting happens when a strategy is tailored so closely to historical data that it captures noise rather than a repeatable pattern, which makes it unlikely to perform well on new, unseen data.
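To make the selection-bias point concrete, here is a minimal simulation sketch. It assumes 1,000 hypothetical strategies whose returns are pure noise (zero true mean), 252 observations per sample, and an annualization factor of sqrt(252); none of these numbers come from the text above. Picking the strategy with the best in-sample Sharpe ratio produces an impressive-looking figure even though no strategy has any real edge.

```python
import numpy as np

# Assumed setup: 1,000 strategies, one year of daily returns, zero true mean.
rng = np.random.default_rng(0)
n_days, n_strategies = 252, 1000
in_sample = rng.normal(0.0, 0.01, size=(n_days, n_strategies))
out_of_sample = rng.normal(0.0, 0.01, size=(n_days, n_strategies))


def annualized_sharpe(returns: np.ndarray) -> np.ndarray:
    """Annualized Sharpe ratio of each column (risk-free rate taken as zero)."""
    return np.sqrt(252) * returns.mean(axis=0) / returns.std(axis=0, ddof=1)


is_sharpe = annualized_sharpe(in_sample)
oos_sharpe = annualized_sharpe(out_of_sample)

# Select the winner using in-sample data only, as a naive backtest would.
best = int(np.argmax(is_sharpe))

print(f"Best in-sample Sharpe:        {is_sharpe[best]:.2f}")
print(f"Same strategy, out-of-sample: {oos_sharpe[best]:.2f}")
```

The in-sample winner looks strong only because it is the maximum of many noisy estimates; its out-of-sample Sharpe ratio is just another draw from the same noise.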

A Technical Solution: Combinatorially Symmetric Cross-Validation (CSCV)

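CSCV starts from a matrix of performance figures in which each row is an observation and each column is a trial (strategy). The rows are split into an even number S of submatrices. Every combination of S/2 submatrices forms a training (in-sample) set, and the remaining S/2 submatrices form the matching test (out-of-sample) set. For each split, the strategy that performs best on the training set is ranked against all other strategies on the test set. The probability of backtest overfitting (PBO) is estimated as the share of splits in which that strategy falls at or below the median out-of-sample.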

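from itertools import combinations
from typing import Callable, Optional, Tuple

import numpy as np
from joblib import Parallel, delayed

# Note: sharpe_ratio and performance_evaluation are helper functions
# that are not shown in this excerpt.


def probability_of_backtest_overfitting(
    performances: np.ndarray,
    n_partitions: int = 16,
    risk_free_return: float = 0.0,
    metric: Optional[Callable] = None,
    n_jobs: int = 1
) -> Tuple[float, np.ndarray]:
    # CSCV needs an even number of partitions so that the training and
    # test sets are built from the same number of partitions.
    if n_partitions % 2 == 1:
        raise ValueError("Number of partitions must be even.")

    # Default to the Sharpe ratio as the performance metric.
    if metric is None:
        metric = sharpe_ratio

    # Rows are observations, columns are strategies (trials).
    _, n_strategies = performances.shape
    # Split the rows into n_partitions consecutive blocks.
    partitions = np.array_split(performances, n_partitions)
    partition_indices = range(n_partitions)
    # Every way of choosing half of the blocks as the training set.
    partition_combinations_indices = list(combinations(partition_indices, n_partitions // 2))

    # Evaluate each split: the chosen blocks form the training set and
    # the remaining blocks form the test set.
    results = Parallel(n_jobs=n_jobs)(
        delayed(performance_evaluation)(
            np.concatenate([partitions[i] for i in train_indices], axis=0),
            np.concatenate([partitions[i] for i in partition_indices
                            if i not in train_indices], axis=0),
            n_strategies,
            metric,
            risk_free_return
        )
        for train_indices in partition_combinations_indices
    )

    results = np.array(results)

    # Column 0 is averaged into the PBO estimate; column 1 holds the logits.
    pbo = results[:, 0].mean(axis=0)
    logit_values = results[:, 1]

    return pbo, logit_values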


These functionalities are available in both Python and Julia in the RiskLabAI library.
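The function above calls two helpers, sharpe_ratio and performance_evaluation, whose bodies are not shown here. The sketch below is one plausible way to write them, not the library's actual code. It assumes that the metric is a per-period (non-annualized) Sharpe ratio, that performance_evaluation returns an overfit flag together with the logit of the relative out-of-sample rank, and that ties in the ranking can be ignored. A small serial loop on simulated noise stands in for the parallel call.

```python
from itertools import combinations

import numpy as np


def sharpe_ratio(returns: np.ndarray, risk_free_return: float = 0.0) -> float:
    """Per-period Sharpe ratio of one return series (assumed definition)."""
    excess = returns - risk_free_return
    return float(excess.mean() / excess.std(ddof=1))


def performance_evaluation(train, test, n_strategies, metric, risk_free_return):
    """Return (overfit flag, logit) for a single train/test split (hypothetical helper)."""
    train_scores = np.array(
        [metric(train[:, j], risk_free_return) for j in range(n_strategies)]
    )
    test_scores = np.array(
        [metric(test[:, j], risk_free_return) for j in range(n_strategies)]
    )
    # Strategy that looks best in-sample.
    best = int(np.argmax(train_scores))
    # Its out-of-sample rank: 1 = worst, n_strategies = best (ties ignored).
    rank = 1 + np.sum(test_scores < test_scores[best])
    relative_rank = rank / (n_strategies + 1)
    logit = np.log(relative_rank / (1.0 - relative_rank))
    # A logit at or below zero means the in-sample winner is at or below
    # the out-of-sample median.
    return float(logit <= 0.0), float(logit)


# Simulated data: 20 pure-noise strategies, 480 observations, 6 partitions.
rng = np.random.default_rng(42)
performances = rng.normal(0.0, 0.01, size=(480, 20))
n_partitions = 6
partitions = np.array_split(performances, n_partitions)

results = []
for train_idx in combinations(range(n_partitions), n_partitions // 2):
    train = np.concatenate([partitions[i] for i in train_idx], axis=0)
    test = np.concatenate(
        [partitions[i] for i in range(n_partitions) if i not in train_idx], axis=0
    )
    results.append(
        performance_evaluation(train, test, performances.shape[1], sharpe_ratio, 0.0)
    )

results = np.array(results)
pbo = results[:, 0].mean()
logits = results[:, 1]
print(f"Splits evaluated: {len(logits)}, estimated PBO: {pbo:.2f}")
```

Because the simulated strategies have no real edge, the in-sample winner should land below the out-of-sample median in a substantial share of the splits.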

Mathematical Formula for CSCV

With S partitions (n_partitions in the code above), the total number of train/test combinations that CSCV evaluates is:

\binom{S}{\frac{S}{2}} = \prod_{i=0}^{\frac{S}{2}-1} \frac{S-i}{\frac{S}{2}-i}
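As a quick check of the formula, the sketch below computes the count twice: once with the standard library's math.comb and once with the product form written out. The function names are illustrative only. For S = 16, the default n_partitions in the code above, both give 12,870 splits, which shows how quickly the workload grows with S.

```python
from math import comb


def n_cscv_splits(n_partitions: int) -> int:
    """Number of train/test splits, using the binomial coefficient directly."""
    if n_partitions <= 0 or n_partitions % 2 != 0:
        raise ValueError("Number of partitions must be a positive even integer.")
    return comb(n_partitions, n_partitions // 2)


def n_cscv_splits_product(n_partitions: int) -> int:
    """Same count via the product form: prod over i of (S - i) / (S/2 - i)."""
    half = n_partitions // 2
    numerator, denominator = 1, 1
    for i in range(half):
        numerator *= n_partitions - i
        denominator *= half - i
    # The division is exact because the result is a binomial coefficient.
    return numerator // denominator


for s in (6, 10, 16):
    print(s, n_cscv_splits(s), n_cscv_splits_product(s))
# 6 20 20
# 10 252 252
# 16 12870 12870
```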

References

López de Prado, M. (2018). Advances in Financial Machine Learning. John Wiley & Sons.
López de Prado, M. (2020). Machine Learning for Asset Managers. Cambridge University Press.