Ensemble Methods

This chapter explains why ensemble methods are effective and how to avoid common errors when applying them to finance. Ensembles combine multiple "weak learners" to create a single "strong learner" that performs better by reducing bias and/or variance.

The Three Sources of Errors

All ML models suffer from three types of errors. The goal is to minimize their sum.

Bias: Error from unrealistic assumptions (causes underfitting).
Variance: Error from sensitivity to small changes in the training data (causes overfitting).
Noise: Irreducible error ( $\sigma_{\varepsilon}^{2}$ ) from the data itself.

Suppose the outcomes follow $y_{i}=f\left(x_{i}\right)+\varepsilon_{i}$ , where $\varepsilon_{i}$ is white noise with mean zero and variance $\sigma_{\varepsilon}^{2}$ . The Mean-Squared Error (MSE) of an estimator $\hat{f}(x)$ can then be decomposed as:

$\mathrm{E}\left(\left(y_{i}-\hat{f}\left(x_{i}\right)\right)^{2}\right)=(\underbrace{\mathrm{E}\left(\hat{f}\left(x_{i}\right)-f\left(x_{i}\right)\right)}_{\text {bias }})^{2}+\underbrace{\mathrm{V}\left(\hat{f}\left(x_{i}\right)\right)}_{\text {variance }}+\underbrace{\sigma_{\varepsilon}^{2}}_{\text {noise }}$
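The decomposition can be checked numerically. The sketch below is an illustration under assumed values, not part of the RiskLabAI package: it uses a deliberately biased estimator (a shrunken sample mean, with hypothetical parameters `f_true`, `sigma_eps`, and `shrink`) and compares the simulated MSE with the sum of the three terms.

```python
import numpy as np

rng = np.random.default_rng(0)

f_true = 2.0        # assumed true value f(x) at a single point x
sigma_eps = 1.0     # assumed noise standard deviation
n_train = 10        # observations per simulated training set
shrink = 0.8        # shrinkage factor: adds bias, lowers variance
n_trials = 200_000  # number of simulated training sets

# Each row is one training set drawn from y = f(x) + eps.
train = f_true + sigma_eps * rng.standard_normal((n_trials, n_train))
f_hat = shrink * train.mean(axis=1)  # one forecast per training set

# A fresh, independent observation to forecast in each trial.
y_new = f_true + sigma_eps * rng.standard_normal(n_trials)

mse = np.mean((y_new - f_hat) ** 2)
bias_sq = (np.mean(f_hat) - f_true) ** 2
variance = np.var(f_hat)
noise = sigma_eps ** 2

print(f"simulated MSE            : {mse:.4f}")
print(f"bias^2 + variance + noise: {bias_sq + variance + noise:.4f}")
```

The two printed numbers should agree up to Monte Carlo error. Raising `shrink` toward 1 removes the bias term but increases the variance term.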

Bootstrap Aggregation (Bagging)

Bagging is an ensemble method designed primarily to reduce variance (overfitting).

Process:
1. Generate $N$ training datasets by random sampling with replacement (bootstrapping).
2. Fit $N$ independent estimators, one on each dataset (can be done in parallel).
3. The final forecast is the average (for regression) or majority vote (for classification).
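The three steps above can be sketched by hand. This is a minimal illustration on synthetic data, assuming scikit-learn decision trees as the base estimators and binary 0/1 labels; in practice scikit-learn's `BaggingClassifier` wraps the same logic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, y_train = X[:400], y[:400]
X_test, y_test = X[400:], y[400:]

n_estimators = 25  # odd, so a binary majority vote cannot tie
trees = []
for _ in range(n_estimators):
    # Step 1: draw a bootstrap sample (with replacement) of the training set.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Step 2: fit one estimator on that sample.
    tree = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Step 3: majority vote across the estimators (labels are 0/1 here).
votes = np.stack([t.predict(X_test) for t in trees])  # (n_estimators, n_test)
bagged_pred = (votes.mean(axis=0) > 0.5).astype(int)
bagged_acc = np.mean(bagged_pred == y_test)
print(f"bagged accuracy: {bagged_acc:.3f}")
```

Each loop iteration is independent of the others, which is why step 2 can run in parallel.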

Variance Reduction: The variance of the bagged forecast depends on the number of estimators ( $N$ ), the average variance of a single estimator's forecast ( $\bar{\sigma}^2$ ), and the average pairwise correlation between those forecasts ( $\bar{\rho}$ ):
$\mathrm{V}\left(\frac{1}{N} \sum_{i=1}^{N} \varphi_{i}[c]\right) = \bar{\sigma}^{2}\left(\bar{\rho}+\frac{1-\bar{\rho}}{N}\right)$
Key Insight: Bagging reduces variance only to the extent that the estimators are not perfectly correlated ( $\bar{\rho} < 1$ ). As $N$ increases, the variance falls toward the floor $\bar{\sigma}^{2}\bar{\rho}$ ; if $\bar{\rho} = 1$ , it stays at $\bar{\sigma}^{2}$ for any $N$ .
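A small sketch of the variance formula, with a Monte Carlo check. The function name `bagged_variance` and the parameter values are illustrative assumptions; the check draws equicorrelated normal forecasts and compares the variance of their average with the closed form.

```python
import numpy as np

def bagged_variance(avg_var: float, avg_corr: float, n: int) -> float:
    """Variance of the average of n forecasts with the given
    average variance and average pairwise correlation."""
    return avg_var * (avg_corr + (1.0 - avg_corr) / n)

# Monte Carlo check with equal variances and equal pairwise correlations.
rng = np.random.default_rng(0)
n, avg_var, avg_corr = 10, 4.0, 0.3
cov = avg_var * (avg_corr * np.ones((n, n)) + (1.0 - avg_corr) * np.eye(n))
draws = rng.multivariate_normal(np.zeros(n), cov, size=200_000)
empirical = draws.mean(axis=1).var()

print(f"closed form: {bagged_variance(avg_var, avg_corr, n):.4f}")
print(f"simulated  : {empirical:.4f}")
```

With these inputs the closed form gives 4.0 × (0.3 + 0.7 / 10) = 1.48, while the floor for very large `n` is 4.0 × 0.3 = 1.2.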
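Improved Accuracy: Bagging can improve accuracy if the individual classifiers are better than random chance ( $p > 1/k$ , where $p$ is the accuracy of each classifier and $k$ is the number of classes). Let $X$ be the number of the $N$ independent classifiers that vote for the correct class. A correct majority vote requires $X > N/k$ (a necessary condition, which is also sufficient when $k = 2$ ), and this occurs with probability:
$\mathrm{P}\left(X>\frac{N}{k}\right) = 1-\sum_{i=0}^{\lfloor N / k\rfloor}\left(\begin{array}{c} N \\ i \end{array}\right) p^{i}(1-p)^{N-i}$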
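The probability above is a binomial tail and can be evaluated directly. This is a minimal sketch using only the standard library; the function name `prob_votes_exceed` is hypothetical and is not the package's `bagging_classifier_accuracy`. It assumes the $N$ classifiers are independent, each correct with probability $p$.

```python
from math import comb

def prob_votes_exceed(n: int, p: float, k: int = 2) -> float:
    """P(X > n/k) for X ~ Binomial(n, p), where X counts correct votes."""
    cutoff = n // k  # floor(n / k)
    cdf = sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(cutoff + 1))
    return 1.0 - cdf

for n in (1, 11, 101):
    print(n, round(prob_votes_exceed(n, p=0.6), 4))
```

For $k = 2$ the event $X > N/2$ is exactly a majority of correct votes. With `p=0.6` the probability rises with `n`, while with `p` below 0.5 it falls, which matches the condition $p > 1/k$.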

Figure: Standard deviation of the bagged prediction

Random Forest (RF)

Random Forest is a specific type of bagging that uses decision trees as the weak learners. It is designed to combat the high variance (overfitting) tendency of individual trees.

Key Difference from Bagging: RF introduces a second level of randomness. At each node split, it only evaluates a random subsample of the features.
Purpose: This further decorrelates the individual trees (lowers $\bar{\rho}$ ), leading to a more significant reduction in variance.
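Problem in Finance: RF still suffers from the observation redundancy problem. If a large share of the samples is redundant (as happens when financial labels overlap in time), bootstrapped datasets look alike and RF builds many essentially identical trees ( $\bar{\rho} \approx 1$ ), each of which is overfit.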
Solutions:
1. Use BaggingClassifier on a DecisionTreeClassifier and set max_samples to the average uniqueness of the labels.
2. Modify the RF algorithm to use the Sequential Bootstrap (from Ch. 4) instead of standard bootstrapping.
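The first solution can be sketched with scikit-learn as follows. The data are synthetic and the value of `avg_uniqueness` is a hypothetical placeholder; in a real application it would be computed from the label overlap of the dataset. The base estimator is passed positionally because the keyword name differs across scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Hypothetical value: average uniqueness of the labels, as a fraction in (0, 1].
avg_uniqueness = 0.3

# A tree that, like RF, looks at a random subset of features at each split.
tree = DecisionTreeClassifier(
    criterion="entropy", max_features="sqrt", class_weight="balanced"
)

# Each tree is fit on a bootstrap sample of about avg_uniqueness * len(X) rows.
clf = BaggingClassifier(
    tree, n_estimators=100, max_samples=avg_uniqueness, random_state=0
)
clf.fit(X, y)
print(len(clf.estimators_), len(clf.estimators_samples_[0]))
```

Drawing fewer observations per tree makes it less likely that two bootstrap samples share the same redundant rows, which is the mechanism that lowers $\bar{\rho}$.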

Boosting

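Boosting is an ensemble method that builds a strong learner from a sequence of weak ones. It can reduce both bias (underfitting) and variance, but its main effect is on bias, and correcting bias comes with a greater risk of overfitting.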

Process:
1. Estimators are fit sequentially.
2. At each step, the algorithm increases the sample weights of misclassified observations.
3. This forces subsequent estimators to focus on the "hard" examples that were previously wrong.
4. The final forecast is a weighted average of the estimators, giving more weight to those with higher accuracy.
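The four steps above can be sketched with discrete AdaBoost, one common boosting algorithm (the text does not name a specific one, so this choice is an assumption). The helper names `fit_adaboost` and `predict_adaboost` are hypothetical, labels are assumed to be -1/+1, and the weak learners are depth-1 scikit-learn trees.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_adaboost(X, y, n_rounds=50):
    """Discrete AdaBoost with decision stumps; y must contain -1 and +1."""
    w = np.full(len(y), 1.0 / len(y))  # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):          # step 1: fit estimators sequentially
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w[pred != y]), 1e-10, 1.0 - 1e-10)
        alpha = 0.5 * np.log((1.0 - err) / err)  # larger for more accurate stumps
        # Steps 2-3: raise the weights of misclassified observations.
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def predict_adaboost(stumps, alphas, X):
    # Step 4: accuracy-weighted vote of all estimators.
    votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(votes >= 0, 1, -1)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)  # diagonal boundary: hard for one stump

stumps, alphas = fit_adaboost(X, y)
stump_acc = np.mean(stumps[0].predict(X) == y)
boosted_acc = np.mean(predict_adaboost(stumps, alphas, X) == y)
print(f"single stump: {stump_acc:.3f}, boosted: {boosted_acc:.3f}")
```

The in-sample gain over a single stump illustrates the bias reduction; it says nothing about out-of-sample performance, which is where the overfitting risk shows up.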

Bagging vs. Boosting in Finance

Bagging: Addresses overfitting (variance). It is parallelizable.
Boosting: Addresses underfitting (bias). It is sequential.

Conclusion: In finance, the signal-to-noise ratio is very low, making overfitting the primary concern. Therefore, bagging is generally preferable to boosting for financial applications.

Bagging for Scalability

Bagging can also be used as a practical tool to apply non-scalable algorithms (like SVMs) to very large datasets.

Method: Use BaggingClassifier with a base estimator (e.g., SVM) but force an early stopping condition (like setting a low max_iter).
Result: This transforms one large, slow, sequential task into many small, fast, parallel tasks. The bagging process compensates for the high variance introduced by the early stopping.
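A minimal sketch of this idea with scikit-learn, on synthetic data. The specific values of `max_iter`, `max_samples`, and `n_estimators` are illustrative assumptions rather than recommendations, and scikit-learn may emit a `ConvergenceWarning` because the solver is stopped early on purpose.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5, random_state=0
)
X_train, y_train = X[:1500], y[:1500]
X_test, y_test = X[1500:], y[1500:]

# Early stopping: cap the number of solver iterations for each SVM.
base = SVC(kernel="rbf", max_iter=200)

# Many small fits instead of one large one: each SVM sees 10% of the rows.
# n_jobs=1 keeps the sketch portable; raise it to fit the SVMs in parallel.
clf = BaggingClassifier(
    base, n_estimators=50, max_samples=0.1, n_jobs=1, random_state=0
)
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print(f"out-of-sample accuracy: {acc:.3f}")
```

Because kernel SVM training time grows faster than linearly with the number of rows, fitting 50 models on 10% of the data each can be cheaper than one fit on all of it, even before parallelism.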

Figure: Bagging classifier accuracy as a function of p and N

Weighted Voting in Bagging Ensembles

While standard bagging (like Random Forest) uses a simple majority vote (uniform weighting), we can also explore weighted voting schemes. The goal is to give more influence to estimators that are more accurate or less correlated. In our RiskLabAI.ensemble.empirical_bagging_accuracy module, we provide a class to analyze these schemes, as described in Chapter 6, Section 6.5.

Methodology

We evaluate three weighting schemes based on the in-sample accuracy ( $c_i$ ) of each individual estimator:

Uniform: $w_i = 1/N$ . This is the standard bagging vote.
Accuracy-weighted: $w_i \propto c_i$ . Gives more weight to trees that had higher in-sample accuracy.
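Variance-weighted: $w_i \propto (1 - c_i^2)$ . The weight decreases as $c_i$ rises, so trees whose in-sample accuracy is close to 1 (a sign of overfitting) get less influence. The motivation is that a correct/incorrect indicator with accuracy $c_i$ has zero variance at $c_i=1$ and maximum variance at $c_i=0.5$ .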
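The three schemes reduce to a few lines of NumPy. This is a sketch with a hypothetical helper `voting_weights`, not the package's `calculate_weights`; the fallback to uniform weights when all raw weights are zero is an added assumption.

```python
import numpy as np

def voting_weights(c, scheme: str = "uniform") -> np.ndarray:
    """Normalized voting weights from in-sample accuracies c_i."""
    c = np.asarray(c, dtype=float)
    if scheme == "uniform":
        raw = np.ones_like(c)
    elif scheme == "accuracy":
        raw = c.copy()
    elif scheme == "variance":
        raw = 1.0 - c**2
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = raw.sum()
    if total <= 0:  # assumption: degenerate case falls back to uniform
        return np.full(len(c), 1.0 / len(c))
    return raw / total

c = np.array([0.55, 0.70, 0.90])  # illustrative accuracies for three trees
for scheme in ("uniform", "accuracy", "variance"):
    print(scheme, np.round(voting_weights(c, scheme), 3))
```

For these accuracies the accuracy-weighted scheme ranks the third tree highest, while the variance-weighted scheme ranks it lowest.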

Implementation

The BaggingClassifierAccuracy class implements this analysis. It fits a BaggingClassifier and provides methods to calculate_c_i (in-sample accuracy for each tree) and calculate_weights for all three schemes. The predict method can then generate predictions using a specified weighting scheme, and evaluate_all_schemes compares their out-of-sample performance.
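The workflow described above can be approximated directly with scikit-learn. This is a sketch under stated assumptions, not the `BaggingClassifierAccuracy` source: the helper `member_predictions` is hypothetical, the data are synthetic, labels are already 0/1, and only the accuracy-weighted scheme is shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, y_train = X[:400], y[:400]
X_test, y_test = X[400:], y[400:]

clf = BaggingClassifier(
    DecisionTreeClassifier(max_depth=3),
    n_estimators=51, max_samples=100, random_state=0,
).fit(X_train, y_train)

def member_predictions(clf, X):
    """Predictions of each fitted estimator, shape (n_estimators, n_samples)."""
    return np.stack([
        est.predict(X[:, feats])  # each estimator sees its own feature columns
        for est, feats in zip(clf.estimators_, clf.estimators_features_)
    ])

# c_i: in-sample accuracy of each individual tree.
c = (member_predictions(clf, X_train) == y_train).mean(axis=1)
weights = c / c.sum()  # accuracy-weighted scheme

# Weighted vote: total weight of the trees that predict class 1.
test_preds = member_predictions(clf, X_test)
weight_for_1 = weights @ (test_preds == 1).astype(float)
weighted_pred = (weight_for_1 > 0.5).astype(int)
weighted_acc = np.mean(weighted_pred == y_test)
print(f"accuracy-weighted vote, out-of-sample accuracy: {weighted_acc:.3f}")
```

Swapping `weights` for uniform or variance-based weights and recomputing `weighted_acc` gives the kind of scheme comparison the class is described as automating.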
The module also includes calculate_bootstrap_accuracy, which estimates the mean and standard deviation of a model's accuracy by bootstrapping the test set. This helps assess the stability of the performance metric itself.
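The bootstrap idea can be sketched on precomputed predictions. The helper `bootstrap_accuracy` below is hypothetical and takes true and predicted labels rather than a fitted classifier, so it differs from the package's `calculate_bootstrap_accuracy` signature; the toy labels are constructed so that exactly 80 of 100 predictions are correct.

```python
import numpy as np

def bootstrap_accuracy(y_true, y_pred, n_bootstraps=1000, seed=0):
    """Resample the test set with replacement and recompute accuracy each time.

    Returns the array of accuracies, their mean, and their standard deviation.
    """
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    rng = np.random.default_rng(seed)
    # One row of resampled test-set indices per bootstrap replicate.
    idx = rng.integers(0, len(correct), size=(n_bootstraps, len(correct)))
    accs = correct[idx].mean(axis=1)
    return accs, float(accs.mean()), float(accs.std())

# Toy test set: 100 observations, 80 predicted correctly.
y_true = np.ones(100, dtype=int)
y_pred = np.r_[np.ones(80, dtype=int), np.zeros(20, dtype=int)]

accs, mean_acc, std_acc = bootstrap_accuracy(y_true, y_pred)
print(f"accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```

The standard deviation should be close to the binomial value $\sqrt{0.8 \times 0.2 / 100} = 0.04$, which shows how much an accuracy figure from a 100-observation test set can move by chance alone.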

API reference

RiskLabAI implements these in Python and Julia (signatures auto-generated from the package source):

Python	Julia
`def bagging_classifier_accuracy(N: int, p: float) -> float:`	`function bagging_classifier_accuracy(N::Integer, p::Real)`
`class BaggingClassifierAccuracy: """ Evaluates a bagging classifier's accuracy using different weighting schemes based on decision tree c_i scores. Methods: - fit: Fits the bagging classifier. - calculate_c_i: Calculates the c_i score for each tree. - calculate_weights: Computes weights (uniform, c_i, 1-c_i^2). - predict: Predicts class labels using specified weights. - evaluate_all_schemes: Gets accuracy for all weighting schemes. """ def __init__( self, n_estimators: int = 1000, max_samples: int = 100, max_features: float = 1.0, random_state: Optional[int] = None, ):`	`function fit_bagging( x::AbstractMatrix{<:Real}, y::AbstractVector; n_estimators::Integer = 1000, max_samples::Integer = 100, max_features::Integer = 1, random_state = nothing, )`
`def calculate_bootstrap_accuracy( clf: BaggingClassifier, X: pd.DataFrame, y: pd.Series, n_bootstraps: int = 1000 ) -> tuple[np.ndarray, float, float]:`	`function calculate_bootstrap_accuracy( trees, classes, x::AbstractMatrix{<:Real}, y::AbstractVector; weights::AbstractVector{<:Real} = fill(1.0 / length(trees), length(trees)), n_bootstraps::Integer = 1000, random_state = nothing, )`
`def backtest_predictions( self, estimator: Union[Estimator, dict[str, Estimator]], data: Union[pd.DataFrame, dict[str, pd.DataFrame]], labels: Union[pd.Series, dict[str, pd.Series]], sample_weights: Optional[Union[np.ndarray, dict[str, np.ndarray]]] = None, predict_probability: bool = False, n_jobs: int = 1, ) -> Union[dict[str, np.ndarray], dict[str, dict[str, np.ndarray]]]:`	`function cross_val_score( cv, x::AbstractMatrix{<:Real}, y::AbstractVector; n_trees::Integer = 100, n_subfeatures::Integer = -1, max_depth::Integer = -1, scoring::Symbol = :accuracy, random_state::Integer = 0, )`

Full source: Python · Julia