Backtest Statistics Categories
Backtest statistics are essential for evaluating the efficacy of investment strategies. These metrics fall into different categories:
- General Features: Includes metrics like Time range, Average AUM, Capacity, and Leverage.
- Performance Metrics: Such as PnL, annualized rate of return, hit ratio, etc.
import pandas as pd

def bet_timing(target_positions: pd.Series) -> pd.Index:
    # A bet ends when the position is flattened or flipped.
    zero_positions = target_positions[target_positions == 0].index
    lagged_non_zero_positions = target_positions.shift(1)
    lagged_non_zero_positions = lagged_non_zero_positions[lagged_non_zero_positions != 0].index
    bets = zero_positions.intersection(lagged_non_zero_positions)  # flattening
    # A negative product of consecutive positions marks a change of sign (flip).
    sign_products = target_positions.iloc[1:] * target_positions.iloc[:-1].values
    bets = bets.union(sign_products[sign_products < 0].index).sort_values()  # flips
    # The last observation always closes the final bet.
    if target_positions.index[-1] not in bets:
        bets = bets.append(target_positions.index[-1:])
    return bets
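The Time-Weighted Rate of Return (TWRR) is a method for calculating returns that adjusts for external cash flows. The full formula is involved, but it can be summarized as

$$r_{i,t} = \frac{\pi_{i,t}}{K_{i,t}}$$

where:
- $r_{i,t}$: TWRR for portfolio $i$ between times $t-1$ and $t$
- $\pi_{i,t}$: Mark-to-market profit or loss for portfolio $i$ at time $t$
- $K_{i,t}$: Market value of assets managed by portfolio $i$ over sub-period $t$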
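As a rough illustration of the ratio of mark-to-market PnL to managed market value, the sketch below computes sub-period returns and chains them into an annualized figure. The function name `annualized_twrr`, its arguments, the annualization by `1 / years`, and the sample numbers are illustrative assumptions, not part of the RiskLabAI API.

```python
import pandas as pd


def annualized_twrr(mtm_pnl: pd.Series, market_value: pd.Series, years: float) -> float:
    """Chain sub-period returns (PnL / market value) and annualize the result."""
    sub_period_returns = mtm_pnl / market_value     # one return per sub-period
    growth = (1.0 + sub_period_returns).prod()      # compounded growth factor
    return growth ** (1.0 / years) - 1.0


# Hypothetical data: four sub-periods spanning one year, no external cash flows,
# so each market value is the previous one plus the previous PnL.
mtm_pnl = pd.Series([10.0, -5.0, 20.0, 5.0])
market_value = pd.Series([1000.0, 1010.0, 1005.0, 1025.0])

twrr = annualized_twrr(mtm_pnl, market_value, years=1.0)
print(f"Annualized TWRR: {twrr:.4f}")
```

Because the returns are compounded per sub-period, a deposit or withdrawal only changes the market value used in later sub-periods and does not distort the returns already earned.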
import numpy as np
import pandas as pd

def calculate_holding_period(target_positions: pd.Series) -> tuple:
    # Average holding period (in days), pairing exits with a position-weighted average entry time.
    hold_period, time_entry = pd.DataFrame(columns=['dT', 'w']), 0.0
    position_difference = target_positions.diff()
    time_difference = (target_positions.index - target_positions.index[0]) / np.timedelta64(1, 'D')
    for i in range(1, target_positions.shape[0]):
        if position_difference.iloc[i] * target_positions.iloc[i - 1] >= 0:  # increased or unchanged
            if target_positions.iloc[i] != 0:
                # Update the position-weighted average entry time.
                time_entry = (time_entry * target_positions.iloc[i - 1] + time_difference[i] * position_difference.iloc[i]) / target_positions.iloc[i]
        else:  # decreased
            if target_positions.iloc[i] * target_positions.iloc[i - 1] < 0:  # flip
                hold_period.loc[target_positions.index[i], ['dT', 'w']] = (time_difference[i] - time_entry, abs(target_positions.iloc[i - 1]))
                time_entry = time_difference[i]  # reset entry time
            else:
                hold_period.loc[target_positions.index[i], ['dT', 'w']] = (time_difference[i] - time_entry, abs(position_difference.iloc[i]))
    # Weight each holding time by the size of the position that was closed.
    if hold_period['w'].sum() > 0:
        mean_holding_period = (hold_period['dT'] * hold_period['w']).sum() / hold_period['w'].sum()
    else:
        mean_holding_period = np.nan
    return hold_period, mean_holding_period
Performance statistics that are not risk-adjusted include:
- PnL: Total dollars earned
- PnL from Long Positions: Earnings from long holdings only
- Annualized Rate of Return: Includes all forms of earnings and expenses
- Hit Ratio: Percentage of profitable bets

Strategy returns often come in "runs": uninterrupted sequences of returns of the same sign, either positive or negative. Understanding how concentrated these runs are, and how they affect risk measures such as drawdown and time under water, is essential for assessing a strategy's viability.
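Consider a time series of bet returns, $\{r_t\}_{t=1,\dots,T}$, with length $T$. We can split these returns into positive and negative subsets, $r^+$ and $r^-$. Two weight series, $w^+$ and $w^-$, can be defined as:

$$w^+ = \left\{ \frac{r_t^+}{\sum_t r_t^+} \right\}, \qquad w^- = \left\{ \frac{r_t^-}{\sum_t r_t^-} \right\}$$

We define the Herfindahl-Hirschman Index (HHI)-based concentration of positive returns ($h^+$) and negative returns ($h^-$) as:

$$h^+ = \frac{\sum_t (w_t^+)^2 - \|w^+\|^{-1}}{1 - \|w^+\|^{-1}}, \qquad h^- = \frac{\sum_t (w_t^-)^2 - \|w^-\|^{-1}}{1 - \|w^-\|^{-1}}$$

where $\|\cdot\|$ denotes the number of elements in a series.

Desirable strategy characteristics include:
- High Sharpe ratio
- Many bets per year
- High hit ratio (low $\|r^-\|$)
- Low $h^+$
- Low $h^-$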
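To make the HHI normalization concrete, here is a small worked example with made-up numbers (both return sets are illustrative assumptions). With $n = 4$ positive returns, weights $w_t = r_t / \sum_t r_t$ and $h = \left(\sum_t w_t^2 - 1/n\right) / \left(1 - 1/n\right)$, evenly spread returns give zero concentration, while one dominant return pushes the index toward one:

```latex
\begin{aligned}
r^+ = \{1, 1, 1, 1\}: &\quad w^+ = \{0.25, 0.25, 0.25, 0.25\}, \quad \sum_t (w_t^+)^2 = 0.25, \quad h^+ = \frac{0.25 - 0.25}{1 - 0.25} = 0 \\
r^+ = \{7, 1, 1, 1\}: &\quad w^+ = \{0.7, 0.1, 0.1, 0.1\}, \quad \sum_t (w_t^+)^2 = 0.52, \quad h^+ = \frac{0.52 - 0.25}{1 - 0.25} = 0.36
\end{aligned}
```

In the second case a single bet contributes 70% of the gains, which is the kind of right-tail dependence a low $h^+$ is meant to rule out.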
HHI Concentration Functions
import pandas as pd

def calculate_hhi_concentration(returns: pd.Series) -> tuple:
    """
    Calculate the HHI concentration measures.

    :param returns: Series of returns indexed by a DatetimeIndex.
    :return: Tuple containing positive returns HHI, negative returns HHI, and time-concentrated HHI.
    """
    # Relies on calculate_hhi, which is defined further down in this article.
    returns_hhi_positive = calculate_hhi(returns[returns >= 0])
    returns_hhi_negative = calculate_hhi(returns[returns < 0])
    # Concentration in time: HHI of the number of bets per calendar month.
    # pandas >= 2.2 prefers the alias 'ME' for month-end frequency.
    time_concentrated_hhi = calculate_hhi(returns.groupby(pd.Grouper(freq='M')).count())
    return returns_hhi_positive, returns_hhi_negative, time_concentrated_hhi
These functionalities are available in both Python and Julia in the RiskLabAI library.
Drawdown and Time Under Water
Drawdown (DD) is the maximum loss suffered between two consecutive high-watermarks (HWMs), while Time under Water (TuW) is the time elapsed between an HWM and the moment the PnL exceeds it again.
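Before looking at a full implementation, the drawdown half of this definition can be sketched in a few lines: track the running high-watermark and measure how far each observation sits below it. The function name `running_drawdown` and the sample equity curve are illustrative assumptions.

```python
import pandas as pd


def running_drawdown(pnl: pd.Series, dollars: bool = False) -> pd.Series:
    """Drawdown of each observation relative to the running high-watermark."""
    high_watermark = pnl.cummax()
    if dollars:
        return high_watermark - pnl          # loss in currency units
    return 1.0 - pnl / high_watermark        # loss as a fraction of the HWM


# Hypothetical equity curve sampled daily.
pnl = pd.Series(
    [100.0, 110.0, 99.0, 104.5, 121.0, 108.9],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

drawdown = running_drawdown(pnl)
max_drawdown = drawdown.max()
print(drawdown.round(4))
print(f"Maximum drawdown: {max_drawdown:.2%}")
```

This per-observation view is handy for plotting; grouping observations by high-watermark gives one drawdown and one time under water per HWM instead.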
DD and TuW Functions
import pandas as pd

def compute_drawdowns_time_under_water(series: pd.Series, dollars: bool = False) -> tuple:
    # Pair every observation with the running high-watermark (HWM).
    # reset_index(names=...) requires pandas >= 1.5.
    series_df = series.to_frame('PnL').reset_index(names='Datetime')
    series_df['HWM'] = series.expanding().max().values

    def process_groups(group):
        # A single-row group is an HWM that was never followed by a drawdown.
        if len(group) <= 1:
            return None
        return {
            'Start': group['Datetime'].iloc[0],
            'Stop': group['Datetime'].iloc[-1],
            'HWM': group['HWM'].iloc[0],
            'Min': group['PnL'].min(),
            'Min. Time': group['Datetime'][group['PnL'] == group['PnL'].min()].iloc[0],
        }

    # DataFrame.append was removed in pandas 2.0, so collect the rows first.
    rows = [process_groups(group) for _, group in series_df.groupby('HWM')]
    drawdown_analysis = pd.DataFrame(
        [row for row in rows if row is not None],
        columns=['Start', 'Stop', 'HWM', 'Min', 'Min. Time'],
    )

    if dollars:
        drawdown = drawdown_analysis['HWM'] - drawdown_analysis['Min']
    else:
        drawdown = 1 - drawdown_analysis['Min'] / drawdown_analysis['HWM']
    drawdown.index = drawdown_analysis['Start']
    drawdown.index.name = 'Datetime'

    # Time under water in years. A year is approximated as 365.25 days because
    # recent pandas versions reject the ambiguous np.timedelta64(1, 'Y') unit.
    time_under_water = ((drawdown_analysis['Stop'] - drawdown_analysis['Start']) / pd.Timedelta(days=365.25)).values
    time_under_water = pd.Series(time_under_water, index=drawdown_analysis['Start'])
    return drawdown, time_under_water, drawdown_analysis
These functionalities are available in both Python and Julia in the RiskLabAI library.
Key metrics for runs statistics:
- HHI for both positive and negative returns
- HHI of the time between bets
- 95th percentile of Drawdown (DD) and Time under Water (TuW)

These metrics help you understand how concentrated portfolio returns are and how much risk is involved.
import numpy as np
import pandas as pd

def calculate_hhi(bet_returns: pd.Series) -> float:
    """
    Calculate the Herfindahl-Hirschman Index (HHI) concentration measure.

    :param bet_returns: Series of bet returns.
    :return: Calculated HHI value.
    """
    # Too few observations to measure concentration meaningfully.
    if bet_returns.shape[0] <= 2:
        return np.nan
    weight = bet_returns / bet_returns.sum()
    hhi_ = (weight ** 2).sum()
    # Normalize so that equal weights give 0 and a single dominant bet approaches 1.
    hhi_ = (hhi_ - bet_returns.shape[0] ** -1) / (1.0 - bet_returns.shape[0] ** -1)
    return hhi_
Implementation Failure Metrics
Key Metrics to prevent investment plans from failing:
- Broker fees per turnover
- Average slippage per turnover
- Dollar performance per turnover
- Return on execution costs
These metrics help you understand how your portfolio could be affected by hidden costs.
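The four metrics above can be sketched as simple ratios. This is only an illustration under stated assumptions: "dollar performance" is taken to be net of broker fees and slippage, a "turnover" is one full rotation of the portfolio, and the function name, argument names and figures are hypothetical.

```python
def implementation_shortfall_metrics(
    dollar_performance: float,  # assumed net of fees and slippage
    broker_fees: float,         # total fees paid to brokers and exchanges
    slippage: float,            # total execution cost excluding fees
    turnovers: float,           # number of full portfolio turnovers
) -> dict:
    """Per-turnover cost ratios and the return earned per dollar of execution cost."""
    execution_costs = broker_fees + slippage
    return {
        "broker_fees_per_turnover": broker_fees / turnovers,
        "average_slippage_per_turnover": slippage / turnovers,
        "dollar_performance_per_turnover": dollar_performance / turnovers,
        "return_on_execution_costs": dollar_performance / execution_costs,
    }


# Hypothetical backtest totals.
metrics = implementation_shortfall_metrics(
    dollar_performance=120_000.0,
    broker_fees=8_000.0,
    slippage=12_000.0,
    turnovers=40.0,
)
for name, value in metrics.items():
    print(f"{name}: {value:,.2f}")
```

Read this way, dollar performance per turnover indicates how much more expensive execution could become before the strategy breaks even, and a return on execution costs well above one leaves room for worse-than-expected fills.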
Efficiency Metrics
Sharpe Ratio (SR)
This ratio measures performance by dividing the average excess return (over the risk-free rate) by the standard deviation of those returns.
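A minimal sketch of the calculation, assuming evenly spaced returns and a constant per-period risk-free rate. The function name `sharpe_ratio`, the sample returns, and the square-root-of-time annualization (which presumes IID returns) are assumptions made for illustration.

```python
import numpy as np


def sharpe_ratio(returns, risk_free_rate: float = 0.0, periods_per_year: int = 1) -> float:
    """Mean excess return divided by its sample standard deviation."""
    excess = np.asarray(returns, dtype=float) - risk_free_rate
    # sqrt(periods_per_year) annualizes the ratio under an IID assumption.
    return float(np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1))


# Hypothetical daily returns.
daily_returns = [0.010, -0.005, 0.007, 0.002, -0.003, 0.004]

sr_per_period = sharpe_ratio(daily_returns)
sr_annualized = sharpe_ratio(daily_returns, periods_per_year=252)
print(f"Per-period SR: {sr_per_period:.3f}, annualized SR: {sr_annualized:.3f}")
```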
Probabilistic Sharpe Ratio (PSR)
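This metric corrects the Sharpe ratio for the inflation caused by short track records and non-normal returns (skewness and kurtosis), and expresses the result as the probability that the true Sharpe ratio exceeds a chosen benchmark.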
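The sketch below shows one way to compute PSR from a return series, using the closed form $\Phi\left[(\widehat{SR} - SR^*)\sqrt{T-1} \,/\, \sqrt{1 - \gamma_3 \widehat{SR} + \tfrac{\gamma_4 - 1}{4}\widehat{SR}^2}\right]$, where $\gamma_3$ is skewness and $\gamma_4$ is raw kurtosis. The function name, the moment estimators, and the sample data are assumptions; the Sharpe ratios are per-period, not annualized.

```python
from statistics import NormalDist

import numpy as np


def probabilistic_sharpe_ratio(returns, benchmark_sr: float = 0.0) -> float:
    """Probability that the true per-period Sharpe ratio exceeds benchmark_sr."""
    r = np.asarray(returns, dtype=float)
    n_obs = r.size
    sr_hat = r.mean() / r.std(ddof=1)            # observed, non-annualized SR
    z = (r - r.mean()) / r.std(ddof=0)           # standardized returns
    skewness = (z ** 3).mean()
    kurtosis = (z ** 4).mean()                   # raw kurtosis (3 for a normal)
    # Correction for the skewness and kurtosis of the return distribution.
    denominator = np.sqrt(1.0 - skewness * sr_hat + (kurtosis - 1.0) / 4.0 * sr_hat ** 2)
    statistic = (sr_hat - benchmark_sr) * np.sqrt(n_obs - 1) / denominator
    return NormalDist().cdf(float(statistic))


# Hypothetical monthly returns.
sample_returns = [0.012, -0.004, 0.007, 0.003, -0.009, 0.015,
                  0.001, -0.002, 0.006, 0.004, -0.006, 0.009]

psr_vs_zero = probabilistic_sharpe_ratio(sample_returns, benchmark_sr=0.0)
print(f"PSR against a zero benchmark: {psr_vs_zero:.3f}")
```

A short or fat-tailed track record lowers the statistic, so the same observed Sharpe ratio earns a lower PSR when it rests on fewer or less well-behaved observations.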
Deflated Sharpe Ratio (DSR)
This is an extension of PSR, which accounts for the number of trials performed to obtain the Sharpe ratio.
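In practice, DSR is PSR evaluated against a benchmark Sharpe ratio that rises with the number of trials. A common approximation for that benchmark is the expected maximum of $N$ Sharpe ratios under the null of no skill, $SR^* = \sqrt{V[\widehat{SR}_n]}\left((1-\gamma)\,\Phi^{-1}\left[1 - \tfrac{1}{N}\right] + \gamma\,\Phi^{-1}\left[1 - \tfrac{1}{Ne}\right]\right)$, with $\gamma$ the Euler-Mascheroni constant. The sketch below computes only this benchmark; the function name and inputs are illustrative assumptions.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_MASCHERONI = 0.5772156649015329


def deflated_benchmark_sharpe(trial_sharpe_variance: float, n_trials: int) -> float:
    """Expected maximum Sharpe ratio across n_trials unskilled strategies."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    normal = NormalDist()
    expected_max = (
        (1.0 - EULER_MASCHERONI) * normal.inv_cdf(1.0 - 1.0 / n_trials)
        + EULER_MASCHERONI * normal.inv_cdf(1.0 - 1.0 / (n_trials * e))
    )
    # Scale by the dispersion of Sharpe ratios observed across the trials.
    return sqrt(trial_sharpe_variance) * expected_max


# Hypothetical research log: variance of per-period SRs across trials is 0.01.
benchmark_10 = deflated_benchmark_sharpe(0.01, n_trials=10)
benchmark_100 = deflated_benchmark_sharpe(0.01, n_trials=100)
print(f"SR* after 10 trials: {benchmark_10:.3f}, after 100 trials: {benchmark_100:.3f}")
```

Feeding this benchmark into a PSR calculation gives the deflated figure: the more configurations were tried, the higher the bar the selected strategy has to clear.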
Other Efficiency Metrics
- Annualized Sharpe Ratio
- Information Ratio
- Probabilistic Sharpe Ratio (PSR)
- Deflated Sharpe Ratio (DSR)
Classification Scores
Metrics for evaluating the performance of machine learning algorithms in trading strategies include:
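- Accuracy: $\frac{TP + TN}{TP + TN + FP + FN}$
- Precision: $\frac{TP}{TP + FP}$
- Recall: $\frac{TP}{TP + FN}$
- F1 Score: $2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$

Here $TP$, $TN$, $FP$ and $FN$ are the counts of true positives, true negatives, false positives and false negatives.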
These metrics help you gauge how accurately your machine learning model is performing in real trading scenarios.
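As a quick illustration, the four scores can be computed directly from confusion-matrix counts. The function name and the counts below are made up for the example, and the sketch assumes every denominator is non-zero.

```python
def classification_scores(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)      # share of predicted positives that were right
    recall = tp / (tp + fn)         # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical outcome of 100 predicted bets.
scores = classification_scores(tp=40, fp=10, tn=35, fn=15)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

In a trading context a false positive is a bet that should not have been taken, so precision and recall are often more telling than accuracy alone when profitable opportunities are rare.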