Value Investing Analytics

Methodology

The live ranking now uses a hand-set factor ensemble as the published score, with a shallower LightGBM model shown beside it as an experimental comparison. We tested the current factor set on a broad in-house universe from 1990 through 2023, using 36-month forward total return rank as the outcome and normalized cross-sectional factor ranks as the inputs. The goal was simple: keep the live score readable and stable, while still learning from the model research.
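The walk-forward evaluation described above can be sketched as an expanding-window loop: fit on all cross-sections up to a given year, score the next year strictly out of sample, then roll forward. The `fit` and `score` callables below are placeholders, not the production pipeline.

```python
# Hedged sketch of a walk-forward evaluation loop. Each test year is scored
# by a model that saw only earlier years; years and callables are illustrative.

def walk_forward(years, fit, score):
    """Yield (test_year, prediction) pairs with strictly out-of-sample scoring."""
    results = []
    for i in range(1, len(years)):
        train_years = years[:i]          # expanding training window
        test_year = years[i]
        model = fit(train_years)         # e.g. refit weights or a model
        results.append((test_year, score(model, test_year)))
    return results

# Toy usage: the "model" is just the count of training years,
# so each prediction provably uses only data from earlier years.
out = walk_forward([1990, 1991, 1992, 1993],
                   fit=lambda ys: len(ys),
                   score=lambda m, y: m)
```

The expanding window mirrors how a live process accumulates history; a rolling fixed-length window would be a minor variation on the same loop.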

Current Live Score

The published score uses the same five factors as the earlier Custom Research Ensemble. It is still a transparent weighted mix, but momentum now carries less weight than before.

The published Final Score stays on a clean 0-1 scale because it is built from percentile-ranked factor inputs. The Experimental LightGBM Score also stays on a 0-1 scale by percentile-ranking the model's raw predictions across the current universe.
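The two 0-1 scales described above can be sketched in a few lines: percentile-rank each factor across the universe, take a fixed weighted mix for the Final Score, and percentile-rank raw model predictions for the experimental column. The weights, factor values, and predictions below are illustrative, not the live configuration.

```python
# Hedged sketch: fixed-weight mix of percentile-ranked factors (Final Score)
# and percentile-ranked raw model predictions (Experimental Score).

def percentile_rank(values):
    """Percentile rank of each value across the universe, scaled to [0, 1]."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(values)
    ranks = [0.0] * n
    for pos, i in enumerate(order):
        ranks[i] = pos / (n - 1) if n > 1 else 0.5
    return ranks

# Hypothetical factor ranks for a four-stock universe (one list per factor).
factor_ranks = {
    "gross_profitability": [1.0, 0.0, 0.667, 0.333],
    "momentum":            [0.333, 1.0, 0.0, 0.667],
}
weights = {"gross_profitability": 0.6, "momentum": 0.4}  # hand-set, illustrative

final_score = [
    sum(weights[f] * factor_ranks[f][i] for f in weights)
    for i in range(4)
]

# The experimental score percentile-ranks the model's raw predictions,
# so both columns live on the same 0-1 scale.
raw_predictions = [0.12, -0.05, 0.30, 0.07]  # hypothetical model outputs
experimental_score = percentile_rank(raw_predictions)
```

Because both columns are rank-based, they are directly comparable row by row even though one comes from hand-set weights and the other from a model.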

We still show Earnings Yield in the table because it remains a useful diagnostic, but it is no longer part of the live score: its predictive power was weak to negative in this universe and time frame. We removed Book Value / Market Cap from the live display and score for the same reason.

Why keep an experimental model view

We first used research to identify a strong five-factor set. Gross profitability, share shrink, momentum, and revenue growth repeatedly showed up in the better combinations, so they anchored the factor set. After that, we tested whether smarter combination rules could outperform fixed weights while still using exactly the same inputs.

In the walk-forward, out-of-sample tests, LightGBM beat the Custom Research Ensemble, equal weighting, and the simple linear regressions. But the live version now treats that model as experimental rather than authoritative. We made the live LightGBM shallower and more regularized, then surfaced it beside the published score so we can compare it without letting it fully drive the rankings.
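"Shallower and more regularized" maps onto a handful of standard LightGBM parameters. The exact live settings are not published, so the values below are assumptions chosen only to show the direction of the change; the parameter names themselves (`max_depth`, `num_leaves`, `lambda_l1`, and so on) are real LightGBM options.

```python
# Illustrative parameters for a shallower, more regularized LightGBM model.
# Values are assumptions, not the production configuration.

experimental_params = {
    "objective": "regression",
    "max_depth": 3,            # shallow trees: limit interaction depth
    "num_leaves": 7,           # few leaves per tree
    "min_data_in_leaf": 200,   # require broad support for each split
    "learning_rate": 0.02,     # small, conservative boosting steps
    "lambda_l1": 1.0,          # L1 regularization on leaf weights
    "lambda_l2": 5.0,          # L2 regularization on leaf weights
    "feature_fraction": 0.8,   # column subsampling per tree
    "bagging_fraction": 0.8,   # row subsampling per iteration
}

# With the lightgbm package installed, training would look like:
# model = lightgbm.train(experimental_params, lightgbm.Dataset(X, y))
```

Constraining depth and leaf count caps how many factor interactions a single tree can express, which keeps the experimental score's behavior closer to the readable published mix.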

Backtest results

Mean rank IC by method
LightGBM delivered the strongest mean rank IC among the tested combination methods, ahead of the Custom Research Ensemble and the linear alternatives.
Top 20 percent minus bottom 20 percent spread by method
The top 20% versus bottom 20% spread chart shows how much each method separated the strongest names from the weakest names on average.
Top 10 percent minus bottom 10 percent spread by method
The decile spread view makes the sharper separation visible: the best nonlinear methods widened the gap substantially more than the hand-weighted score.
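The two headline metrics in these charts, rank IC and the top-minus-bottom spread, can be sketched for a single cross-section as follows. These are pure-Python stand-ins (no tie handling) for the backtest's actual implementation; `scores` and `fwd_returns` are hypothetical inputs.

```python
# Hedged sketch of rank IC (Spearman-style rank correlation) and the
# top-fraction-minus-bottom-fraction spread for one cross-section.

def ranks(values):
    """Integer rank of each value, 0 = smallest (ties not handled)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order):
        r[i] = pos
    return r

def rank_ic(scores, fwd_returns):
    """Rank correlation between scores and realized forward returns."""
    rs, rr = ranks(scores), ranks(fwd_returns)
    n = len(scores)
    mean = (n - 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rs, rr))
    var = sum((a - mean) ** 2 for a in rs)
    return cov / var

def spread(scores, fwd_returns, frac=0.10):
    """Mean return of the top `frac` of names minus the bottom `frac`."""
    k = max(1, int(len(scores) * frac))
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bottom = [fwd_returns[i] for i in order[:k]]
    top = [fwd_returns[i] for i in order[-k:]]
    return sum(top) / k - sum(bottom) / k
```

Averaging `rank_ic` across all dated cross-sections gives the mean rank IC charted above; `frac=0.20` and `frac=0.10` correspond to the quintile and decile spread views.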
Method correlation heatmap
The method correlation heatmap shows which ensemble methods are producing genuinely different rankings versus mostly rephrasing the same ordering.
Feature weight and importance grid
This grid shows how the hand-set weights compare with the learned coefficients and feature importances from the model-based approaches.
Method summary table
The final method table brings together mean rank IC, spread results, and hit rates across the competing ensemble approaches.