Research paper · Finance & Machine Learning

Forecasting Returns in Thin Markets: A Machine Learning Approach to the Zagreb Stock Exchange

Mislav Šagovac  ·  Luka Šikić  ·  Petra Palić

Zagreb Stock Exchange (ZSE)  ·  sample period 2000–2024

1,100+ predictors  ·  4 ML models  ·  24 years of data  ·  1.58 peak Sharpe ratio

Abstract

Purpose. This paper investigates the out-of-sample predictability of weekly stock returns on the Zagreb Stock Exchange (ZSE), a thin frontier market characterized by low liquidity and concentrated ownership, and asks whether machine learning can extract predictive signal in this environment.

Methodology. We construct over 1,100 predictors from daily OHLCV data (2000–2024), comprising technical indicators, time-series features, and wavelet decompositions. We evaluate four models (Elastic Net, Random Forest, XGBoost, and a shallow neural network) within a nested rolling-window cross-validation framework and assess them with statistical metrics and a realistic portfolio backtest.

Results. Directional accuracy is modest (46–53%), with nonlinear ensembles and neural networks outperforming the linear benchmark. A key finding is a strong monotonic liquidity gradient: portfolio Sharpe ratios rise from 0.17 for the most liquid stocks to 1.58 for the full universe (up to 1.97 for the best individual model).

Conclusion. Machine learning generates economically significant signals in frontier markets, but predictability concentrates in thinly traded stocks where transaction costs and market depth constrain practical implementation.
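
To make the evaluation terms in the abstract concrete, the following is a minimal sketch of a rolling-window split, directional accuracy, and an annualized Sharpe ratio. It is an illustration under stated assumptions, not the paper's code: the function names, the window sizes, the zero risk-free rate, and the 52-periods-per-year annualization are all assumptions made here. Only the outer rolling loop is shown; in a nested scheme, each training window would be split again to tune hyperparameters.

```python
import math
from statistics import mean, stdev


def rolling_windows(n_obs, train_size, test_size):
    """Yield (train, test) index ranges for consecutive rolling windows.

    Hypothetical helper: each window trains on `train_size` observations
    and tests on the next `test_size`, then slides forward by `test_size`.
    """
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


def directional_accuracy(predicted, realized):
    """Share of periods where predicted and realized returns have the same sign."""
    hits = sum(1 for p, r in zip(predicted, realized) if (p > 0) == (r > 0))
    return hits / len(realized)


def annualized_sharpe(weekly_returns, periods_per_year=52):
    """Mean over sample standard deviation, scaled by sqrt(periods per year).

    Assumes a zero risk-free rate and weekly observations.
    """
    return mean(weekly_returns) / stdev(weekly_returns) * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not results from the paper.
    for train, test in rolling_windows(n_obs=10, train_size=4, test_size=2):
        print(list(train), list(test))
    print(directional_accuracy([0.01, -0.02, 0.03, -0.01], [0.02, 0.01, 0.01, -0.03]))
    print(annualized_sharpe([0.01, 0.03, -0.01, 0.02]))
```

Sliding the window forward by the test length keeps every test observation strictly after its training data, which is what makes the evaluation out-of-sample.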


Key findings

Earlier version

Croatian · prior version

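Primjena modela strojnog učenja za predviđanje očekivanih prinosa dionica u RH (Application of Machine Learning Models to Predicting Expected Stock Returns in the Republic of Croatia)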

An earlier, Croatian-language version of this research line on machine learning for ZSE return prediction.

Reproducibility: the main paper's tables are self-contained and render in full. Two figures reference chart images that are not bundled in this repository; to display them, add fig3_cropped.png and fig4_cropped.png to the paper/ folder and re-render. The earlier Croatian paper depends on proprietary data, so its analysis code is shown but not executed.