MACHINE LEARNING FORECASTS OF BILATERAL TRADE FLOWS: OUT-OF-TIME EVIDENCE FROM A GLOBAL DYAD PANEL (1991–2021)
Keywords:
Bilateral trade flows; Trade forecasting; Machine learning; Out-of-time evaluation; Dyad panel data; Gradient boosting; CatBoost; Random forest; Model interpretability; SHAP

Abstract
This study develops an out-of-time forecasting framework for bilateral trade flows using a global dyad–year panel covering 1991–2021. While gravity models dominate empirical trade analysis, their primary objective is structural interpretation and counterfactual evaluation rather than predictive accuracy. We reformulate dyad-level trade prediction as a supervised learning problem and construct a parsimonious information set based only on variables available prior to the forecast year: lagged dyad trade, exporter and importer scale proxies, dyad importance measures, and exporter–importer identifiers. Forecast performance is evaluated under a strict temporal split, with training through 2016, validation in 2017–2019, and a held-out test period in 2020–2021. Comparing a regularized linear benchmark (Ridge) against nonlinear tree-based methods (Random Forest and CatBoost), we find consistent gains from nonlinear learning: CatBoost delivers the best out-of-sample accuracy (lowest RMSE on the transformed target), and Random Forest ranks second. Diagnostic evidence further shows that forecast errors are highly heterogeneous across the trade distribution, with the largest absolute errors concentrated among high-value dyads, and that residual patterns vary over the disruption period. To address the interpretability gap often associated with machine learning in economics, we apply SHAP-based explanations and show that persistence in dyad trade history is the dominant driver of predictions, while exporter and importer scale conditions provide additional predictive content. The results establish an interpretable and scalable forecasting benchmark for dyad-level trade monitoring and highlight where predictive models succeed and fail in periods of structural change.
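The strict temporal split described in the abstract can be illustrated with a minimal sketch. The year boundaries come from the abstract; everything else is an assumption made for illustration: the row layout (a dict with a `year` key), the function names `temporal_split` and `rmse`, and the handling of rows outside the sample window. The target transformation is not specified in the abstract, so `rmse` simply operates on whatever values it is given.

```python
import math
from typing import Dict, List, Sequence

# Year boundaries taken from the abstract; the row layout is hypothetical.
TRAIN_END = 2016                 # training uses all years through 2016
VALID_YEARS = range(2017, 2020)  # validation: 2017-2019
TEST_YEARS = range(2020, 2022)   # held-out test: 2020-2021


def temporal_split(rows: List[dict]) -> Dict[str, List[dict]]:
    """Assign each dyad-year row to train/valid/test by its forecast year."""
    splits: Dict[str, List[dict]] = {"train": [], "valid": [], "test": []}
    for row in rows:
        year = row["year"]
        if year <= TRAIN_END:
            splits["train"].append(row)
        elif year in VALID_YEARS:
            splits["valid"].append(row)
        elif year in TEST_YEARS:
            splits["test"].append(row)
        # Rows after 2021 fall outside the sample and are dropped (assumption).
    return splits


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error on values already on the evaluation scale."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

Because rows are assigned by year alone, no observation from the validation or test years can enter the training set, which is the property an out-of-time evaluation relies on.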














