Improving disaggregation models of malaria incidence by ensembling non-linear models of prevalence

Tim C. D. Lucas, Anita K. Nandi, Suzanne H. Keddie, Elisabeth G. Chestnutt, Rosalind E. Howes, Susan F. Rumisha, Rohan Arambepola, Amelia Bertozzi-Villa, Andre Python, Tasmin L. Symons, Justin J. Millar, Punam Amratia, Penelope Hancock, Katherine E. Battle, Ewan Cameron, Peter W. Gething, Daniel J. Weiss

Abstract

Maps of disease burden are a core tool needed for the control and elimination of malaria. Reliable routine surveillance data of malaria incidence, typically aggregated to administrative units, is becoming more widely available. Disaggregation regression is an important model framework for estimating high resolution risk maps from aggregated data. However, the aggregation of incidence over large, heterogeneous areas means that these data are underpowered for estimating complex, non-linear models. In contrast, prevalence point-surveys are directly linked to local environmental conditions but are not common in many areas of the world. Here, we train multiple non-linear, machine learning models on Plasmodium falciparum prevalence point-surveys. We then ensemble the predictions from these machine learning models with a disaggregation regression model that uses aggregated malaria incidences as response data. We find that using a disaggregation regression model to combine predictions from machine learning models improves model accuracy relative to a baseline model.
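The aggregation step that the abstract describes can be illustrated with a minimal sketch. It shows a generic disaggregation set-up, not the authors' implementation: pixel-level predictions from several machine learning models act as covariates for a pixel-level incidence rate, which is multiplied by population and summed within each administrative unit before being compared with the observed aggregated counts. The log link, the Poisson likelihood, the absence of any spatial random effect, the toy data, and all names (`aggregate_incidence`, `poisson_nll`, `ml_preds`, `polygon_id`) are assumptions made for illustration.

```python
import numpy as np


def aggregate_incidence(ml_preds, population, polygon_id, beta0, beta):
    """Predict polygon-level case counts from pixel-level ML predictions.

    ml_preds:   (n_pixels, n_models) array of pixel-level model predictions
    population: (n_pixels,) population per pixel
    polygon_id: (n_pixels,) integer index of the polygon containing each pixel
    beta0/beta: intercept and per-model weights (hypothetical parameters)
    """
    # Assumed log link: pixel incidence rate is a weighted blend of ML outputs.
    log_rate = beta0 + ml_preds @ beta
    pixel_cases = np.exp(log_rate) * population
    # Sum expected pixel cases within each polygon (administrative unit).
    n_polygons = int(polygon_id.max()) + 1
    return np.bincount(polygon_id, weights=pixel_cases, minlength=n_polygons)


def poisson_nll(observed_cases, expected_cases):
    """Poisson negative log-likelihood, omitting the constant log(y!) term."""
    return float(np.sum(expected_cases - observed_cases * np.log(expected_cases)))


# Toy data (fabricated purely for illustration): 12 pixels, 3 ML models, 3 polygons.
rng = np.random.default_rng(0)
n_pixels, n_models = 12, 3
ml_preds = rng.uniform(0.0, 1.0, size=(n_pixels, n_models))
population = rng.integers(100, 1000, size=n_pixels).astype(float)
polygon_id = np.repeat(np.arange(3), 4)

beta0, beta = -3.0, np.array([0.5, 0.3, 0.2])
expected = aggregate_incidence(ml_preds, population, polygon_id, beta0, beta)
print(expected)
```

In a full model, `beta0` and `beta` would be estimated by minimising `poisson_nll` against the observed polygon counts, so that the aggregated incidence data determine how much weight each prevalence-trained model receives.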

Type

Journal

Publication

Spatial and Spatio-temporal Epidemiology, pp. 100357, https://doi.org/10.1016/j.sste.2020.100357

Date

January, 2020

Links