Abstract:
Vector-borne diseases, such as those transmitted by tsetse flies, pose a significant global public health threat. Reducing vector populations is a promising strategy for disease control, especially in the case of tsetse-transmitted African trypanosomiasis. However, the cost-effective implementation of large-scale vector surveillance and control measures faces challenges due to the lack of spatially explicit and reliable maps identifying
vector hotspots. In this study, we assessed the accuracy of predicting Glossina pallidipes relative densities across Kenya by linking constrained in-situ tsetse catch data from 660 traps across three Kenyan regions with readily
available gridded satellite information (human population, land cover, soil properties, elevation, precipitation, and land surface temperature) using a classical random forest algorithm. To enhance predictive performance, we
employed two feature elimination techniques specifically designed for machine learning algorithms, i.e., Recursive Feature Elimination (RFE) and Variable Selection Using Random Forests (VSURF). For each set of retained variables, we trained a random forest model using a spatial cross-validation technique. Our findings
showed that tsetse fly relative densities decreased with increasing mean annual precipitation and soil moisture and, conversely, increased with higher tree cover. Based on the cross-validated R², 41% of the spatial variability in relative densities of tsetse flies could be explained. For spatial extrapolation, only the set of predictors retained by
VSURF closely matched known tsetse fly distributions in Kenya. The better performance of VSURF may be attributed to its assessment of variables for both their importance and their contribution to reducing prediction error. Our study demonstrates the potential of using a random forest method to upscale tsetse relative
abundance predictions to the national level. However, the reliability of the current extrapolated map remains uncertain. We recommend: 1) increasing tsetse fly sampling efforts, particularly in the data-limited northern and eastern regions of Kenya, and 2) developing a more precise and accurate land cover map with classes that directly associate with known habitat characteristics of the target tsetse species.
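
The spatial cross-validation and cross-validated R² mentioned above can be illustrated with a minimal sketch. It is not the study's implementation: the grid-block fold assignment, the 2-degree block size, the synthetic trap data, and the one-variable linear model standing in for the random forest are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random

def spatial_folds(coords, block_size_deg, n_folds):
    """Assign each (lon, lat) point to a fold so that all points falling in
    the same grid block share a fold (assumed blocking scheme)."""
    block_ids = [(math.floor(lon / block_size_deg), math.floor(lat / block_size_deg))
                 for lon, lat in coords]
    unique_blocks = sorted(set(block_ids))
    block_to_fold = {b: i % n_folds for i, b in enumerate(unique_blocks)}
    return [block_to_fold[b] for b in block_ids]

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def fit_linear(xs, ys):
    """Placeholder model (ordinary least squares on one predictor).
    A random forest would be fitted here instead."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_linear(model, x):
    slope, intercept = model
    return slope * x + intercept

def cross_validated_predictions(xs, ys, folds):
    """Predict each point with a model trained only on the other folds."""
    preds = [None] * len(ys)
    for k in sorted(set(folds)):
        train = [i for i, f in enumerate(folds) if f != k]
        test = [i for i, f in enumerate(folds) if f == k]
        model = fit_linear([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = predict_linear(model, xs[i])
    return preds

# Synthetic data only: 200 hypothetical traps in a rough bounding box,
# with relative density driven by tree cover plus noise.
rng = random.Random(0)
coords = [(rng.uniform(34.0, 42.0), rng.uniform(-5.0, 5.0)) for _ in range(200)]
tree_cover = [rng.uniform(0.0, 1.0) for _ in coords]
density = [2.0 * t + rng.gauss(0.0, 0.1) for t in tree_cover]

folds = spatial_folds(coords, block_size_deg=2.0, n_folds=5)
preds = cross_validated_predictions(tree_cover, density, folds)
cv_r2 = r_squared(density, preds)
print(f"cross-validated R2 on synthetic data: {cv_r2:.2f}")
```

Holding out whole spatial blocks, rather than random individual traps, keeps nearby and therefore correlated observations out of the training set for each fold, which is why the resulting R² is a more conservative estimate of performance under spatial extrapolation.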