Machine learning to attribute the source of Campylobacter infections in the United States: A retrospective analysis of national surveillance data.
Pascoe B., Futcher G., Pensar J., Bayliss SC., Mourkas E., Calland JK., Hitchings MD., Joseph LA., Lane CG., Greenlee T., Arning N., Wilson DJ., Jolley KA., Corander J., Maiden MCJ., Parker CT., Cooper KK., Rose EB., Hiett K., Bruce BB., Sheppard SK.
OBJECTIVES: Integrating pathogen genomic surveillance with bioinformatics can enhance public health responses by identifying risk and guiding interventions. This study focusses on the two predominant Campylobacter species, which are commonly found in the gut of birds and mammals and often infect humans via contaminated food. Rising incidence and antimicrobial resistance (AMR) are a global concern, and there is an urgent need to quantify the main routes to human infection. METHODS: During routine US national surveillance (2009-2019), 8856 Campylobacter genomes from human infections and 16,703 from possible sources were sequenced. Using machine learning and probabilistic models, we target genetic variation associated with host adaptation to attribute the source of human infections and estimate the importance of different disease reservoirs. RESULTS: Poultry was identified as the primary source of human infections, responsible for an estimated 68% of cases, followed by cattle (28%), with only small contributions from wild birds (3%) and pork sources (1%). There was also evidence of an increase in multidrug resistance, particularly among isolates attributed to chickens. CONCLUSIONS: National surveillance and source attribution can guide policy, and our study suggests that interventions targeting poultry will yield the greatest reductions in campylobacteriosis and spread of AMR in the US. DATA AVAILABILITY: All sequence reads were uploaded and shared on NCBI's Sequence Read Archive (SRA), associated with BioProjects: PRJNA239251 (CDC / PulseNet surveillance), PRJNA287430 (FSIS surveillance), PRJNA292668 & PRJNA292664 (NARMS) and PRJNA258022 (FDA surveillance). Publicly available genomes, including reference genomes and isolates sampled worldwide from wild birds, are associated with BioProject accessions: PRJNA176480, PRJNA177352, PRJNA342755, PRJNA345429, PRJNA312235, PRJNA415188, PRJNA524300, PRJNA528879, PRJNA529798, PRJNA575343, PRJNA524315 and PRJNA689604.
Contiguous assemblies of all genome sequences compared are available at Mendeley Data (assembled C. coli genomes doi: 10.17632/gxswjvxyh3.1; assembled C. jejuni genomes doi: 10.17632/6ngsz3dtbd.1), and individual project and accession numbers can be found in Supplementary tables S1 and S2, which also include PubMLST identifiers for assembled genomes. Further material is available on Figshare (doi: 10.6084/m9.figshare.20279928). Interactive phylogenies are hosted on Microreact separately for C. jejuni (https://microreact.org/project/pascoe-us-cjejuni) and C. coli (https://microreact.org/project/pascoe-us-ccoli).
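
ILLUSTRATIVE SKETCH: The abstract does not specify how per-isolate attributions were combined into reservoir-level estimates such as those under RESULTS. The following minimal Python sketch shows one common way this could be done, assuming each human isolate has already been assigned a probability for each candidate source by some classifier. The function name `attribution_proportions`, the dictionary layout, and the toy values are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def attribution_proportions(isolate_probs):
    """Average per-isolate source probabilities into overall proportions.

    isolate_probs: list of dicts mapping a source name to the probability
    that one human isolate came from that source (hypothetical layout).
    """
    if not isolate_probs:
        raise ValueError("need at least one isolate")
    totals = defaultdict(float)
    for probs in isolate_probs:
        row_sum = sum(probs.values())
        if row_sum <= 0:
            raise ValueError("probabilities for an isolate must sum to > 0")
        # Normalise each isolate so that it contributes exactly one case.
        for source, p in probs.items():
            totals[source] += p / row_sum
    n = len(isolate_probs)
    return {source: total / n for source, total in totals.items()}


# Hypothetical toy values for three isolates, not data from the study.
toy = [
    {"poultry": 0.9, "cattle": 0.1},
    {"poultry": 0.6, "cattle": 0.4},
    {"poultry": 0.2, "cattle": 0.8},
]
proportions = attribution_proportions(toy)
```

Averaging probabilities, rather than counting only the most likely source for each isolate, lets uncertain assignments contribute partially to several reservoirs; whether the study took this approach is not stated here.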