
Professor Daniel J Wilson

Research Area: Genetics and Genomics
Scientific Themes: Genetics & Genomics and Immunology & Infectious Disease
Keywords: Evolution, Pathogens, Microbiology, Epidemiology and Statistical genetics

My research interests centre on the application of tools for evolutionary analysis, in particular population genetics, to understanding human pathogens. I am primarily involved in the UK CRC Consortium Modernising Medical Microbiology, an ambitious project with the aim of tracing and tracking clinically important microorganisms in near to real-time using whole genome next generation sequencing. Through statistical analysis, we wish to elucidate the evolution and epidemiology of these pathogens.

Name | Department | Institution | Country
Dr Rory Bowden | Wellcome Trust Centre for Human Genetics | University of Oxford | United Kingdom
Professor Derrick Crook | Experimental Medicine Division | University of Oxford | United Kingdom
Professor Peter Donnelly FRS | Wellcome Trust Centre for Human Genetics | University of Oxford | United Kingdom
Professor Tim Peto | Experimental Medicine Division | University of Oxford | United Kingdom
Dr Zamin Iqbal | Wellcome Trust Centre for Human Genetics | University of Oxford | United Kingdom
Dr David Wyllie | Jenner Institute | University of Oxford | United Kingdom
Dr Chris Spencer | Wellcome Trust Centre for Human Genetics | University of Oxford | United Kingdom
Leekitcharoenphon P, Hendriksen RS, Le Hello S, Weill FX, Baggesen DL, Jun SR, Ussery DW, Lund O, Crook DW, Wilson DJ, Aarestrup FM. 2016. Global Genomic Epidemiology of Salmonella enterica Serovar Typhimurium DT104. Appl Environ Microbiol, 82 (8), pp. 2516-2526.

It has been 30 years since the initial emergence and subsequent rapid global spread of multidrug-resistant Salmonella enterica serovar Typhimurium DT104 (MDR DT104). Nonetheless, its origin and transmission route have never been revealed. We used whole-genome sequencing (WGS) and temporally structured sequence analysis within a Bayesian framework to reconstruct temporal and spatial phylogenetic trees and estimate the rates of mutation and divergence times of 315 S. Typhimurium DT104 isolates sampled from 1969 to 2012 from 21 countries on six continents. DT104 was estimated to have emerged initially as antimicrobial susceptible in ∼1948 (95% credible interval [CI], 1934 to 1962) and later became MDR DT104 in ∼1972 (95% CI, 1972 to 1988) through horizontal transfer of the 13-kb Salmonella genomic island 1 (SGI1) MDR region into susceptible strains already containing SGI1. This was followed by multiple transmission events, initially from central Europe and later between several European countries. An independent transmission to the United States and another to Japan occurred, and from there MDR DT104 was probably transmitted to Taiwan and Canada. An independent acquisition of resistance genes took place in Thailand in ∼1975 (95% CI, 1975 to 1990). In Denmark, WGS analysis provided evidence for transmission of the organism between herds of animals. Interestingly, the demographic history of Danish MDR DT104 provided evidence for the success of the program to eradicate Salmonella from pig herds in Denmark from 1996 to 2000. The results from this study refute several hypotheses on the evolution of DT104 and suggest that WGS may be useful in monitoring emerging clones and devising strategies for prevention of Salmonella infections.

Hedge J, Wilson DJ. 2016. Practical Approaches for Detecting Selection in Microbial Genomes. PLoS Comput Biol, 12 (2), pp. e1004739.

Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers through the fundamentals underpinning popular methods for measuring selection in pathogens. These methods are transferable to a wide variety of organisms, and the exercises provided are designed for researchers with any level of programming experience.

Didelot X, Walker AS, Peto TE, Crook DW, Wilson DJ. 2016. Within-host evolution of bacterial pathogens. Nat Rev Microbiol, 14 (3), pp. 150-162.

Whole-genome sequencing has opened the way for investigating the dynamics and genomic evolution of bacterial pathogens during the colonization and infection of humans. The application of this technology to the longitudinal study of adaptation in an infected host--in particular, the evolution of drug resistance and host adaptation in patients who are chronically infected with opportunistic pathogens--has revealed remarkable patterns of convergent evolution, suggestive of an inherent repeatability of evolution. In this Review, we describe how these studies have advanced our understanding of the mechanisms and principles of within-host genome evolution, and we consider the consequences of findings such as a potent adaptive potential for pathogenicity. Finally, we discuss the possibility that genomics may be used in the future to predict the clinical progression of bacterial infections and to suggest the best option for treatment.

Stoesser N, Sheppard AE, Pankhurst L, De Maio N, Moore CE, Sebra R, Turner P, Anson LW et al. 2016. Evolutionary History of the Global Emergence of the Escherichia coli Epidemic Clone ST131. MBio, 7 (2), pp. e02162.

Escherichia coli sequence type 131 (ST131) has emerged globally as the most predominant extraintestinal pathogenic lineage within this clinically important species, and its association with fluoroquinolone and extended-spectrum cephalosporin resistance impacts significantly on treatment. The evolutionary histories of this lineage, and of important antimicrobial resistance elements within it, remain unclearly defined. This study of the largest worldwide collection (n = 215) of sequenced ST131 E. coli isolates to date demonstrates that the clonal expansion of two previously recognized antimicrobial-resistant clades, C1/H30R and C2/H30Rx, started around 25 years ago, consistent with the widespread introduction of fluoroquinolones and extended-spectrum cephalosporins in clinical medicine. These two clades appear to have emerged in the United States, with the expansion of the C2/H30Rx clade driven by the acquisition of a blaCTX-M-15-containing IncFII-like plasmid that has subsequently undergone extensive rearrangement. Several other evolutionary processes influencing the trajectory of this drug-resistant lineage are described, including sporadic acquisitions of CTX-M resistance plasmids and chromosomal integration of blaCTX-M within subclusters followed by vertical evolution. These processes are also occurring for another family of CTX-M gene variants more recently observed among ST131, the blaCTX-M-14/14-like group. The complexity of the evolutionary history of ST131 has important implications for antimicrobial resistance surveillance, epidemiological analysis, and control of emerging clinical lineages of E. coli. These data also highlight the global imperative to reduce specific antibiotic selection pressures and demonstrate the important and varied roles played by plasmids and other mobile genetic elements in the perpetuation of antimicrobial resistance within lineages.
IMPORTANCE: Escherichia coli, perennially a major bacterial pathogen, is becoming increasingly difficult to manage due to emerging resistance to all preferred antimicrobials. Resistance is concentrated within specific E. coli lineages, such as sequence type 131 (ST131). Clarification of the genetic basis for clonally associated resistance is key to devising intervention strategies. We used high-resolution genomic analysis of a large global collection of ST131 isolates to define the evolutionary history of extended-spectrum beta-lactamase production in ST131. We documented diverse contributory genetic processes, including stable chromosomal integrations of resistance genes, persistence and evolution of mobile resistance elements within sublineages, and sporadic acquisition of different resistance elements. Both global distribution and regional segregation were evident. The diversity of resistance element acquisition and propagation within ST131 indicates a need for control and surveillance strategies that target both bacterial strains and mobile genetic elements.

Bowden R, Ansari MA, Jensen SO, Ip CLC, Espedido BA, van Hal SJ, Wilson DJ. 2016. Evolutionary dynamics of Enterococcus faecium reveals complex genomic relationships between isolates with independent emergence of vancomycin resistance. Microbial Genomics, 2 (1).

Bradley P, Gordon NC, Walker TM, Dunn L, Heys S, Huang B, Earle S, Pankhurst LJ et al. 2015. Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis. Nat Commun, 6 pp. 10063.

The rise of antibiotic-resistant bacteria has led to an urgent need for rapid detection of drug resistance in clinical samples, and improvements in global surveillance. Here we show how de Bruijn graph representation of bacterial diversity can be used to identify species and resistance profiles of clinical isolates. We implement this method for Staphylococcus aureus and Mycobacterium tuberculosis in a software package ('Mykrobe predictor') that takes raw sequence data as input, and generates a clinician-friendly report within 3 minutes on a laptop. For S. aureus, the error rates of our method are comparable to gold-standard phenotypic methods, with sensitivity/specificity of 99.1%/99.6% across 12 antibiotics (using an independent validation set, n=470). For M. tuberculosis, our method predicts resistance with sensitivity/specificity of 82.6%/98.5% (independent validation set, n=1,609); sensitivity is lower here, probably because of limited understanding of the underlying genetic mechanisms. We give evidence that minor alleles improve detection of extremely drug-resistant strains, and demonstrate feasibility of the use of emerging single-molecule nanopore sequencing techniques for these purposes.

Didelot X, De Maio N, Brown T, Wilson DJ. 2016. SimBac: simulation of whole bacterial genomes with homologous recombination. Microbial Genomics, 2 (1).

Laabei M, Uhlemann AC, Lowy FD, Austin ED, Yokoyama M, Ouadi K, Feil E, Thorpe HA et al. 2015. Evolutionary Trade-Offs Underlie the Multi-faceted Virulence of Staphylococcus aureus. PLoS Biol, 13 (9), pp. e1002229.

Bacterial virulence is a multifaceted trait where the interactions between pathogen and host factors affect the severity and outcome of the infection. Toxin secretion is central to the biology of many bacterial pathogens and is widely accepted as playing a crucial role in disease pathology. To understand the relationship between toxicity and bacterial virulence in greater depth, we studied two sequenced collections of the major human pathogen Staphylococcus aureus and found an unexpected inverse correlation between bacterial toxicity and disease severity. By applying a functional genomics approach, we identified several novel toxicity-affecting loci responsible for the wide range in toxic phenotypes observed within these collections. To understand the apparent higher propensity of low toxicity isolates to cause bacteraemia, we performed several functional assays, and our findings suggest that within-host fitness differences between high- and low-toxicity isolates in human serum is a contributing factor. As invasive infections, such as bacteraemia, limit the opportunities for onward transmission, highly toxic strains could gain an additional between-host fitness advantage, potentially contributing to the maintenance of toxicity at the population level. Our results clearly demonstrate how evolutionary trade-offs between toxicity, relative fitness, and transmissibility are critical for understanding the multifaceted nature of bacterial virulence.

Dearlove BL, Cody AJ, Pascoe B, Méric G, Wilson DJ, Sheppard SK. 2016. Rapid host switching in generalist Campylobacter strains erodes the signal for tracing human infections. ISME J, 10 (3), pp. 721-729.

Campylobacter jejuni and Campylobacter coli are the biggest causes of bacterial gastroenteritis in the developed world, with human infections typically arising from zoonotic transmission associated with infected meat. Because Campylobacter is not thought to survive well outside the gut, host-associated populations are genetically isolated to varying degrees. Therefore, the likely origin of most strains can be determined by host-associated variation in the genome. This is instructive for characterizing the source of human infection. However, some common strains, notably isolates belonging to the ST-21, ST-45 and ST-828 clonal complexes, appear to have broad host ranges, hindering source attribution. Here whole-genome sequencing has the potential to reveal fine-scale genetic structure associated with host specificity. We found that rates of zoonotic transmission among animal host species in these clonal complexes were so high that the signal of host association is all but obliterated, estimating one zoonotic transmission event every 1.6, 1.8 and 12 years in the ST-21, ST-45 and ST-828 complexes, respectively. We attributed 89% of clinical cases to a chicken source, 10% to cattle and 1% to pig. Our results reveal that common strains of C. jejuni and C. coli infectious to humans are adapted to a generalist lifestyle, permitting rapid transmission between different hosts. Furthermore, they show that the weak signal of host association within these complexes presents a challenge for pinpointing the source of clinical infections, underlining the view that whole-genome sequencing, powerful though it is, cannot substitute for intensive sampling of suspected transmission reservoirs.

Didelot X, Wilson DJ. 2015. ClonalFrameML: efficient inference of recombination in whole bacterial genomes. PLoS Comput Biol, 11 (2), pp. e1004041.

Recombination is an important evolutionary force in bacteria, but it remains challenging to reconstruct the imports that occurred in the ancestry of a genomic sample. Here we present ClonalFrameML, which uses maximum likelihood inference to simultaneously detect recombination in bacterial genomes and account for it in phylogenetic reconstruction. ClonalFrameML can analyse hundreds of genomes in a matter of hours, and we demonstrate its usefulness on simulated and real datasets. We find evidence for recombination hotspots associated with mobile elements in Clostridium difficile ST6 and a previously undescribed 310kb chromosomal replacement in Staphylococcus aureus ST582. ClonalFrameML is freely available at http://clonalframeml.googlecode.com/.

De Maio N, Wu CH, O'Reilly KM, Wilson D. 2015. New Routes to Phylogeography: A Bayesian Structured Coalescent Approximation. PLoS Genet, 11 (8), pp. e1005421.

Phylogeographic methods aim to infer migration trends and the history of sampled lineages from genetic data. Applications of phylogeography are broad, and in the context of pathogens include the reconstruction of transmission histories and the origin and emergence of outbreaks. Phylogeographic inference based on bottom-up population genetics models is computationally expensive, and as a result faster alternatives based on the evolution of discrete traits have become popular. In this paper, we show that inference of migration rates and root locations based on discrete trait models is extremely unreliable and sensitive to biased sampling. To address this problem, we introduce BASTA (BAyesian STructured coalescent Approximation), a new approach implemented in BEAST2 that combines the accuracy of methods based on the structured coalescent with the computational efficiency required to handle more than just a few populations. We illustrate the potentially severe implications of poor model choice for phylogeographic analyses by investigating the zoonotic transmission of Ebola virus. Whereas the structured coalescent analysis correctly infers that successive human Ebola outbreaks have been seeded by a large unsampled non-human reservoir population, the discrete trait analysis implausibly concludes that undetected human-to-human transmission has allowed the virus to persist over the past four decades. As genomics takes on an increasingly prominent role informing the control and prevention of infectious diseases, it will be vital that phylogeographic inference provides robust insights into transmission history.

Walker TM, Kohl TA, Omar SV, Hedge J, Del Ojo Elias C, Bradley P, Iqbal Z, Feuerriegel S et al. 2015. Whole-genome sequencing for prediction of Mycobacterium tuberculosis drug susceptibility and resistance: a retrospective cohort study. Lancet Infect Dis, 15 (10), pp. 1193-1202.

BACKGROUND: Diagnosing drug-resistance remains an obstacle to the elimination of tuberculosis. Phenotypic drug-susceptibility testing is slow and expensive, and commercial genotypic assays screen only common resistance-determining mutations. We used whole-genome sequencing to characterise common and rare mutations predicting drug resistance, or consistency with susceptibility, for all first-line and second-line drugs for tuberculosis. METHODS: Between Sept 1, 2010, and Dec 1, 2013, we sequenced a training set of 2099 Mycobacterium tuberculosis genomes. For 23 candidate genes identified from the drug-resistance scientific literature, we algorithmically characterised genetic mutations as not conferring resistance (benign), resistance determinants, or uncharacterised. We then assessed the ability of these characterisations to predict phenotypic drug-susceptibility testing for an independent validation set of 1552 genomes. We sought mutations under similar selection pressure to those characterised as resistance determinants outside candidate genes to account for residual phenotypic resistance. FINDINGS: We characterised 120 training-set mutations as resistance determining, and 772 as benign. With these mutations, we could predict 89·2% of the validation-set phenotypes with a mean 92·3% sensitivity (95% CI 90·7-93·7) and 98·4% specificity (98·1-98·7). 10·8% of validation-set phenotypes could not be predicted because uncharacterised mutations were present. With an in-silico comparison, characterised resistance determinants had higher sensitivity than the mutations from three line-probe assays (85·1% vs 81·6%). No additional resistance determinants were identified among mutations under selection pressure in non-candidate genes. INTERPRETATION: A broad catalogue of genetic mutations enables data from whole-genome sequencing to be used clinically to predict drug resistance, drug susceptibility, or to identify drug phenotypes that cannot yet be genetically predicted. This approach could be integrated into routine diagnostic workflows, phasing out phenotypic drug-susceptibility testing while reporting drug resistance early. FUNDING: Wellcome Trust, National Institute of Health Research, Medical Research Council, and the European Union.

Westwood J, Burnett M, Spratt D, Ball M, Wilson DJ, Wellsteed S, Cleary D, Green A et al. 2014. The hospital microbiome project: meeting report for the UK science and innovation network UK-USA workshop ‘beating the superbugs: hospital microbiome studies for tackling antimicrobial resistance’, October 14th 2013. Standards in Genomic Sciences, 9 (1), pp. 12.

The UK Science and Innovation Network UK-USA workshop 'Beating the Superbugs: Hospital Microbiome Studies for tackling Antimicrobial Resistance' was held on October 14th 2013 at the UK Department of Health, London. The workshop was designed to promote US-UK collaboration on hospital microbiome studies to add a new facet to our collective understanding of antimicrobial resistance. The assembled researchers debated the importance of the hospital microbial community in transmission of disease and as a reservoir for antimicrobial resistance genes, and discussed methodologies, hypotheses, and priorities. A number of complementary approaches were explored, although the importance of the built environment microbiome in disease transmission was not universally accepted. Current whole genome epidemiological methods are being pioneered in the UK and the benefits of moving to community analysis are not necessarily obvious to the pioneers; however, rapid progress in other areas of microbiology suggests to some researchers that hospital microbiome studies will be exceptionally fruitful even in the short term. Collaborative studies will recombine different strengths to tackle the international problems of antimicrobial resistance and hospital and healthcare associated infections.

Stoesser N, Giess A, Batty EM, Sheppard AE, Walker AS, Wilson DJ, Didelot X, Bashir A et al. 2014. Genome sequencing of an extended series of NDM-producing Klebsiella pneumoniae isolates from neonatal infections in a Nepali hospital characterizes the extent of community- versus hospital-associated transmission in an endemic setting. Antimicrob Agents Chemother, 58 (12), pp. 7347-7357.

NDM-producing Klebsiella pneumoniae strains represent major clinical and infection control challenges, particularly in resource-limited settings with high rates of antimicrobial resistance. Determining whether transmission occurs at a gene, plasmid, or bacterial strain level and within hospital and/or the community has implications for monitoring and controlling spread. Whole-genome sequencing (WGS) is the highest-resolution typing method available for transmission epidemiology. We sequenced carbapenem-resistant K. pneumoniae isolates from 26 individuals involved in several infection case clusters in a Nepali neonatal unit and 68 other clinical Gram-negative isolates from a similar time frame, using Illumina and PacBio technologies. Within-outbreak chromosomal and closed-plasmid structures were generated and used as data set-specific references. Three temporally separated case clusters were caused by a single NDM K. pneumoniae strain with a conserved set of four plasmids, one being a 304,526-bp plasmid carrying blaNDM-1. The plasmids contained a large number of antimicrobial/heavy metal resistance and plasmid maintenance genes, which may have explained their persistence. No obvious environmental/human reservoir was found. There was no evidence of transmission of outbreak plasmids to other Gram-negative clinical isolates, although blaNDM variants were present in other isolates in different genetic contexts. WGS can effectively define complex antimicrobial resistance epidemiology. Wider sampling frames are required to contextualize outbreaks. Infection control may be effective in terminating outbreaks caused by particular strains, even in areas with widespread resistance, although this study could not demonstrate evidence supporting specific interventions. Larger, detailed studies are needed to characterize resistance genes, vectors, and host strains involved in disease, to enable effective intervention.

Price JR, Golubchik T, Wilson DJ, Crook DW, Walker AS, Peto TE, Paul J, Llewelyn MJ. 2014. Reply to Mills and Linkin. Clin Infect Dis, 59 (5), pp. 752-753.

Price JR, Golubchik T, Cole K, Wilson DJ, Crook DW, Thwaites GE, Bowden R, Walker AS, Peto TE, Paul J, Llewelyn MJ. 2014. Whole-genome sequencing shows that patient-to-patient transmission rarely accounts for acquisition of Staphylococcus aureus in an intensive care unit. Clin Infect Dis, 58 (5), pp. 609-618.

BACKGROUND:  Strategies to prevent Staphylococcus aureus infection in hospitals focus on patient-to-patient transmission. We used whole-genome sequencing to investigate the role of colonized patients as the source of new S. aureus acquisitions, and the reliability of identifying patient-to-patient transmission using the conventional approach of spa typing and overlapping patient stay. METHODS: Over 14 months, all unselected patients admitted to an adult intensive care unit (ICU) were serially screened for S. aureus. All available isolates (n = 275) were spa typed and underwent whole-genome sequencing to investigate their relatedness at high resolution. RESULTS: Staphylococcus aureus was carried by 185 of 1109 patients sampled within 24 hours of ICU admission (16.7%); 59 (5.3%) patients carried methicillin-resistant S. aureus (MRSA). Forty-four S. aureus (22 MRSA) acquisitions while on ICU were detected. Isolates were available for genetic analysis from 37 acquisitions. Whole-genome sequencing indicated that 7 of these 37 (18.9%) were transmissions from other colonized patients. Conventional methods (spa typing combined with overlapping patient stay) falsely identified 3 patient-to-patient transmissions (all MRSA) and failed to detect 2 acquisitions and 4 transmissions (2 MRSA). CONCLUSIONS: Only a minority of S. aureus acquisitions can be explained by patient-to-patient transmission. Whole-genome sequencing provides the resolution to disprove transmission events indicated by conventional methods and also to reveal otherwise unsuspected transmission events. Whole-genome sequencing should replace conventional methods for detection of nosocomial S. aureus transmission.

Gordon NC, Price JR, Cole K, Everitt R, Morgan M, Finney J, Kearns AM, Pichon B et al. 2014. Prediction of Staphylococcus aureus antimicrobial resistance by whole-genome sequencing. J Clin Microbiol, 52 (4), pp. 1182-1191.

Whole-genome sequencing (WGS) could potentially provide a single platform for extracting all the information required to predict an organism's phenotype. However, its ability to provide accurate predictions has not yet been demonstrated in large independent studies of specific organisms. In this study, we aimed to develop a genotypic prediction method for antimicrobial susceptibilities. The whole genomes of 501 unrelated Staphylococcus aureus isolates were sequenced, and the assembled genomes were interrogated using BLASTn for a panel of known resistance determinants (chromosomal mutations and genes carried on plasmids). Results were compared with phenotypic susceptibility testing for 12 commonly used antimicrobial agents (penicillin, methicillin, erythromycin, clindamycin, tetracycline, ciprofloxacin, vancomycin, trimethoprim, gentamicin, fusidic acid, rifampin, and mupirocin) performed by the routine clinical laboratory. We investigated discrepancies by repeat susceptibility testing and manual inspection of the sequences and used this information to optimize the resistance determinant panel and BLASTn algorithm. We then tested performance of the optimized tool in an independent validation set of 491 unrelated isolates, with phenotypic results obtained in duplicate by automated broth dilution (BD Phoenix) and disc diffusion. In the validation set, the overall sensitivity and specificity of the genomic prediction method were 0.97 (95% confidence interval [95% CI], 0.95 to 0.98) and 0.99 (95% CI, 0.99 to 1), respectively, compared to standard susceptibility testing methods. The very major error rate was 0.5%, and the major error rate was 0.7%. WGS was as sensitive and specific as routine antimicrobial susceptibility testing methods. WGS is a promising alternative to culture methods for resistance prediction in S. aureus and ultimately other major bacterial pathogens.

Hedge J, Wilson DJ. 2014. Bacterial phylogenetic reconstruction from whole genomes is robust to recombination but demographic inference is not. MBio, 5 (6), pp. e02158.

Phylogenetic inference in bacterial genomics is fundamental to understanding problems such as population history, antimicrobial resistance, and transmission dynamics. The field has been plagued by an apparent state of contradiction since the distorting effects of recombination on phylogeny were discovered more than a decade ago. Researchers persist with detailed phylogenetic analyses while simultaneously acknowledging that recombination seriously misleads inference of population dynamics and selection. Here we resolve this paradox by showing that phylogenetic tree topologies based on whole genomes robustly reconstruct the clonal frame topology but that branch lengths are badly skewed. Surprisingly, removing recombining sites can exacerbate branch length distortion caused by recombination. IMPORTANCE: Phylogenetic tree reconstruction is a popular approach for understanding the relatedness of bacteria in a population from differences in their genome sequences. However, bacteria frequently exchange regions of their genomes by a process called homologous recombination, which violates a fundamental assumption of phylogenetic methods. Since many researchers continue to use phylogenetics for recombining bacteria, it is important to understand how recombination affects the conclusions drawn from these analyses. We find that whole-genome sequences afford great accuracy in reconstructing evolutionary relationships despite concerns surrounding the presence of recombination, but the branch lengths of the phylogenetic tree are indeed badly distorted. Surprisingly, methods to reduce the impact of recombination on branch lengths can exacerbate the problem.

Everitt RG, Didelot X, Batty EM, Miller RR, Knox K, Young BC, Bowden R, Auton A et al. 2014. Mobile elements drive recombination hotspots in the core genome of Staphylococcus aureus. Nat Commun, 5 pp. 3956.

Horizontal gene transfer is an important driver of bacterial evolution, but genetic exchange in the core genome of clonal species, including the major pathogen Staphylococcus aureus, is incompletely understood. Here we reveal widespread homologous recombination in S. aureus at the species level, in contrast to its near-complete absence between closely related strains. We discover a patchwork of hotspots and coldspots at fine scales falling against a backdrop of broad-scale trends in rate variation. Over megabases, homoplasy rates fluctuate 1.9-fold, peaking towards the origin-of-replication. Over kilobases, we find core recombination hotspots of up to 2.5-fold enrichment situated near fault lines in the genome associated with mobile elements. The strongest hotspots include regions flanking conjugative transposon ICE6013, the staphylococcal cassette chromosome (SCC) and genomic island νSaα. Mobile element-driven core genome transfer represents an opportunity for adaptation and challenges our understanding of the recombination landscape in predominantly clonal pathogens, with important implications for genotype-phenotype mapping.

Miller RM, Price JR, Batty EM, Didelot X, Wyllie D, Golubchik T, Crook DW, Paul J et al. 2014. Healthcare-associated outbreak of meticillin-resistant Staphylococcus aureus bacteraemia: role of a cryptic variant of an epidemic clone. J Hosp Infect, 86 (2), pp. 83-89.

BACKGROUND: New strains of meticillin-resistant Staphylococcus aureus (MRSA) may be associated with changes in rates of disease or clinical presentation. Conventional typing techniques may not detect new clonal variants that underlie changes in epidemiology or clinical phenotype. AIM: To investigate the role of clonal variants of MRSA in an outbreak of MRSA bacteraemia at a hospital in England. METHODS: Bacteraemia isolates of the major UK lineages (EMRSA-15 and -16) from before and after the outbreak were analysed by whole-genome sequencing in the context of epidemiological and clinical data. For comparison, EMRSA-15 and -16 isolates from another hospital in England were sequenced. A clonal variant of EMRSA-16 was identified at the outbreak hospital and a molecular signature test designed to distinguish variant isolates among further EMRSA-16 strains. FINDINGS: By whole-genome sequencing, EMRSA-16 isolates during the outbreak showed strikingly low genetic diversity (P < 1 × 10⁻⁶, Monte Carlo test), compared with EMRSA-15 and EMRSA-16 isolates from before the outbreak or the comparator hospital, demonstrating the emergence of a clonal variant. The variant was indistinguishable from the ancestral strain by conventional typing. This clonal variant accounted for 64/72 (89%) of EMRSA-16 bacteraemia isolates at the outbreak hospital from 2006. CONCLUSIONS: Evolutionary changes in epidemic MRSA strains not detected by conventional typing may be associated with changes in disease epidemiology. Rapid and affordable technologies for whole-genome sequencing are becoming available with the potential to identify and track the emergence of variants of highly clonal organisms.

Wong TH, Dearlove BL, Hedge J, Giess AP, Piazza P, Trebes A, Paul J, Smit E et al. 2013. Whole genome sequencing and de novo assembly identifies Sydney-like variant noroviruses and recombinants during the winter 2012/2013 outbreak in England. Virol J, 10 (1), pp. 335.

BACKGROUND: Norovirus is the commonest cause of epidemic gastroenteritis among people of all ages. Outbreaks frequently occur in hospitals and the community, costing the UK an estimated £110 m per annum. An evolutionary explanation for periodic increases in norovirus cases, despite some host-specific post immunity is currently limited to the identification of obvious recombinants. Our understanding could be significantly enhanced by full length genome sequences for large numbers of intensively sampled viruses, which would also assist control and vaccine design. Our objective is to develop rapid, high-throughput, end-to-end methods yielding complete norovirus genome sequences. We apply these methods to recent English outbreaks, placing them in the wider context of the international norovirus epidemic of winter 2012. METHOD: Norovirus sequences were generated from 28 unique clinical samples by Illumina RNA sequencing (RNA-Seq) of total faecal RNA. A range of de novo sequence assemblers were attempted. The best assembler was identified by validation against three replicate samples and two norovirus qPCR negative samples, together with an additional 20 sequences determined by PCR and fractional capillary sequencing. Phylogenetic methods were used to reconstruct evolutionary relationships from the whole genome sequences. RESULTS: Full length norovirus genomes were generated from 23/28 samples. 5/28 partial norovirus genomes were associated with low viral copy numbers. The de novo assembled sequences differed from sequences determined by capillary sequencing by <0.003%. Intra-host nucleotide sequence diversity was rare, but detectable by mapping short sequence reads onto its de novo assembled consensus. Genomes similar to the Sydney 2012 strain caused 78% (18/23) of cases, consistent with its previously documented association with the winter 2012 global outbreak. 
Interestingly, phylogenetic analysis and recombination detection analysis of the consensus sequences identified two related viruses as recombinants, containing sequences in prior circulation to Sydney 2012 in open reading frame (ORF) 2. CONCLUSION: Our approach facilitates the rapid determination of complete norovirus genomes. This method provides high resolution of full norovirus genomes which, when coupled with detailed epidemiology, may improve the understanding of evolution and control of this important healthcare-associated pathogen.

Eyre DW, Cule ML, Wilson DJ, Griffiths D, Vaughan A, O'Connor L, Ip CL, Golubchik T et al. 2013. Diverse sources of C. difficile infection identified on whole-genome sequencing. N Engl J Med, 369 (13), pp. 1195-1205.

BACKGROUND: It has been thought that Clostridium difficile infection is transmitted predominantly within health care settings. However, endemic spread has hampered identification of precise sources of infection and the assessment of the efficacy of interventions. METHODS: From September 2007 through March 2011, we performed whole-genome sequencing on isolates obtained from all symptomatic patients with C. difficile infection identified in health care settings or in the community in Oxfordshire, United Kingdom. We compared single-nucleotide variants (SNVs) between the isolates, using C. difficile evolution rates estimated on the basis of the first and last samples obtained from each of 145 patients, with 0 to 2 SNVs expected between transmitted isolates obtained less than 124 days apart, on the basis of a 95% prediction interval. We then identified plausible epidemiologic links among genetically related cases from data on hospital admissions and community location. RESULTS: Of 1250 C. difficile cases that were evaluated, 1223 (98%) were successfully sequenced. In a comparison of 957 samples obtained from April 2008 through March 2011 with those obtained from September 2007 onward, a total of 333 isolates (35%) had no more than 2 SNVs from at least 1 earlier case, and 428 isolates (45%) had more than 10 SNVs from all previous cases. Reductions in incidence over time were similar in the two groups, a finding that suggests an effect of interventions targeting the transition from exposure to disease. Of the 333 patients with no more than 2 SNVs (consistent with transmission), 126 patients (38%) had close hospital contact with another patient, and 120 patients (36%) had no hospital or community contact with another patient. Distinct subtypes of infection continued to be identified throughout the study, which suggests a considerable reservoir of C. difficile. CONCLUSIONS: Over a 3-year period, 45% of C. difficile cases in Oxfordshire were genetically distinct from all previous cases. Genetically diverse sources, in addition to symptomatic patients, play a major part in C. difficile transmission. (Funded by the U.K. Clinical Research Collaboration Translational Infection Research Initiative and others.)
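
The SNV cut-offs quoted in this abstract (no more than 2 SNVs as consistent with transmission, more than 10 SNVs as genetically distinct) can be illustrated with a minimal Python sketch. The function name, the label strings and the "intermediate" category for 3 to 10 SNVs are assumptions made for illustration; the abstract states the 0 to 2 SNV expectation only for isolates obtained less than 124 days apart, and this is not the authors' code.

```python
def classify_snv_distance(snv_distance: int) -> str:
    """Label a pair of isolates using the cut-offs quoted in the abstract.

    Hypothetical helper: the thresholds come from the abstract, the
    labels and the middle category are illustrative assumptions.
    """
    if snv_distance < 0:
        raise ValueError("SNV distance cannot be negative")
    if snv_distance <= 2:
        # 0 to 2 SNVs: the range expected between transmitted isolates
        return "consistent with transmission"
    if snv_distance > 10:
        # More than 10 SNVs: treated as genetically distinct
        return "genetically distinct"
    # 3 to 10 SNVs: not given a label in the abstract
    return "intermediate"


for distance in (0, 2, 5, 11):
    print(distance, classify_snv_distance(distance))
```

In practice the distance would be compared against every earlier case, since the abstract counts an isolate as linked when it is within 2 SNVs of at least one earlier case.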

Batty EM, Wong TH, Trebes A, Argoud K, Attar M, Buck D, Ip CL, Golubchik T et al. 2013. A modified RNA-Seq approach for whole genome sequencing of RNA viruses from faecal and blood samples. PLoS One, 8 (6), pp. e66129.

To date, very large scale sequencing of many clinically important RNA viruses has been complicated by their high population molecular variation, which creates challenges for polymerase chain reaction and sequencing primer design. Many RNA viruses are also difficult or currently not possible to culture, severely limiting the amount and purity of available starting material. Here, we describe a simple, novel, high-throughput approach to Norovirus and Hepatitis C virus whole genome sequence determination based on RNA shotgun sequencing (also known as RNA-Seq). We demonstrate the effectiveness of this method by sequencing three Norovirus samples from faeces and two Hepatitis C virus samples from blood, on an Illumina MiSeq benchtop sequencer. More than 97% of reference genomes were recovered. Compared with Sanger sequencing, our method had no nucleotide differences in 14,019 nucleotides (nt) for Noroviruses (from a total of 2 Norovirus genomes obtained with Sanger sequencing), and 8 variants in 9,542 nt for Hepatitis C virus (1 variant per 1,193 nt). The three Norovirus samples had 2, 3, and 2 distinct positions called as heterozygous, while the two Hepatitis C virus samples had 117 and 131 positions called as heterozygous. To confirm that our sample and library preparation could be scaled to true high-throughput, we prepared and sequenced an additional 77 Norovirus samples in a single batch on an Illumina HiSeq 2000 sequencer, recovering >90% of the reference genome in all but one sample. No discrepancies were observed across 118,757 nt compared between Sanger and our custom RNA-Seq method in 16 samples. By generating viral genomic sequences that are not biased by primer-specific amplification or enrichment, this method offers the prospect of large-scale, affordable studies of RNA viruses which could be adapted to routine diagnostic laboratory workflows in the near future, with the potential to directly characterize within-host viral diversity.

Eyre DW, Cule ML, Griffiths D, Crook DW, Peto TE, Walker AS, Wilson DJ. 2013. Detection of mixed infection from bacterial whole genome sequence data allows assessment of its role in Clostridium difficile transmission. PLoS Comput Biol, 9 (5), pp. e1003059.

Bacterial whole genome sequencing offers the prospect of rapid and high precision investigation of infectious disease outbreaks. Close genetic relationships between microorganisms isolated from different infected cases suggest transmission is a strong possibility, whereas transmission between cases with genetically distinct bacterial isolates can be excluded. However, undetected mixed infections (infection with ≥2 unrelated strains of the same species where only one is sequenced) potentially impair exclusion of transmission with certainty, and may therefore limit the utility of this technique. We investigated the problem by developing a computationally efficient method for detecting mixed infection without the need for resource-intensive independent sequencing of multiple bacterial colonies. Given the relatively low density of single nucleotide polymorphisms within bacterial sequence data, direct reconstruction of mixed infection haplotypes from current short-read sequence data is not consistently possible. We therefore use a two-step maximum likelihood-based approach, assuming each sample contains up to two infecting strains. We jointly estimate the proportion of the infection arising from the dominant and minor strains, and the sequence divergence between these strains. In cases where mixed infection is confirmed, the dominant and minor haplotypes are then matched to a database of previously sequenced local isolates. We demonstrate the performance of our algorithm with in silico and in vitro mixed infection experiments, and apply it to transmission of an important healthcare-associated pathogen, Clostridium difficile. Using hospital ward movement data in a previously described stochastic transmission model, 15 pairs of cases enriched for likely transmission events associated with mixed infection were selected. Our method identified four previously undetected mixed infections, and a previously undetected transmission event, but no direct transmission between the pairs of cases under investigation. These results demonstrate that mixed infections can be detected without additional sequencing effort, and this will be important in assessing the extent of cryptic transmission in our hospitals.

Golubchik T, Batty EM, Miller RR, Farr H, Young BC, Larner-Svensson H, Fung R, Godwin H et al. 2013. Within-host evolution of Staphylococcus aureus during asymptomatic carriage. PLoS One, 8 (5), pp. e61319.

BACKGROUND: Staphylococcus aureus is a major cause of healthcare associated mortality, but like many important bacterial pathogens, it is a common constituent of the normal human body flora. Around a third of healthy adults are carriers. Recent evidence suggests that evolution of S. aureus during nasal carriage may be associated with progression to invasive disease. However, a more detailed understanding of within-host evolution under natural conditions is required to appreciate the evolutionary and mechanistic reasons why commensal bacteria such as S. aureus cause disease. Therefore we examined in detail the evolutionary dynamics of normal, asymptomatic carriage. Sequencing a total of 131 genomes across 13 singly colonized hosts using the Illumina platform, we investigated diversity, selection, population dynamics and transmission during the short-term evolution of S. aureus. PRINCIPAL FINDINGS: We characterized the processes by which the raw material for evolution is generated: micro-mutation (point mutation and small insertions/deletions), macro-mutation (large insertions/deletions) and the loss or acquisition of mobile elements (plasmids and bacteriophages). Through an analysis of synonymous, non-synonymous and intergenic mutations we discovered a fitness landscape dominated by purifying selection, with rare examples of adaptive change in genes encoding surface-anchored proteins and an enterotoxin. We found evidence for dramatic, hundred-fold fluctuations in the size of the within-host population over time, which we related to the cycle of colonization and clearance. Using a newly-developed population genetics approach to detect recent transmission among hosts, we revealed evidence for recent transmission between some of our subjects, including a husband and wife both carrying populations of methicillin-resistant S. aureus (MRSA). 
SIGNIFICANCE: This investigation begins to paint a picture of the within-host evolution of an important bacterial pathogen during its prevailing natural state, asymptomatic carriage. These results also have wider significance as a benchmark for future systematic studies of evolution during invasive S. aureus disease.

Dearlove B, Wilson DJ. 2013. Coalescent inference for infectious disease: meta-analysis of hepatitis C. Philos Trans R Soc Lond B Biol Sci, 368 (1614), pp. 20120314.

Genetic analysis of pathogen genomes is a powerful approach to investigating the population dynamics and epidemic history of infectious diseases. However, the theoretical underpinnings of the most widely used, coalescent methods have been questioned, casting doubt on their interpretation. The aim of this study is to develop robust population genetic inference for compartmental models in epidemiology. Using a general approach based on the theory of metapopulations, we derive coalescent models under susceptible-infectious (SI), susceptible-infectious-susceptible (SIS) and susceptible-infectious-recovered (SIR) dynamics. We show that exponential and logistic growth models are equivalent to SI and SIS models, respectively, when co-infection is negligible. Implementing SI, SIS and SIR models in BEAST, we conduct a meta-analysis of hepatitis C epidemics, and show that we can directly estimate the basic reproductive number (R₀) and prevalence under SIR dynamics. We find that differences in genetic diversity between epidemics can be explained by differences in underlying epidemiology (age of the epidemic and local population density) and viral subtype. Model comparison reveals SIR dynamics in three globally restricted epidemics, but most are better fit by the simpler SI dynamics. In summary, metapopulation models provide a general and practical framework for integrating epidemiology and population genetics for the purposes of joint inference.
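
For readers unfamiliar with the compartmental models named in this abstract, the following is a minimal Python sketch of deterministic SIR dynamics and the basic reproductive number. It assumes the textbook frequency-dependent SIR model integrated with a simple Euler step; the function names, parameter values and step size are illustrative assumptions. It is a forward simulation only and does not reproduce the coalescent inference or the BEAST implementation described in the paper.

```python
def basic_reproductive_number(beta: float, gamma: float) -> float:
    """R0 for the textbook SIR model: transmission rate over recovery rate."""
    return beta / gamma


def simulate_sir(beta: float, gamma: float, s0: float, i0: float,
                 steps: int, dt: float = 0.1) -> list:
    """Euler integration of dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I."""
    s, i, r = float(s0), float(i0), 0.0
    n = s + i  # total population size, constant in this model
    trajectory = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Illustrative parameters (assumed, not taken from the paper)
growing = simulate_sir(beta=0.5, gamma=0.25, s0=999, i0=1, steps=2000)
fading = simulate_sir(beta=0.1, gamma=0.25, s0=999, i0=1, steps=100)
print("R0 =", basic_reproductive_number(0.5, 0.25))
print("peak infectious:", max(i for _, i, _ in growing))
```

When R₀ exceeds 1 the number of infectious individuals rises before declining; when it is below 1 the outbreak fades from the start.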

Didelot X, Eyre DW, Cule M, Ip CL, Ansari MA, Griffiths D, Vaughan A, O'Connor L et al. 2012. Microevolutionary analysis of Clostridium difficile genomes to investigate transmission. Genome Biol, 13 (12), pp. R118.

BACKGROUND: The control of Clostridium difficile infection is a major international healthcare priority, hindered by a limited understanding of transmission epidemiology for these bacteria. However, transmission studies of bacterial pathogens are rapidly being transformed by the advent of next generation sequencing. RESULTS: Here we sequence whole C. difficile genomes from 486 cases arising over four years in Oxfordshire. We show that we can estimate the times back to common ancestors of bacterial lineages with sufficient resolution to distinguish whether direct transmission is plausible or not. Time depths were inferred using a within-host evolutionary rate that we estimated at 1.4 mutations per genome per year based on serially isolated genomes. The subset of plausible transmissions was found to be highly associated with pairs of patients sharing time and space in hospital. Conversely, the large majority of pairs of genomes matched by conventional typing and isolated from patients within a month of each other were too distantly related to be direct transmissions. CONCLUSIONS: Our results confirm that nosocomial transmission between symptomatic C. difficile cases contributes far less to current rates of infection than has been widely assumed, which clarifies the importance of future research into other transmission routes, such as from asymptomatic carriers. With the costs of DNA sequencing rapidly falling and its use becoming more and more widespread, genomics will revolutionize our understanding of the transmission of bacterial pathogens.

Walker TM, Ip CL, Harrell RH, Evans JT, Kapatai G, Dedicoat MJ, Eyre DW, Wilson DJ et al. 2013. Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study. Lancet Infect Dis, 13 (2), pp. 137-146.

BACKGROUND: Tuberculosis incidence in the UK has risen in the past decade. Disease control depends on epidemiological data, which can be difficult to obtain. Whole-genome sequencing can detect microevolution within Mycobacterium tuberculosis strains. We aimed to estimate the genetic diversity of related M tuberculosis strains in the UK Midlands and to investigate how this measurement might be used to investigate community outbreaks. METHODS: In a retrospective observational study, we used Illumina technology to sequence M tuberculosis genomes from an archive of frozen cultures. We characterised isolates into four groups: cross-sectional, longitudinal, household, and community. We measured pairwise nucleotide differences within hosts and between hosts in household outbreaks and estimated the rate of change in DNA sequences. We used the findings to interpret network diagrams constructed from 11 community clusters derived from mycobacterial interspersed repetitive-unit-variable-number tandem-repeat data. FINDINGS: We sequenced 390 separate isolates from 254 patients, including representatives from all five major lineages of M tuberculosis. The estimated rate of change in DNA sequences was 0.5 single nucleotide polymorphisms (SNPs) per genome per year (95% CI 0.3-0.7) in longitudinal isolates from 30 individuals and 25 families. Divergence is rarely higher than five SNPs in 3 years. 109 (96%) of 114 paired isolates from individuals and households differed by five or fewer SNPs. More than five SNPs separated isolates from none of 69 epidemiologically linked patients, two (15%) of 13 possibly linked patients, and 13 (17%) of 75 epidemiologically unlinked patients (three-way comparison exact p<0.0001). Genetic trees and clinical and epidemiological data suggest that super-spreaders were present in two community clusters. INTERPRETATION: Whole-genome sequencing can delineate outbreaks of tuberculosis and allows inference about direction of transmission between cases. 
The technique could identify super-spreaders and predict the existence of undiagnosed cases, potentially leading to early treatment of infectious patients and their contacts. FUNDING: Medical Research Council, Wellcome Trust, National Institute for Health Research, and the Health Protection Agency.
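
The statement that divergence is rarely higher than five SNPs in 3 years can be checked against the quoted rate with a small worked calculation. The Python sketch below assumes that SNPs accumulate along a single lineage as a Poisson process at 0.5 SNPs per genome per year; the Poisson assumption and the function name are illustrative and are not taken from the paper.

```python
import math


def poisson_tail(mean: float, k: int) -> float:
    """Return P(X > k) for X ~ Poisson(mean)."""
    cdf = sum(math.exp(-mean) * mean ** j / math.factorial(j)
              for j in range(k + 1))
    return 1.0 - cdf


rate_per_year = 0.5   # SNPs per genome per year, from the abstract
years = 3             # interval quoted in the abstract
expected_snps = rate_per_year * years

# Probability of more than five SNPs under the assumed Poisson clock
p_more_than_five = poisson_tail(expected_snps, 5)
print(f"expected SNPs: {expected_snps}")
print(f"P(more than 5 SNPs): {p_more_than_five:.4f}")
```

Under these assumptions the expected divergence is 1.5 SNPs and the chance of exceeding five is well under 1%, in line with the five-SNP threshold used in the abstract.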

Wilson DJ. 2012. Insights from genomics into bacterial pathogen populations. PLoS Pathog, 8 (9), pp. e1002874.

Bacterial pathogens impose a heavy burden of disease on human populations worldwide. The gravest threats are posed by highly virulent respiratory pathogens, enteric pathogens, and HIV-associated infections. Tuberculosis alone is responsible for the deaths of 1.5 million people annually. Treatment options for bacterial pathogens are being steadily eroded by the evolution and spread of drug resistance. However, population-level whole genome sequencing offers new hope in the fight against pathogenic bacteria. By providing insights into bacterial evolution and disease etiology, these approaches pave the way for novel interventions and therapeutic targets. Sequencing populations of bacteria across the whole genome provides unprecedented resolution to investigate (i) within-host evolution, (ii) transmission history, and (iii) population structure. Moreover, advances in rapid benchtop sequencing herald a new era of real-time genomics in which sequencing and analysis can be deployed within hours in response to rapidly changing public health emergencies. The purpose of this review is to highlight the transformative effect of population genomics on bacteriology, and to consider the prospects for answering abiding questions such as why bacteria cause disease.

Young BC, Wilson DJ. 2012. On the evolution of virulence during Staphylococcus aureus nasal carriage. Virulence, 3 (5), pp. 454-456.

Didelot X, Bowden R, Wilson DJ, Peto TE, Crook DW. 2012. Transforming clinical microbiology with bacterial genome sequencing. Nat Rev Genet, 13 (9), pp. 601-612.

Whole-genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here, we review the current status of clinical microbiology and how it has already begun to be transformed by using next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties, such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. We predict that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.

Young BC, Golubchik T, Batty EM, Fung R, Larner-Svensson H, Votintseva AA, Miller RR, Godwin H et al. 2012. Evolutionary dynamics of Staphylococcus aureus during progression from carriage to disease. Proc Natl Acad Sci U S A, 109 (12), pp. 4550-4555.

Whole-genome sequencing offers new insights into the evolution of bacterial pathogens and the etiology of bacterial disease. Staphylococcus aureus is a major cause of bacteria-associated mortality and invasive disease and is carried asymptomatically by 27% of adults. Eighty percent of bacteremias match the carried strain. However, the role of evolutionary change in the pathogen during the progression from carriage to disease is incompletely understood. Here we use high-throughput genome sequencing to discover the genetic changes that accompany the transition from nasal carriage to fatal bloodstream infection in an individual colonized with methicillin-sensitive S. aureus. We found a single, cohesive population exhibiting a repertoire of 30 single-nucleotide polymorphisms and four insertion/deletion variants. Mutations accumulated at a steady rate over a 13-mo period, except for a cluster of mutations preceding the transition to disease. Although bloodstream bacteria differed by just eight mutations from the original nasally carried bacteria, half of those mutations caused truncation of proteins, including a premature stop codon in an AraC-family transcriptional regulator that has been implicated in pathogenicity. Comparison with evolution in two asymptomatic carriers supported the conclusion that clusters of protein-truncating mutations are highly unusual. Our results demonstrate that bacterial diversity in vivo is limited but nonetheless detectable by whole-genome sequencing, enabling the study of evolutionary dynamics within the host. Regulatory or structural changes that occur during carriage may be functionally important for pathogenesis; therefore identifying those changes is a crucial step in understanding the biological causes of invasive bacterial disease.

Eyre DW, Golubchik T, Gordon NC, Bowden R, Piazza P, Batty EM, Ip CL, Wilson DJ et al. 2012. A pilot study of rapid benchtop sequencing of Staphylococcus aureus and Clostridium difficile for outbreak detection and surveillance. BMJ Open, 2 (3), pp. e001124-e001124.

OBJECTIVES: To investigate the prospects of newly available benchtop sequencers to provide rapid whole-genome data in routine clinical practice. Next-generation sequencing has the potential to resolve uncertainties surrounding the route and timing of person-to-person transmission of healthcare-associated infection, which has been a major impediment to optimal management. DESIGN: The authors used Illumina MiSeq benchtop sequencing to undertake case studies investigating potential outbreaks of methicillin-resistant Staphylococcus aureus (MRSA) and Clostridium difficile. SETTING: Isolates were obtained from potential outbreaks associated with three UK hospitals. PARTICIPANTS: Isolates were sequenced from a cluster of eight MRSA carriers and an associated bacteraemia case in an intensive care unit, another MRSA cluster of six cases and two clusters of C difficile. Additionally, all C difficile isolates from cases over 6 weeks in a single hospital were rapidly sequenced and compared with local strain sequences obtained in the preceding 3 years. MAIN OUTCOME MEASURE: Whole-genome genetic relatedness of the isolates within each epidemiological cluster. RESULTS: Twenty-six MRSA and 15 C difficile isolates were successfully sequenced and analysed within 5 days of culture. Both MRSA clusters were identified as outbreaks, with most sequences in each cluster indistinguishable and all within three single nucleotide variants (SNVs). Epidemiologically unrelated isolates of the same spa-type were genetically distinct (≥21 SNVs). In both C difficile clusters, closely epidemiologically linked cases (in one case sharing the same strain type) were shown to be genetically distinct (≥144 SNVs). A reconstruction applying rapid sequencing in C difficile surveillance provided early outbreak detection and identified previously undetected probable community transmission. CONCLUSIONS: This benchtop sequencing technology is widely generalisable to human bacterial pathogens. 
The findings provide several good examples of how rapid and precise sequencing could transform identification of transmission of healthcare-associated infection and therefore improve hospital infection control and patient outcomes in routine clinical practice.

Wilson DJ, Hernandez RD, Andolfatto P, Przeworski M. 2011. A population genetics-phylogenetics approach to inferring natural selection in coding sequences. PLoS Genet, 7 (12), pp. e1002395.

Through an analysis of polymorphism within and divergence between species, we can hope to learn about the distribution of selective effects of mutations in the genome, changes in the fitness landscape that occur over time, and the location of sites involved in key adaptations that distinguish modern-day species. We introduce a novel method for the analysis of variation in selection pressures within and between species, spatially along the genome and temporally between lineages. We model codon evolution explicitly using a joint population genetics-phylogenetics approach that we developed for the construction of multiallelic models with mutation, selection, and drift. Our approach has the advantage of performing direct inference on coding sequences, inferring ancestral states probabilistically, utilizing allele frequency information, and generalizing to multiple species. We use a Bayesian sliding window model for intragenic variation in selection coefficients that efficiently combines information across sites and captures spatial clustering within the genome. To demonstrate the utility of the method, we infer selective pressures acting in Drosophila melanogaster and D. simulans from polymorphism and divergence data for 100 X-linked coding regions.

Muellner P, Marshall JC, Spencer SE, Noble AD, Shadbolt T, Collins-Emerson JM, Midwinter AC, Carter PE et al. 2011. Utilizing a combination of molecular and spatial tools to assess the effect of a public health intervention. Prev Vet Med, 102 (3), pp. 242-253.

Until recently New Zealand had one of the highest rates of human campylobacteriosis reported by industrialized countries. Since the introduction of a range of control measures in the poultry production chain a reduction in human cases of around 50% has been observed nationwide. To inform risk managers a combination of spatial, temporal and molecular tools - including minimum spanning trees, risk surfaces, rarefaction analysis and dynamic source attribution modelling - was used in this study to formally evaluate the reduction in disease risk that occurred after the implementation of control measures in the poultry industry. Utilizing data from a sentinel surveillance site in the Manawatu region of New Zealand, our analyses demonstrated a reduction in disease risk attributable to a reduction in the number of poultry-associated campylobacteriosis cases. Before the implementation of interventions poultry-associated cases were more prevalent in urban than rural areas, whereas for ruminant-associated cases the reverse was evident. In addition to the overall reduction in prevalence, this study also showed a stronger intervention effect in urban areas where poultry sources were more dominant. Overall a combination of molecular and spatial tools has provided evidence that the interventions aimed at reducing Campylobacter contamination of poultry were successful in reducing poultry-associated disease and this will inform the development of future control strategies.

Gabriel E, Wilson DJ, Leatherbarrow AJ, Cheesbrough J, Gee S, Bolton E, Fox A, Fearnhead P, Hart CA, Diggle PJ. 2010. Spatio-temporal epidemiology of Campylobacter jejuni enteritis, in an area of Northwest England, 2000-2002. Epidemiol Infect, 138 (10), pp. 1384-1390.

A total of 969 isolates of Campylobacter jejuni originating in the Preston, Lancashire postcode district over a 3-year period were characterized using multi-locus sequence typing. Recently developed statistical methods and a genetic model were used to investigate temporal, spatial, spatio-temporal and genetic variation in human C. jejuni infections. The analysis of the data showed statistically significant seasonal variation, spatial clustering, small-scale spatio-temporal clustering and spatio-temporal interaction in the overall pattern of incidence, and spatial segregation in cases classified according to their most likely species-of-origin.

Sousa TN, Tarazona-Santos EM, Wilson DJ, Madureira AP, Falcão PR, Fontes CJ, Gil LH, Ferreira MU, Carvalho LH, Brito CF. 2010. Genetic variability and natural selection at the ligand domain of the Duffy binding protein in Brazilian Plasmodium vivax populations. Malar J, 9 (1), pp. 334.

BACKGROUND: Plasmodium vivax malaria is a major public health challenge in Latin America, Asia and Oceania, with 130-435 million clinical cases per year worldwide. Invasion of host blood cells by P. vivax mainly depends on a type I membrane protein called Duffy binding protein (PvDBP). The erythrocyte-binding motif of PvDBP is a 170 amino-acid stretch located in its cysteine-rich region II (PvDBPII), which is the most variable segment of the protein. METHODS: To test whether diversifying natural selection has shaped the nucleotide diversity of PvDBPII in Brazilian populations, this region was sequenced in 122 isolates from six different geographic areas. A Bayesian method was applied to test for the action of natural selection under a population genetic model that incorporates recombination. The analysis was integrated with a structural model of PvDBPII, and T- and B-cell epitopes were localized on the 3-D structure. RESULTS: The results suggest that: (i) recombination plays an important role in determining the haplotype structure of PvDBPII, and (ii) PvDBPII appears to contain neutrally evolving codons as well as codons evolving under natural selection. Diversifying selection preferentially acts on sites identified as epitopes, particularly on amino acid residues 417, 419, and 424, which show strong linkage disequilibrium. CONCLUSIONS: This study shows that some polymorphisms of PvDBPII are present near the erythrocyte-binding domain and might serve to elude antibodies that inhibit cell invasion. Therefore, these polymorphisms should be taken into account when designing vaccines aimed at eliciting antibodies to inhibit erythrocyte invasion.

Sheppard SK, Dallas JF, Wilson DJ, Strachan NJ, McCarthy ND, Jolley KA, Colles FM, Rotariu O, Ogden ID, Forbes KJ, Maiden MC. 2010. Evolution of an agriculture-associated disease causing Campylobacter coli clade: evidence from national surveillance data in Scotland. PLoS One, 5 (12), pp. e15708.

The common zoonotic pathogen Campylobacter coli is an important cause of bacterial gastroenteritis worldwide but its evolution is incompletely understood. Using multilocus sequence type (MLST) data of 7 housekeeping genes from a national survey of Campylobacter in Scotland (2005/6), and a combined population genetic-phylogenetics approach, we investigated the evolutionary history of C. coli. Genealogical reconstruction of isolates from clinical infection, farm animals and the environment, revealed a three-clade genetic structure. The majority of farm animal, and all disease causing genotypes belonged to a single clade (clade 1) which had comparatively low synonymous sequence diversity, little deep branching genetic structure, and a higher number of shared alleles providing evidence of recent clonal descent. Calibration of the rate of molecular evolution, based on within-species genetic variation, estimated a more rapid rate of evolution than in traditional estimates. This placed the divergence of the clades at less than 2500 years ago, consistent with the introduction of an agricultural niche having had an effect upon the evolution of the C. coli clades. Attribution of clinical isolate genotypes to source, using an asymmetric island model, confirmed that strains from chicken and ruminants, and not pigs or turkeys, are the principal source of human C. coli infection. Taken together these analyses are consistent with an evolutionary scenario describing the emergence of agriculture-associated C. coli lineage that is an important human pathogen.

Mullner P, Shadbolt T, Collins-Emerson JM, Midwinter AC, Spencer SE, Marshall J, Carter PE, Campbell DM et al. 2010. Molecular and spatial epidemiology of human campylobacteriosis: source association and genotype-related risk factors. Epidemiol Infect, 138 (10), pp. 1372-1383.

The epidemiology of human campylobacteriosis is complex but in recent years understanding of this disease has advanced considerably. Despite being a major public health concern in many countries, the presence of multiple hosts, genotypes and transmission pathways has made it difficult to identify and quantify the determinants of human infection and disease. This has delayed the development of successful intervention programmes for this disease in many countries including New Zealand, a country with a comparatively high, yet until recently poorly understood, rate of notified disease. This study investigated the epidemiology of Campylobacter jejuni at the genotype-level over a 3-year period between 2005 and 2008 using multilocus sequence typing. By combining epidemiological surveillance and population genetics, a dominant, internationally rare strain of C. jejuni (ST474) was identified, and most human cases (65.7%) were found to be caused by only seven different genotypes. Source association of genotypes was used to identify risk factors at the genotype-level through multivariable logistic regression and a spatial model. Poultry-associated cases were more likely to be found in urban areas compared to rural areas. In particular young children in rural areas had a higher risk of infection with ruminant strains than their urban counterparts. These findings provide important information for the implementation of pathway-specific control strategies.

Gabriel E, Wilson DJ, Leatherbarrow AJH, Cheesbrough J, Gee S, Bolton E, Fox A, Fearnhead P, Hart CA, Diggle PJ. 2010. Spatio-temporal epidemiology of Campylobacter jejuni enteritis, in an area of Northwest England, 2000-2002. Epidemiology and Infection, pp. 1-7.

Mullner P, Spencer SE, Wilson DJ, Jones G, Noble AD, Midwinter AC, Collins-Emerson JM, Carter P, Hathaway S, French NP. 2009. Assigning the source of human campylobacteriosis in New Zealand: a comparative genetic and epidemiological approach. Infect Genet Evol, 9 (6), pp. 1311-1319.

Integrated surveillance of infectious multi-source diseases using a combination of epidemiology, ecology, genetics and evolution can provide a valuable risk-based approach for the control of important human pathogens. This includes a better understanding of transmission routes and the impact of human activities on the emergence of zoonoses. Until recently New Zealand had extraordinarily high and increasing rates of notified human campylobacteriosis, and our limited understanding of the source of these infections was hindering efforts to control this disease. Genetic and epidemiological modeling of a 3-year dataset comprising multilocus sequence typed isolates from human clinical cases, coupled with concurrent data on food and environmental sources, enabled us to estimate the relative importance of different sources of human disease. Our studies provided evidence that poultry was the leading cause of human campylobacteriosis in New Zealand, causing an estimated 58-76% of cases with widely varying contributions by individual poultry suppliers. These findings influenced national policy and, after the implementation of poultry industry-specific interventions, a dramatic decline in human notified cases was observed in 2008. The comparative-modeling and molecular sentinel surveillance approach proposed in this study provides new opportunities for the management of zoonotic diseases.

Brehony C, Wilson DJ, Maiden MC. 2009. Variation of the factor H-binding protein of Neisseria meningitidis. Microbiology, 155 (Pt 12), pp. 4155-4169.

There is currently no comprehensive meningococcal vaccine, due to difficulties in immunizing against organisms expressing serogroup B capsules. To address this problem, subcapsular antigens, particularly the outer-membrane proteins (OMPs), are being investigated as candidate vaccine components. If immunogenic, however, such antigens are often antigenically variable, and knowledge of the extent and structuring of this diversity is an essential part of vaccine formulation. Factor H-binding protein (fHbp) is one such protein and is included in two vaccines under development. A survey of the diversity of the fHbp gene and the encoded protein in a representative sample of meningococcal isolates confirmed that variability in this protein is structured into two or three major groups, each with a substantial number of alleles that have some association with meningococcal clonal complexes and serogroups. A unified nomenclature scheme was devised to catalogue this diversity. Analysis of recombination and selection on the allele sequences demonstrated that parts of the gene are subject to positive selection, consistent with immune selection on the protein generating antigenic variation, particularly in the C-terminal region of the peptide sequence. The highest levels of selection were observed in regions corresponding to epitopes recognized by previously described bactericidal monoclonal antibodies.

Fledel-Alon A, Wilson DJ, Broman K, Wen X, Ober C, Coop G, Przeworski M. 2009. Broad-scale recombination patterns underlying proper disjunction in humans. PLoS Genet, 5 (9), pp. e1000658.

Although recombination is essential to the successful completion of human meiosis, it remains unclear how tightly the process is regulated and over what scale. To assess the nature and stringency of constraints on human recombination, we examined crossover patterns in transmissions to viable, non-trisomic offspring, using dense genotyping data collected in a large set of pedigrees. Our analysis supports a requirement for one chiasma per chromosome rather than per arm to ensure proper disjunction, with additional chiasmata occurring in proportion to physical length. The requirement is not absolute, however, as chromosome 21 seems to be frequently transmitted properly in the absence of a chiasma in females, a finding that raises the possibility of a back-up mechanism aiding in its correct segregation. We also found a set of double crossovers in surprisingly close proximity, as expected from a second pathway that is not subject to crossover interference. These findings point to multiple mechanisms that shape the distribution of crossovers, influencing proper disjunction in humans.

Sheppard SK, Dallas JF, Strachan NJ, MacRae M, McCarthy ND, Wilson DJ, Gormley FJ, Falush D, Ogden ID, Maiden MC, Forbes KJ. 2009. Campylobacter genotyping to determine the source of human infection. Clin Infect Dis, 48 (8), pp. 1072-1078.

BACKGROUND: Campylobacter species cause a high proportion of bacterial gastroenteritis cases and are a significant burden on health care systems and economies worldwide; however, the relative contributions of the various possible sources of infection in humans are unclear. METHODS: National-scale genotyping of Campylobacter species was used to quantify the relative importance of various possible sources of human infection. Multilocus sequence types were determined for 5674 isolates obtained from cases of human campylobacteriosis in Scotland from July 2005 through September 2006 and from 999 Campylobacter species isolates from 3417 contemporaneous samples from potential human infection sources. These data were supplemented with 2420 sequence types from other studies, representing isolates from a variety of sources. The clinical isolates were attributed to possible sources on the basis of their sequence types with use of 2 population genetic models, STRUCTURE and an asymmetric island model. RESULTS: The STRUCTURE and the asymmetric island models attributed most clinical isolates to chicken meat (58% and 78% of Campylobacter jejuni and 40% and 56% of Campylobacter coli isolates, respectively), identifying it as the principal source of Campylobacter infection in humans. Both models attributed the majority of the remaining isolates to ruminant sources, with relatively few isolates attributed to wild bird, environment, swine, and turkey sources. CONCLUSIONS: National-scale genotyping was a practical and efficient methodology for the quantification of the contributions of different sources to human Campylobacter infection. Combined with the knowledge that retail chicken is routinely contaminated with Campylobacter, these results are consistent with the view that the largest reductions in human campylobacteriosis in industrialized countries will come from interventions that focus on the poultry industry.

Wilson DJ, Gabriel E, Leatherbarrow AJ, Cheesbrough J, Gee S, Bolton E, Fox A, Hart CA, Diggle PJ, Fearnhead P. 2009. Rapid evolution and the importance of recombination to the gastroenteric pathogen Campylobacter jejuni. Mol Biol Evol, 26 (2), pp. 385-397.

Responsible for the majority of bacterial gastroenteritis in the developed world, Campylobacter jejuni is a pervasive pathogen of humans and animals, but its evolution is obscure. In this paper, we exploit contemporary genetic diversity and empirical evidence to piece together the evolutionary history of C. jejuni and quantify its evolutionary potential. Our combined population genetics-phylogenetics approach reveals a surprising picture. Campylobacter jejuni is a rapidly evolving species, subject to intense purifying selection that purges 60% of novel variation, but possessing a massive evolutionary potential. The low mutation rate is offset by a large effective population size so that a mutation at any site can occur somewhere in the population within the space of a week. Recombination has a fundamental role, generating diversity at twice the rate of de novo mutation, and facilitating gene flow between C. jejuni and its sister species Campylobacter coli. We attempt to calibrate the rate of molecular evolution in C. jejuni based solely on within-species variation. The rates we obtain are up to 1,000 times faster than conventional estimates, placing the C. jejuni-C. coli split at the time of the Neolithic revolution. We weigh the plausibility of such recent bacterial evolution against alternative explanations and discuss the evidence required to settle the issue.

Wilson DJ, Gabriel E, Leatherbarrow AJ, Cheesbrough J, Gee S, Bolton E, Fox A, Fearnhead P, Hart CA, Diggle PJ. 2008. Tracing the source of campylobacteriosis. PLoS Genet, 4 (9), pp. e1000203.

Campylobacter jejuni is the leading cause of bacterial gastro-enteritis in the developed world. It is thought to infect 2-3 million people a year in the US alone, at a cost to the economy in excess of US $4 billion. C. jejuni is a widespread zoonotic pathogen that is carried by animals farmed for meat and poultry. A connection with contaminated food is recognized, but C. jejuni is also commonly found in wild animals and water sources. Phylogenetic studies have suggested that genotypes pathogenic to humans bear greatest resemblance to non-livestock isolates. Moreover, seasonal variation in campylobacteriosis bears the hallmarks of water-borne disease, and certain outbreaks have been attributed to contamination of drinking water. As a result, the relative importance of these reservoirs to human disease is controversial. We use multilocus sequence typing to genotype 1,231 cases of C. jejuni isolated from patients in Lancashire, England. By modeling the DNA sequence evolution and zoonotic transmission of C. jejuni between host species and the environment, we assign human cases probabilistically to source populations. Our novel population genetics approach reveals that the vast majority (97%) of sporadic disease can be attributed to animals farmed for meat and poultry. Chicken and cattle are the principal sources of C. jejuni pathogenic to humans, whereas wild animal and environmental sources are responsible for just 3% of disease. Our results imply that the primary transmission route is through the food chain, and suggest that incidence could be dramatically reduced by enhanced on-farm biosecurity or preventing food-borne transmission.

Falush D, Torpdahl M, Didelot X, Conrad DF, Wilson DJ, Achtman M. 2006. Mismatch induced speciation in Salmonella: model and data. Philos Trans R Soc Lond B Biol Sci, 361 (1475), pp. 2045-2053.

In bacteria, DNA sequence mismatches act as a barrier to recombination between distantly related organisms and can potentially promote the cohesion of species. We have performed computer simulations which show that the homology dependence of recombination can cause de novo speciation in a neutrally evolving population once a critical population size has been exceeded. Our model can explain the patterns of divergence and genetic exchange observed in the genus Salmonella, without invoking either natural selection or geographical population subdivision. If this model was validated, based on extensive sequence data, it would imply that the named subspecies of Salmonella enterica correspond to good biological species, making species boundaries objective. However, multilocus sequence typing data, analysed using several conventional tools, provide a misleading impression of relationships within S. enterica subspecies enterica and do not provide the resolution to establish whether new species are presently being formed.

Edwards CT, Holmes EC, Pybus OG, Wilson DJ, Viscidi RP, Abrams EJ, Phillips RE, Drummond AJ. 2006. Evolution of the human immunodeficiency virus envelope gene is dominated by purifying selection. Genetics, 174 (3), pp. 1441-1453.

The evolution of the human immunodeficiency virus (HIV-1) during chronic infection involves the rapid, continuous turnover of genetic diversity. However, the role of natural selection, relative to random genetic drift, in governing this process is unclear. We tested a stochastic model of genetic drift using partial envelope sequences sampled longitudinally in 28 infected children. In each case the Bayesian posterior (empirical) distribution of coalescent genealogies was estimated using Markov chain Monte Carlo methods. Posterior predictive simulation was then used to generate a null distribution of genealogies assuming neutrality, with the null and empirical distributions compared using four genealogy-based summary statistics sensitive to nonneutral evolution. Because both null and empirical distributions were generated within a coalescent framework, we were able to explicitly account for the confounding influence of demography. From the distribution of corrected P-values across patients, we conclude that empirical genealogies are more asymmetric than expected if evolution is driven by mutation and genetic drift only, with an excess of low-frequency polymorphisms in the population. This indicates that although drift may still play an important role, natural selection has a strong influence on the evolution of HIV-1 envelope. A negative relationship between effective population size and substitution rate indicates that as the efficacy of selection increases, a smaller proportion of mutations approach fixation in the population. This suggests the presence of deleterious mutations. We therefore conclude that intrahost HIV-1 evolution in envelope is dominated by purifying selection against low-frequency deleterious mutations that do not reach fixation.

Wilson DJ, McVean G. 2006. Estimating diversifying selection and functional constraint in the presence of recombination. Genetics, 172 (3), pp. 1411-1425.

Models of molecular evolution that incorporate the ratio of nonsynonymous to synonymous polymorphism (dN/dS ratio) as a parameter can be used to identify sites that are under diversifying selection or functional constraint in a sample of gene sequences. However, when there has been recombination in the evolutionary history of the sequences, reconstructing a single phylogenetic tree is not appropriate, and inference based on a single tree can give misleading results. In the presence of high levels of recombination, the identification of sites experiencing diversifying selection can suffer from a false-positive rate as high as 90%. We present a model that uses a population genetics approximation to the coalescent with recombination and use reversible-jump MCMC to perform Bayesian inference on both the dN/dS ratio and the recombination rate, allowing each to vary along the sequence. We demonstrate that the method has the power to detect variation in the dN/dS ratio and the recombination rate and does not suffer from a high false-positive rate. We use the method to analyze the porB gene of Neisseria meningitidis and verify the inferences using prior sensitivity analysis and model criticism techniques.

Edwards CT, Holmes EC, Wilson DJ, Viscidi RP, Abrams EJ, Phillips RE, Drummond AJ. 2006. Population genetic estimation of the loss of genetic diversity during horizontal transmission of HIV-1. BMC Evol Biol, 6 pp. 28.

BACKGROUND: Genetic diversity of the human immunodeficiency virus type 1 (HIV-1) population within an individual is lost during transmission to a new host. The demography of transmission is an important determinant of evolutionary dynamics, particularly the relative impact of natural selection and genetic drift immediately following HIV-1 infection. Despite this, the magnitude of this population bottleneck is unclear. RESULTS: We use coalescent methods to quantify the bottleneck in a single case of homosexual transmission and find that over 99% of the env and gag diversity present in the donor is lost. This was consistent with the diversity present at seroconversion in nine other horizontally infected individuals. Furthermore, we estimated viral diversity at birth in 27 infants infected through vertical transmission and found there to be no difference between the two modes of transmission. CONCLUSION: Assuming the bottleneck at transmission is selectively neutral, such a severe reduction in genetic diversity has important implications for adaptation in HIV-1, since beneficial mutations have a reduced chance of transmission.

Jolley KA, Wilson DJ, Kriz P, McVean G, Maiden MC. 2005. The influence of mutation, recombination, population history, and selection on patterns of genetic diversity in Neisseria meningitidis. Mol Biol Evol, 22 (3), pp. 562-569.

Patterns of genetic diversity within populations of human pathogens, shaped by the ecology of host-microbe interactions, contain important information about the epidemiological history of infectious disease. Exploiting this information, however, requires a systematic approach that distinguishes the genetic signal generated by epidemiological processes from the effects of other forces, such as recombination, mutation, and population history. Here, a variety of quantitative techniques were employed to investigate multilocus sequence information from isolate collections of Neisseria meningitidis, a major cause of meningitis and septicemia worldwide. This allowed quantitative evaluation of alternative explanations for the observed population structure. A coalescent-based approach was employed to estimate the rate of mutation, the rate of recombination, and the size distribution of recombination fragments from samples of disease-associated and carried meningococci obtained in the Czech Republic in 1993 and a global collection of disease-associated isolates collected from 1937 to 1996. The parameter estimates were used to reject a model in which genetic structure arose by chance in small populations, and analysis of molecular variation showed that geographically restricted gene flow was unlikely to be the cause of the genetic structure. The genetic differentiation between disease and carriage isolate collections indicated that, whereas certain genotypes were overrepresented among the disease-isolate collections (the "hyperinvasive" lineages), disease-associated and carried meningococci exhibited remarkably little differentiation at the level of individual nucleotide polymorphisms. In combination, these results indicated the repeated action of natural selection on meningococcal populations, possibly arising from the coevolutionary dynamic of host-pathogen interactions.

Wilson DJ, Falush D, McVean G. 2005. Germs, genomes and genealogies. Trends Ecol Evol, 20 (1), pp. 39-45.

Genetic diversity in pathogen species contains information about evolutionary and epidemiological processes, including the origins and history of disease, the nature of the selective forces acting on pathogen genes and the role of recombination in generating genetic novelty. Here, we review recent developments in these fields and compare the use of population genetic, or population-model based, approaches to phylogenetic, or population-model free, methodologies. We show how simple epidemiological models can be related to the ancestral, or coalescent, process underlying samples from pathogen species, enabling detailed inference about pathogen biology from patterns of molecular variation.

Yazdankhah SP, Kriz P, Tzanakaki G, Kremastinou J, Kalmusova J, Musilek M, Alvestad T, Jolley KA et al. 2004. Distribution of serogroups and genotypes among disease-associated and carried isolates of Neisseria meningitidis from the Czech Republic, Greece, and Norway. J Clin Microbiol, 42 (11), pp. 5146-5153.

The distribution of serogroups and multilocus sequence types (STs) in collections of disease-associated and carried meningococci from the period 1991 to 2000 in three European countries (the Czech Republic, Greece, and Norway) was investigated. A total of 314 patient isolates and 353 isolates from asymptomatic carriers were characterized. The frequency distributions of serogroups and clone complexes differed among countries and between disease and carrier isolate collections. Highly significant differentiation was seen at each housekeeping locus. A marked positive association of serogroup C with disease was evidenced. The ST-11 complex was strongly positively associated with disease; associations for other clone complexes were weaker. The genetic diversity of the clone complexes differed. A single ST dominated the ST-11 clone complex, while the ST-41/44 complex exhibited greater levels of diversity. These data robustly demonstrated differences in the distribution of meningococcal genotypes in disease and carrier isolates and among countries. Further, they indicated that differences in genotype diversity and pathogenicity exist between meningococcal clone complexes.

Das S, Lindemann C, Young BC, Muller J, Österreich B, Ternette N, Winkler AC, Paprotka K et al. 2016. Natural mutations in a Staphylococcus aureus virulence regulator attenuate cytotoxicity but permit bacteremia and abscess formation. Proc Natl Acad Sci U S A, 113 (22), pp. E3101-E3110.

Staphylococcus aureus is a major bacterial pathogen, which causes severe blood and tissue infections that frequently emerge by autoinfection with asymptomatically carried nose and skin populations. However, recent studies report that bloodstream isolates differ systematically from those found in the nose and skin, exhibiting reduced toxicity toward leukocytes. In two patients, an attenuated toxicity bloodstream infection evolved from an asymptomatically carried high-toxicity nasal strain by loss-of-function mutations in the gene encoding the transcription factor repressor of surface proteins (rsp). Here, we report that rsp knockout mutants lead to global transcriptional and proteomic reprofiling, and they exhibit the greatest signal in a genome-wide screen for genes influencing S. aureus survival in human cells. This effect is likely to be mediated in part via SSR42, a long-noncoding RNA. We show that rsp controls SSR42 expression, is induced by hydrogen peroxide, and is required for normal cytotoxicity and hemolytic activity. Rsp inactivation in laboratory- and bacteremia-derived mutants attenuates toxin production, but up-regulates other immune subversion proteins and reduces lethality during experimental infection. Crucially, inactivation of rsp preserves bacterial dissemination, because it affects neither formation of deep abscesses in mice nor survival in human blood. Thus, we have identified a spontaneously evolving, attenuated-cytotoxicity, nonhemolytic S. aureus phenotype, controlled by a pleiotropic transcriptional regulator/noncoding RNA virulence regulatory system, capable of causing S. aureus bloodstream infections. Such a phenotype could promote deep infection with limited early clinical manifestations, raising concerns that bacterial evolution within the human body may contribute to severe infection.

Sheppard AE, Stoesser N, Wilson DJ, Sebra R, Kasarskis A, Anson LW, Giess A, Pankhurst LJ et al. 2016. Nested Russian Doll-Like Genetic Mobility Drives Rapid Dissemination of the Carbapenem Resistance Gene blaKPC. Antimicrob Agents Chemother, 60 (6), pp. 3767-3778.

The recent widespread emergence of carbapenem resistance in Enterobacteriaceae is a major public health concern, as carbapenems are a therapy of last resort against this family of common bacterial pathogens. Resistance genes can mobilize via various mechanisms, including conjugation and transposition; however, the importance of this mobility in short-term evolution, such as within nosocomial outbreaks, is unknown. Using a combination of short- and long-read whole-genome sequencing of 281 blaKPC-positive Enterobacteriaceae isolates from a single hospital over 5 years, we demonstrate rapid dissemination of this carbapenem resistance gene to multiple species, strains, and plasmids. Mobility of blaKPC occurs at multiple nested genetic levels, with transmission of blaKPC strains between individuals, frequent transfer of blaKPC plasmids between strains/species, and frequent transposition of blaKPC transposon Tn4401 between plasmids. We also identify a common insertion site for Tn4401 within various Tn2-like elements, suggesting that homologous recombination between Tn2-like elements has enhanced the spread of Tn4401 between different plasmid vectors. Furthermore, while short-read sequencing has known limitations for plasmid assembly, various studies have attempted to overcome this by the use of reference-based methods. We also demonstrate that, as a consequence of the genetic mobility observed in this study, plasmid structures can be extremely dynamic, and therefore these reference-based methods, as well as traditional partial typing methods, can produce very misleading conclusions. Overall, our findings demonstrate that nonclonal resistance gene dissemination can be extremely rapid, presenting significant challenges for public health surveillance and achieving effective control of antibiotic resistance.

Didelot X, Wilson DJ. 2015. ClonalFrameML: efficient inference of recombination in whole bacterial genomes. PLoS Comput Biol, 11 (2), pp. e1004041.

Recombination is an important evolutionary force in bacteria, but it remains challenging to reconstruct the imports that occurred in the ancestry of a genomic sample. Here we present ClonalFrameML, which uses maximum likelihood inference to simultaneously detect recombination in bacterial genomes and account for it in phylogenetic reconstruction. ClonalFrameML can analyse hundreds of genomes in a matter of hours, and we demonstrate its usefulness on simulated and real datasets. We find evidence for recombination hotspots associated with mobile elements in Clostridium difficile ST6 and a previously undescribed 310kb chromosomal replacement in Staphylococcus aureus ST582. ClonalFrameML is freely available at http://clonalframeml.googlecode.com/.

Hedge J, Wilson DJ. 2014. Bacterial phylogenetic reconstruction from whole genomes is robust to recombination but demographic inference is not. MBio, 5 (6), pp. e02158.

Phylogenetic inference in bacterial genomics is fundamental to understanding problems such as population history, antimicrobial resistance, and transmission dynamics. The field has been plagued by an apparent state of contradiction since the distorting effects of recombination on phylogeny were discovered more than a decade ago. Researchers persist with detailed phylogenetic analyses while simultaneously acknowledging that recombination seriously misleads inference of population dynamics and selection. Here we resolve this paradox by showing that phylogenetic tree topologies based on whole genomes robustly reconstruct the clonal frame topology but that branch lengths are badly skewed. Surprisingly, removing recombining sites can exacerbate branch length distortion caused by recombination. IMPORTANCE: Phylogenetic tree reconstruction is a popular approach for understanding the relatedness of bacteria in a population from differences in their genome sequences. However, bacteria frequently exchange regions of their genomes by a process called homologous recombination, which violates a fundamental assumption of phylogenetic methods. Since many researchers continue to use phylogenetics for recombining bacteria, it is important to understand how recombination affects the conclusions drawn from these analyses. We find that whole-genome sequences afford great accuracy in reconstructing evolutionary relationships despite concerns surrounding the presence of recombination, but the branch lengths of the phylogenetic tree are indeed badly distorted. Surprisingly, methods to reduce the impact of recombination on branch lengths can exacerbate the problem.

Everitt RG, Didelot X, Batty EM, Miller RR, Knox K, Young BC, Bowden R, Auton A et al. 2014. Mobile elements drive recombination hotspots in the core genome of Staphylococcus aureus. Nat Commun, 5 pp. 3956.

Horizontal gene transfer is an important driver of bacterial evolution, but genetic exchange in the core genome of clonal species, including the major pathogen Staphylococcus aureus, is incompletely understood. Here we reveal widespread homologous recombination in S. aureus at the species level, in contrast to its near-complete absence between closely related strains. We discover a patchwork of hotspots and coldspots at fine scales falling against a backdrop of broad-scale trends in rate variation. Over megabases, homoplasy rates fluctuate 1.9-fold, peaking towards the origin-of-replication. Over kilobases, we find core recombination hotspots of up to 2.5-fold enrichment situated near fault lines in the genome associated with mobile elements. The strongest hotspots include regions flanking conjugative transposon ICE6013, the staphylococcal cassette chromosome (SCC) and genomic island νSaα. Mobile element-driven core genome transfer represents an opportunity for adaptation and challenges our understanding of the recombination landscape in predominantly clonal pathogens, with important implications for genotype-phenotype mapping.

Eyre DW, Cule ML, Wilson DJ, Griffiths D, Vaughan A, O'Connor L, Ip CL, Golubchik T et al. 2013. Diverse sources of C. difficile infection identified on whole-genome sequencing. N Engl J Med, 369 (13), pp. 1195-1205.

BACKGROUND: It has been thought that Clostridium difficile infection is transmitted predominantly within health care settings. However, endemic spread has hampered identification of precise sources of infection and the assessment of the efficacy of interventions. METHODS: From September 2007 through March 2011, we performed whole-genome sequencing on isolates obtained from all symptomatic patients with C. difficile infection identified in health care settings or in the community in Oxfordshire, United Kingdom. We compared single-nucleotide variants (SNVs) between the isolates, using C. difficile evolution rates estimated on the basis of the first and last samples obtained from each of 145 patients, with 0 to 2 SNVs expected between transmitted isolates obtained less than 124 days apart, on the basis of a 95% prediction interval. We then identified plausible epidemiologic links among genetically related cases from data on hospital admissions and community location. RESULTS: Of 1250 C. difficile cases that were evaluated, 1223 (98%) were successfully sequenced. In a comparison of 957 samples obtained from April 2008 through March 2011 with those obtained from September 2007 onward, a total of 333 isolates (35%) had no more than 2 SNVs from at least 1 earlier case, and 428 isolates (45%) had more than 10 SNVs from all previous cases. Reductions in incidence over time were similar in the two groups, a finding that suggests an effect of interventions targeting the transition from exposure to disease. Of the 333 patients with no more than 2 SNVs (consistent with transmission), 126 patients (38%) had close hospital contact with another patient, and 120 patients (36%) had no hospital or community contact with another patient. Distinct subtypes of infection continued to be identified throughout the study, which suggests a considerable reservoir of C. difficile. CONCLUSIONS: Over a 3-year period, 45% of C. difficile cases in Oxfordshire were genetically distinct from all previous cases. Genetically diverse sources, in addition to symptomatic patients, play a major part in C. difficile transmission. (Funded by the U.K. Clinical Research Collaboration Translational Infection Research Initiative and others.)
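
The SNV cut-offs quoted in this abstract (no more than 2 SNVs as consistent with transmission, more than 10 SNVs as genetically distinct) can be illustrated with a minimal Python sketch. This is not the study's pipeline: the function names, the "intermediate" label, and the handling of ambiguous "N" calls are assumptions made here for illustration, and the abstract's 0 to 2 SNV expectation applies only to isolates obtained less than 124 days apart.

```python
def snv_distance(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two aligned sequences.

    Positions where either sequence has an ambiguous 'N' call are skipped
    (an assumption of this sketch, not a rule taken from the abstract).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a != "N" and b != "N"
    )


def classify_pair(distance: int, linked_max: int = 2, distinct_min: int = 10) -> str:
    """Label a pair of isolates using the cut-offs quoted in the abstract.

    The 'intermediate' label for 3 to 10 SNVs is hypothetical; the abstract
    only reports the two outer groups.
    """
    if distance <= linked_max:
        return "consistent with transmission"
    if distance > distinct_min:
        return "genetically distinct"
    return "intermediate"


if __name__ == "__main__":
    # Toy aligned sequences, invented for this example.
    earlier_case = "ACGTACGTACGTACGT"
    later_case = "ACGTACGAACGTACGT"
    d = snv_distance(earlier_case, later_case)
    print(d, classify_pair(d))
```

In practice the sampling dates of the two isolates would also need checking before applying the lower cut-off, since the expected number of SNVs grows with the time between samples.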

Eyre DW, Cule ML, Griffiths D, Crook DW, Peto TE, Walker AS, Wilson DJ. 2013. Detection of mixed infection from bacterial whole genome sequence data allows assessment of its role in Clostridium difficile transmission. PLoS Comput Biol, 9 (5), pp. e1003059.

Bacterial whole genome sequencing offers the prospect of rapid and high precision investigation of infectious disease outbreaks. Close genetic relationships between microorganisms isolated from different infected cases suggest transmission is a strong possibility, whereas transmission between cases with genetically distinct bacterial isolates can be excluded. However, undetected mixed infections (infection with ≥2 unrelated strains of the same species where only one is sequenced) potentially impair exclusion of transmission with certainty, and may therefore limit the utility of this technique. We investigated the problem by developing a computationally efficient method for detecting mixed infection without the need for resource-intensive independent sequencing of multiple bacterial colonies. Given the relatively low density of single nucleotide polymorphisms within bacterial sequence data, direct reconstruction of mixed infection haplotypes from current short-read sequence data is not consistently possible. We therefore use a two-step maximum likelihood-based approach, assuming each sample contains up to two infecting strains. We jointly estimate the proportion of the infection arising from the dominant and minor strains, and the sequence divergence between these strains. In cases where mixed infection is confirmed, the dominant and minor haplotypes are then matched to a database of previously sequenced local isolates. We demonstrate the performance of our algorithm with in silico and in vitro mixed infection experiments, and apply it to transmission of an important healthcare-associated pathogen, Clostridium difficile. Using hospital ward movement data in a previously described stochastic transmission model, 15 pairs of cases enriched for likely transmission events associated with mixed infection were selected. Our method identified four previously undetected mixed infections, and a previously undetected transmission event, but no direct transmission between the pairs of cases under investigation. These results demonstrate that mixed infections can be detected without additional sequencing effort, and this will be important in assessing the extent of cryptic transmission in our hospitals.
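
To give a flavour of maximum likelihood estimation of a strain mixture proportion, the following Python sketch fits a simple binomial model to read counts by grid search. It is a deliberately simplified illustration and not the two-step method described in the abstract: it estimates only the minor-strain proportion, it assumes the sites at which the two strains differ are already known, and the function names, the fixed sequencing error rate, and the toy read counts are all assumptions introduced here.

```python
import math


def log_likelihood(p, counts, error=0.01):
    """Binomial log-likelihood of a minor-strain proportion p.

    counts is a list of (minor_allele_reads, total_reads) pairs, one per
    site at which the two strains are assumed to differ. A read shows the
    minor allele with probability q = p*(1 - error) + (1 - p)*error.
    The binomial coefficient is omitted because it does not depend on p.
    """
    q = p * (1.0 - error) + (1.0 - p) * error
    return sum(
        k * math.log(q) + (n - k) * math.log(1.0 - q)
        for k, n in counts
    )


def estimate_minor_proportion(counts, error=0.01, steps=500):
    """Grid-search the proportion in (0, 0.5] that maximises the likelihood."""
    grid = [0.5 * i / steps for i in range(1, steps + 1)]
    return max(grid, key=lambda p: log_likelihood(p, counts, error))


if __name__ == "__main__":
    # Toy read counts at three variable sites, invented for this example.
    toy_counts = [(20, 100), (18, 100), (22, 100)]
    print(estimate_minor_proportion(toy_counts))
```

A grid search is used rather than a numerical optimiser so that the sketch has no dependencies; with a single parameter bounded between 0 and 0.5 the cost is negligible.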

Golubchik T, Batty EM, Miller RR, Farr H, Young BC, Larner-Svensson H, Fung R, Godwin H et al. 2013. Within-host evolution of Staphylococcus aureus during asymptomatic carriage. PLoS One, 8 (5), pp. e61319.

BACKGROUND: Staphylococcus aureus is a major cause of healthcare associated mortality, but like many important bacterial pathogens, it is a common constituent of the normal human body flora. Around a third of healthy adults are carriers. Recent evidence suggests that evolution of S. aureus during nasal carriage may be associated with progression to invasive disease. However, a more detailed understanding of within-host evolution under natural conditions is required to appreciate the evolutionary and mechanistic reasons why commensal bacteria such as S. aureus cause disease. Therefore we examined in detail the evolutionary dynamics of normal, asymptomatic carriage. Sequencing a total of 131 genomes across 13 singly colonized hosts using the Illumina platform, we investigated diversity, selection, population dynamics and transmission during the short-term evolution of S. aureus. PRINCIPAL FINDINGS: We characterized the processes by which the raw material for evolution is generated: micro-mutation (point mutation and small insertions/deletions), macro-mutation (large insertions/deletions) and the loss or acquisition of mobile elements (plasmids and bacteriophages). Through an analysis of synonymous, non-synonymous and intergenic mutations we discovered a fitness landscape dominated by purifying selection, with rare examples of adaptive change in genes encoding surface-anchored proteins and an enterotoxin. We found evidence for dramatic, hundred-fold fluctuations in the size of the within-host population over time, which we related to the cycle of colonization and clearance. Using a newly-developed population genetics approach to detect recent transmission among hosts, we revealed evidence for recent transmission between some of our subjects, including a husband and wife both carrying populations of methicillin-resistant S. aureus (MRSA). 
SIGNIFICANCE: This investigation begins to paint a picture of the within-host evolution of an important bacterial pathogen during its prevailing natural state, asymptomatic carriage. These results also have wider significance as a benchmark for future systematic studies of evolution during invasive S. aureus disease.

Dearlove B, Wilson DJ. 2013. Coalescent inference for infectious disease: meta-analysis of hepatitis C. Philos Trans R Soc Lond B Biol Sci, 368 (1614), pp. 20120314.

Genetic analysis of pathogen genomes is a powerful approach to investigating the population dynamics and epidemic history of infectious diseases. However, the theoretical underpinnings of the most widely used, coalescent methods have been questioned, casting doubt on their interpretation. The aim of this study is to develop robust population genetic inference for compartmental models in epidemiology. Using a general approach based on the theory of metapopulations, we derive coalescent models under susceptible-infectious (SI), susceptible-infectious-susceptible (SIS) and susceptible-infectious-recovered (SIR) dynamics. We show that exponential and logistic growth models are equivalent to SI and SIS models, respectively, when co-infection is negligible. Implementing SI, SIS and SIR models in BEAST, we conduct a meta-analysis of hepatitis C epidemics, and show that we can directly estimate the basic reproductive number (R(0)) and prevalence under SIR dynamics. We find that differences in genetic diversity between epidemics can be explained by differences in underlying epidemiology (age of the epidemic and local population density) and viral subtype. Model comparison reveals SIR dynamics in three globally restricted epidemics, but most are better fit by the simpler SI dynamics. In summary, metapopulation models provide a general and practical framework for integrating epidemiology and population genetics for the purposes of joint inference.
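
For readers unfamiliar with the compartmental models named in this abstract, the sketch below simulates the standard textbook deterministic SIR model with a simple Euler scheme and computes the basic reproductive number as the ratio of the transmission rate to the recovery rate. It does not implement the coalescent models derived in the paper or their BEAST implementation; the parameter names (beta, gamma), the step size, and the example values are assumptions chosen for illustration.

```python
def basic_reproductive_number(beta, gamma):
    """R0 for the simple SIR model without births or deaths: beta / gamma."""
    return beta / gamma


def simulate_sir(beta, gamma, s0, i0, r0_count=0.0, days=100.0, dt=0.1):
    """Euler integration of dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I,
    dR/dt = gamma*I. Returns a list of (time, S, I, R) tuples."""
    n = s0 + i0 + r0_count
    s, i, r = float(s0), float(i0), float(r0_count)
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((step * dt, s, i, r))
    return trajectory


if __name__ == "__main__":
    # Illustrative values only: beta and gamma are per-day rates.
    traj = simulate_sir(beta=0.5, gamma=0.25, s0=990, i0=10)
    peak_time, _, peak_infected, _ = max(traj, key=lambda row: row[2])
    print(basic_reproductive_number(0.5, 0.25), peak_time, peak_infected)
```

Euler integration keeps the example dependency-free; a smaller dt or an adaptive ODE solver would be preferable where accuracy matters.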

Wilson DJ. 2012. Insights from genomics into bacterial pathogen populations. PLoS Pathog, 8 (9), pp. e1002874.

Bacterial pathogens impose a heavy burden of disease on human populations worldwide. The gravest threats are posed by highly virulent respiratory pathogens, enteric pathogens, and HIV-associated infections. Tuberculosis alone is responsible for the deaths of 1.5 million people annually. Treatment options for bacterial pathogens are being steadily eroded by the evolution and spread of drug resistance. However, population-level whole genome sequencing offers new hope in the fight against pathogenic bacteria. By providing insights into bacterial evolution and disease etiology, these approaches pave the way for novel interventions and therapeutic targets. Sequencing populations of bacteria across the whole genome provides unprecedented resolution to investigate (i) within-host evolution, (ii) transmission history, and (iii) population structure. Moreover, advances in rapid benchtop sequencing herald a new era of real-time genomics in which sequencing and analysis can be deployed within hours in response to rapidly changing public health emergencies. The purpose of this review is to highlight the transformative effect of population genomics on bacteriology, and to consider the prospects for answering abiding questions such as why bacteria cause disease.

Young BC, Golubchik T, Batty EM, Fung R, Larner-Svensson H, Votintseva AA, Miller RR, Godwin H et al. 2012. Evolutionary dynamics of Staphylococcus aureus during progression from carriage to disease. Proc Natl Acad Sci U S A, 109 (12), pp. 4550-4555.

Whole-genome sequencing offers new insights into the evolution of bacterial pathogens and the etiology of bacterial disease. Staphylococcus aureus is a major cause of bacteria-associated mortality and invasive disease and is carried asymptomatically by 27% of adults. Eighty percent of bacteremias match the carried strain. However, the role of evolutionary change in the pathogen during the progression from carriage to disease is incompletely understood. Here we use high-throughput genome sequencing to discover the genetic changes that accompany the transition from nasal carriage to fatal bloodstream infection in an individual colonized with methicillin-sensitive S. aureus. We found a single, cohesive population exhibiting a repertoire of 30 single-nucleotide polymorphisms and four insertion/deletion variants. Mutations accumulated at a steady rate over a 13-mo period, except for a cluster of mutations preceding the transition to disease. Although bloodstream bacteria differed by just eight mutations from the original nasally carried bacteria, half of those mutations caused truncation of proteins, including a premature stop codon in an AraC-family transcriptional regulator that has been implicated in pathogenicity. Comparison with evolution in two asymptomatic carriers supported the conclusion that clusters of protein-truncating mutations are highly unusual. Our results demonstrate that bacterial diversity in vivo is limited but nonetheless detectable by whole-genome sequencing, enabling the study of evolutionary dynamics within the host. Regulatory or structural changes that occur during carriage may be functionally important for pathogenesis; therefore identifying those changes is a crucial step in understanding the biological causes of invasive bacterial disease.

Didelot X, Bowden R, Wilson DJ, Peto TEA, Crook DW. 2012. Transforming clinical microbiology with bacterial genome sequencing. Nature Reviews Genetics, 13 (9), pp. 601-612.

Whole-genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here, we review the current status of clinical microbiology and how it has already begun to be transformed by using next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties, such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. We predict that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.

Wilson DJ, Hernandez RD, Andolfatto P, Przeworski M. 2011. A population genetics-phylogenetics approach to inferring natural selection in coding sequences. PLoS Genet, 7 (12), pp. e1002395.

Through an analysis of polymorphism within and divergence between species, we can hope to learn about the distribution of selective effects of mutations in the genome, changes in the fitness landscape that occur over time, and the location of sites involved in key adaptations that distinguish modern-day species. We introduce a novel method for the analysis of variation in selection pressures within and between species, spatially along the genome and temporally between lineages. We model codon evolution explicitly using a joint population genetics-phylogenetics approach that we developed for the construction of multiallelic models with mutation, selection, and drift. Our approach has the advantage of performing direct inference on coding sequences, inferring ancestral states probabilistically, utilizing allele frequency information, and generalizing to multiple species. We use a Bayesian sliding window model for intragenic variation in selection coefficients that efficiently combines information across sites and captures spatial clustering within the genome. To demonstrate the utility of the method, we infer selective pressures acting in Drosophila melanogaster and D. simulans from polymorphism and divergence data for 100 X-linked coding regions.

Genome Evolution in Bacterial Infectious Diseases

Why do bacteria cause disease? Are there genetic differences between bacteria that affect disease severity? Does natural selection act on bacteria within the body to promote disease or attenuate infection? We are using statistical genetics and evolutionary biology to understand how mutations in the genome and changes in gene expression affect virulence - the severity or frequency of infection. Bacterial diseases are leading causes of mortality worldwide, exerting a profound effect on global ...

Developing a vaccine against Staphylococcus aureus

The high human and economic burden of S. aureus disease in man, as well as its impact in agriculture, continue to stimulate interest in an effective vaccine against Staphylococcus aureus. Successful selection of antigens for such a vaccine represents a significant challenge, due to variation in S. aureus behaviour between strains and between in vitro conditions, where it is relatively easily studied, and in vivo infection. We have recently developed tools which identify both virulent and less ...
