Prediction of Mycobacterium tuberculosis drug resistance through genome sequencing clinical samples

Project Overview

CRyPTIC - a global initiative to understand M. tuberculosis drug resistance

The World Health Organization estimates that two billion people are infected with tuberculosis, with 9 million active cases and 1.5 million deaths per year. Of these, an estimated 480,000 cases are multi-drug-resistant, posing one of the greatest obstacles to the WHO’s ambition to ‘end tuberculosis’ by 2035. Around the globe drug susceptibility results are currently often unavailable, or only available after weeks or months of waiting, but researchers from the University of Oxford and Public Health England are leading international efforts to develop rapid tests using genomics to identify complete drug susceptibility patterns within hours or days. This could allow the correct treatment to be given in a timely manner, potentially minimizing both individual morbidity and mortality, and the chances of onward transmission.

The candidate will participate in an international effort to sequence a global collection of up to 100,000 M. tuberculosis genomes, with the overall aim of characterizing a catalogue of mutations underlying resistance to each anti-tuberculosis drug. In particular, the candidate will have the opportunity to work on the sequencing of primary clinical samples and contribute towards the development of near-to-patient, real-time testing. The candidate will work within a large multi-disciplinary group where they will be able to draw on the expertise of laboratory scientists, clinicians, computational biologists and bioinformaticians.

Prof Tim Peto is an infectious diseases clinician and joint PI for the Modernising Medical Microbiology (MMM) group at the University of Oxford, Nuffield Department of Medicine. Timothy Walker is an Academic Clinical Lecturer in Infectious Diseases and Microbiology, working on mycobacterial genomics in the same department.

This project will be part of the CRyPTIC consortium, a newly established international effort that has its foundations in the MMM group led by Professor Derrick Crook, clinical microbiologist at the University of Oxford and director of the National Infection Service at Public Health England. CRyPTIC is supported by a multimillion pound funding portfolio from the Wellcome Trust and the Bill and Melinda Gates Foundation. MMM has a strong record of publishing in the top internationally recognized journals, and our research has impacted on the delivery of public health and microbiology in Britain and beyond. For more information visit

Training Opportunities

The Modernising Medical Microbiology consortium provides an excellent research environment in which to develop new skills and train among world-leading scientists in their field. Based at the John Radcliffe Hospital, the University of Oxford team consists of a community of research groups led by Profs Derrick Crook, Tim Peto, Sarah Walker and Drs David Clifton, Kate Dingle, Phil Fowler, Zam Iqbal, Danny Wilson and David Wyllie. We have specialist expertise in microbiology, genomics, statistics, epidemiology and bioinformatics, and our work focuses on understanding the causes of infectious disease in populations. Training is provided by weekly supervisory meetings, weekly Modernising Medical Microbiology work-in-progress meetings, journal clubs, seminar series, and external opportunities including attending national and international conferences. The department and university run training courses, while the Department for Continuing Education and the Language Centre offer further opportunities for personal development to research students at Oxford.


Immunology & Infectious Disease and Tropical Medicine & Global Health


Project reference number: 848

Funding and admissions information


Name | Department | Institution | Country
Professor Tim Peto | Experimental Medicine Division | Oxford University, John Radcliffe Hospital | GBR
Dr Timothy M Walker | Nuffield Department of Medicine | University of Oxford | GBR

Pankhurst LJ, Del Ojo Elias C, Votintseva AA, Walker TM, Cole K, Davies J, Fermont JM, Gascoyne-Binzi DM, Kohl TA, Kong C, Lemaitre N, Niemann S, Paul J, Rogers TR, Roycroft E, Smith EG, Supply P, Tang P, Wilcox MH, Wordsworth S, Wyllie D, Xu L, Crook DW, COMPASS-TB Study Group. 2016. Rapid, comprehensive, and affordable mycobacterial diagnosis with whole-genome sequencing: a prospective study. Lancet Respir Med, 4 (1), pp. 49-58.

BACKGROUND: Slow and cumbersome laboratory diagnostics for Mycobacterium tuberculosis complex (MTBC) risk delayed treatment and poor patient outcomes. Whole-genome sequencing (WGS) could potentially provide a rapid and comprehensive diagnostic solution. In this prospective study, we compare real-time WGS with routine MTBC diagnostic workflows. METHODS: We compared sequencing mycobacteria from all newly positive liquid cultures with routine laboratory diagnostic workflows across eight laboratories in Europe and North America for diagnostic accuracy, processing times, and cost between Sept 6, 2013, and April 14, 2014. We sequenced specimens once using local Illumina MiSeq platforms and processed data centrally using a semi-automated bioinformatics pipeline. We identified species or complex using gene presence or absence, predicted drug susceptibilities from resistance-conferring mutations identified from reference-mapped MTBC genomes, and calculated genetic distance to previously sequenced UK MTBC isolates to detect outbreaks. WGS data processing and analysis was done by staff masked to routine reference laboratory and clinical results. We also did a microcosting analysis to assess the financial viability of WGS-based diagnostics. FINDINGS: Compared with routine results, WGS predicted species with 93% (95% CI 90-96; 322 of 345 specimens; 356 mycobacteria specimens submitted) accuracy and drug susceptibility also with 93% (91-95; 628 of 672 specimens; 168 MTBC specimens identified) accuracy, with one sequencing attempt. WGS linked 15 (16% [95% CI 10-26]) of 91 UK patients to an outbreak. WGS diagnosed a case of multidrug-resistant tuberculosis before routine diagnosis was completed and discovered a new multidrug-resistant tuberculosis cluster. 
Full WGS diagnostics could be generated in a median of 9 days (IQR 6-10), a median of 21 days (IQR 14-32) faster than final reference laboratory reports were produced (median of 31 days [IQR 21-44]), at a cost of £481 per culture-positive specimen, whereas routine diagnosis costs £518, equating to a WGS-based diagnosis cost that is 7% cheaper annually than are present diagnostic workflows. INTERPRETATION: We have shown that WGS has a scalable, rapid turnaround, and is a financially feasible method for full MTBC diagnostics. Continued improvements to mycobacterial processing, bioinformatics, and analysis will improve the accuracy, speed, and scope of WGS-based diagnosis. FUNDING: National Institute for Health Research, Department of Health, Wellcome Trust, British Columbia Centre for Disease Control Foundation for Population and Public Health, Department of Clinical Microbiology, Trinity College Dublin.

Walker TM, Kohl TA, Omar SV, Hedge J, Del Ojo Elias C, Bradley P, Iqbal Z, Feuerriegel S, Niehaus KE, Wilson DJ, Clifton DA, Kapatai G, Ip CL, Bowden R, Drobniewski FA, Allix-Béguec C, Gaudin C, Parkhill J, Diel R, Supply P, Crook DW, Smith EG, Walker AS, Ismail N, Niemann S, Peto TE, Modernizing Medical Microbiology (MMM) Informatics Group. 2015. Whole-genome sequencing for prediction of Mycobacterium tuberculosis drug susceptibility and resistance: a retrospective cohort study. Lancet Infect Dis, 15 (10), pp. 1193-202.

BACKGROUND: Diagnosing drug-resistance remains an obstacle to the elimination of tuberculosis. Phenotypic drug-susceptibility testing is slow and expensive, and commercial genotypic assays screen only common resistance-determining mutations. We used whole-genome sequencing to characterise common and rare mutations predicting drug resistance, or consistency with susceptibility, for all first-line and second-line drugs for tuberculosis. METHODS: Between Sept 1, 2010, and Dec 1, 2013, we sequenced a training set of 2099 Mycobacterium tuberculosis genomes. For 23 candidate genes identified from the drug-resistance scientific literature, we algorithmically characterised genetic mutations as not conferring resistance (benign), resistance determinants, or uncharacterised. We then assessed the ability of these characterisations to predict phenotypic drug-susceptibility testing for an independent validation set of 1552 genomes. We sought mutations under similar selection pressure to those characterised as resistance determinants outside candidate genes to account for residual phenotypic resistance. FINDINGS: We characterised 120 training-set mutations as resistance determining, and 772 as benign. With these mutations, we could predict 89·2% of the validation-set phenotypes with a mean 92·3% sensitivity (95% CI 90·7-93·7) and 98·4% specificity (98·1-98·7). 10·8% of validation-set phenotypes could not be predicted because uncharacterised mutations were present. With an in-silico comparison, characterised resistance determinants had higher sensitivity than the mutations from three line-probe assays (85·1% vs 81·6%). No additional resistance determinants were identified among mutations under selection pressure in non-candidate genes. INTERPRETATION: A broad catalogue of genetic mutations enable data from whole-genome sequencing to be used clinically to predict drug resistance, drug susceptibility, or to identify drug phenotypes that cannot yet be genetically predicted. 
This approach could be integrated into routine diagnostic workflows, phasing out phenotypic drug-susceptibility testing while reporting drug resistance early. FUNDING: Wellcome Trust, National Institute of Health Research, Medical Research Council, and the European Union.
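The three-way outcome described in this abstract (predicted resistant, predicted susceptible, or not yet predictable) can be illustrated with a minimal Python sketch. It is not the study's pipeline: the catalogue entries and mutation labels are hypothetical placeholders, and the rule that a resistance determinant takes precedence over an uncharacterised mutation is an assumption made for illustration.

```python
from enum import Enum


class Call(Enum):
    RESISTANT = "R"
    SUSCEPTIBLE = "S"
    UNPREDICTED = "U"


# Hypothetical catalogue for one drug: mutation label -> characterisation.
# The labels are placeholders, not real resistance mutations.
EXAMPLE_CATALOGUE = {
    "geneA_mut1": "resistance",
    "geneA_mut2": "benign",
}


def predict(mutations, catalogue):
    """Return a Call for one isolate and one drug.

    Any mutation missing from the catalogue is treated as uncharacterised.
    Assumed precedence: resistance determinant > uncharacterised > benign.
    """
    statuses = [catalogue.get(m, "uncharacterised") for m in mutations]
    if "resistance" in statuses:
        return Call.RESISTANT
    if "uncharacterised" in statuses:
        # Cannot rule out resistance, so no prediction is made.
        return Call.UNPREDICTED
    # Only benign mutations (or none) in the candidate genes.
    return Call.SUSCEPTIBLE


if __name__ == "__main__":
    print(predict(["geneA_mut2", "geneA_mut9"], EXAMPLE_CATALOGUE))
```

In this sketch an isolate carrying only catalogued benign mutations is called susceptible, while a single unknown mutation is enough to withhold a prediction, which mirrors the abstract's category of phenotypes that cannot yet be genetically predicted.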

Earle SG, Wu CH, Charlesworth J, Stoesser N, Gordon NC, Walker TM, Spencer CC, Iqbal Z, Clifton DA, Hopkins KL, Woodford N, Smith EG, Ismail N, Llewelyn MJ, Peto TE, Crook DW, McVean G, Walker AS, Wilson DJ. 2016. Identifying lineage effects when controlling for population structure improves power in bacterial association studies. Nat Microbiol, 1 pp. 16041.

Bacteria pose unique challenges for genome-wide association studies because of strong structuring into distinct strains and substantial linkage disequilibrium across the genome(1,2). Although methods developed for human studies can correct for strain structure(3,4), this risks considerable loss-of-power because genetic differences between strains often contribute substantial phenotypic variability(5). Here, we propose a new method that captures lineage-level associations even when locus-specific associations cannot be fine-mapped. We demonstrate its ability to detect genes and genetic variants underlying resistance to 17 antimicrobials in 3,144 isolates from four taxonomically diverse clonal and recombining bacteria: Mycobacterium tuberculosis, Staphylococcus aureus, Escherichia coli and Klebsiella pneumoniae. Strong selection, recombination and penetrance confer high power to recover known antimicrobial resistance mechanisms and reveal a candidate association between the outer membrane porin nmpC and cefazolin resistance in E. coli. Hence, our method pinpoints locus-specific effects where possible and boosts power by detecting lineage-level differences when fine-mapping is intractable.

Bradley P, Gordon NC, Walker TM, Dunn L, Heys S, Huang B, Earle S, Pankhurst LJ, Anson L, de Cesare M, Piazza P, Votintseva AA, Golubchik T, Wilson DJ, Wyllie DH, Diel R, Niemann S, Feuerriegel S, Kohl TA, Ismail N, Omar SV, Smith EG, Buck D, McVean G, Walker AS, Peto TE, Crook DW, Iqbal Z. 2015. Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis. Nat Commun, 6 pp. 10063.

The rise of antibiotic-resistant bacteria has led to an urgent need for rapid detection of drug resistance in clinical samples, and improvements in global surveillance. Here we show how de Bruijn graph representation of bacterial diversity can be used to identify species and resistance profiles of clinical isolates. We implement this method for Staphylococcus aureus and Mycobacterium tuberculosis in a software package ('Mykrobe predictor') that takes raw sequence data as input, and generates a clinician-friendly report within 3 minutes on a laptop. For S. aureus, the error rates of our method are comparable to gold-standard phenotypic methods, with sensitivity/specificity of 99.1%/99.6% across 12 antibiotics (using an independent validation set, n=470). For M. tuberculosis, our method predicts resistance with sensitivity/specificity of 82.6%/98.5% (independent validation set, n=1,609); sensitivity is lower here, probably because of limited understanding of the underlying genetic mechanisms. We give evidence that minor alleles improve detection of extremely drug-resistant strains, and demonstrate feasibility of the use of emerging single-molecule nanopore sequencing techniques for these purposes.

Votintseva AA, Pankhurst LJ, Anson LW, Morgan MR, Gascoyne-Binzi D, Walker TM, Quan TP, Wyllie DH, Del Ojo Elias C, Wilcox M, Walker AS, Peto TE, Crook DW. 2015. Mycobacterial DNA extraction for whole-genome sequencing from early positive liquid (MGIT) cultures. J. Clin. Microbiol., 53 (4), pp. 1137-43.

We developed a low-cost and reliable method of DNA extraction from as little as 1 ml of early positive mycobacterial growth indicator tube (MGIT) cultures that is suitable for whole-genome sequencing to identify mycobacterial species and predict antibiotic resistance in clinical samples. The DNA extraction method is based on ethanol precipitation supplemented by pretreatment steps with a MolYsis kit or saline wash for the removal of human DNA and a final DNA cleanup step with solid-phase reversible immobilization beads. The protocol yielded ≥0.2 ng/μl of DNA for 90% (MolYsis kit) and 83% (saline wash) of positive MGIT cultures. A total of 144 (94%) of the 154 samples sequenced on the MiSeq platform (Illumina) achieved the target of 1 million reads, with <5% of reads derived from human or nasopharyngeal flora for 88% and 91% of samples, respectively. A total of 59 (98%) of 60 samples that were identified by the national mycobacterial reference laboratory (NMRL) as Mycobacterium tuberculosis were successfully mapped to the H37Rv reference, with >90% coverage achieved. The DNA extraction protocol, therefore, will facilitate fast and accurate identification of mycobacterial species and resistance using a range of bioinformatics tools.

Walker TM, Ip CL, Harrell RH, Evans JT, Kapatai G, Dedicoat MJ, Eyre DW, Wilson DJ, Hawkey PM, Crook DW, Parkhill J, Harris D, Walker AS, Bowden R, Monk P, Smith EG, Peto TE. 2013. Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study. Lancet Infect Dis, 13 (2), pp. 137-46.

BACKGROUND: Tuberculosis incidence in the UK has risen in the past decade. Disease control depends on epidemiological data, which can be difficult to obtain. Whole-genome sequencing can detect microevolution within Mycobacterium tuberculosis strains. We aimed to estimate the genetic diversity of related M tuberculosis strains in the UK Midlands and to investigate how this measurement might be used to investigate community outbreaks. METHODS: In a retrospective observational study, we used Illumina technology to sequence M tuberculosis genomes from an archive of frozen cultures. We characterised isolates into four groups: cross-sectional, longitudinal, household, and community. We measured pairwise nucleotide differences within hosts and between hosts in household outbreaks and estimated the rate of change in DNA sequences. We used the findings to interpret network diagrams constructed from 11 community clusters derived from mycobacterial interspersed repetitive-unit-variable-number tandem-repeat data. FINDINGS: We sequenced 390 separate isolates from 254 patients, including representatives from all five major lineages of M tuberculosis. The estimated rate of change in DNA sequences was 0.5 single nucleotide polymorphisms (SNPs) per genome per year (95% CI 0.3-0.7) in longitudinal isolates from 30 individuals and 25 families. Divergence is rarely higher than five SNPs in 3 years. 109 (96%) of 114 paired isolates from individuals and households differed by five or fewer SNPs. More than five SNPs separated isolates from none of 69 epidemiologically linked patients, two (15%) of 13 possibly linked patients, and 13 (17%) of 75 epidemiologically unlinked patients (three-way comparison exact p<0.0001). Genetic trees and clinical and epidemiological data suggest that super-spreaders were present in two community clusters. INTERPRETATION: Whole-genome sequencing can delineate outbreaks of tuberculosis and allows inference about direction of transmission between cases. 
The technique could identify super-spreaders and predict the existence of undiagnosed cases, potentially leading to early treatment of infectious patients and their contacts. FUNDING: Medical Research Council, Wellcome Trust, National Institute for Health Research, and the Health Protection Agency.
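The idea of using pairwise SNP differences to judge whether isolates are plausibly linked can be sketched in a few lines of Python. This is an illustration only, not the study's method: it assumes the isolates have already been aligned to a common reference so that all sequences have equal length, it ignores positions with an ambiguous base call ("N"), and it borrows the five-SNP figure from the abstract as a default threshold. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences differ.

    Positions where either sequence has an uncalled base ("N") are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a != "N" and b != "N"
    )


def linked_pairs(sequences, threshold=5):
    """Return (id_1, id_2, distance) for every pair within the threshold.

    `sequences` maps an isolate identifier to its aligned sequence.
    """
    pairs = []
    for (id_1, seq_1), (id_2, seq_2) in combinations(sorted(sequences.items()), 2):
        distance = snp_distance(seq_1, seq_2)
        if distance <= threshold:
            pairs.append((id_1, id_2, distance))
    return pairs


if __name__ == "__main__":
    # Toy sequences; real genomes are millions of bases long.
    toy = {
        "A": "ACGTACGTAC",
        "B": "ACGTACGTAT",
        "C": "TTTTTTTTTT",
    }
    print(linked_pairs(toy))
```

A fixed threshold is a simplification: the abstract reports that a minority of epidemiologically unlinked patients were also separated by five or fewer SNPs, so genetic distance is best read alongside clinical and epidemiological data.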

Walker TM, Lalor MK, Broda A, Saldana Ortega L, Morgan M, Parker L, Churchill S, Bennett K, Golubchik T, Giess AP, Del Ojo Elias C, Jeffery KJ, Bowler IC, Laurenson IF, Barrett A, Drobniewski F, McCarthy ND, Anderson LF, Abubakar I, Thomas HL, Monk P, Smith EG, Walker AS, Crook DW, Peto TE, Conlon CP. 2014. Assessment of Mycobacterium tuberculosis transmission in Oxfordshire, UK, 2007-12, with whole pathogen genome sequences: an observational study. Lancet Respir Med, 2 (4), pp. 285-92.

BACKGROUND: Patients born outside the UK have contributed to a 20% rise in the UK's tuberculosis incidence since 2000, but their effect on domestic transmission is not known. Here we use whole-genome sequencing to investigate the epidemiology of tuberculosis transmission in an unselected population over 6 years. METHODS: We identified all residents with Oxfordshire postcodes with a Mycobacterium tuberculosis culture or a clinical diagnosis of tuberculosis between Jan 1, 2007, and Dec 31, 2012, using local databases and checking against the national Enhanced Tuberculosis Surveillance database. We used Illumina technology to sequence all available M tuberculosis cultures from identified cases. Sequences were clustered by genetic relatedness and compared retrospectively with contact investigations. The first patient diagnosed in each cluster was defined as the index case, with links to subsequent cases assigned first by use of any epidemiological linkage, then by genetic distance, and then by timing of diagnosis. FINDINGS: Although we identified 384 patients with a diagnosis of tuberculosis, country of birth was known for 380 and we sequenced isolates from 247 of 269 cases with culture-confirmed disease. 39 cases were genomically linked within 13 clusters, implying 26 local transmission events. Only 11 of 26 possible transmissions had been previously identified through contact tracing. Of seven genomically confirmed household clusters, five contained additional genomic links to epidemiologically unidentified non-household members. 255 (67%) patients were born in a country with high tuberculosis incidence, conferring a local incidence of 109 cases per 100,000 population per year in Oxfordshire, compared with 3·5 cases per 100,000 per year for those born in low-incidence countries. 
However, patients born in the low-incidence countries, predominantly UK, were more likely to have pulmonary disease (adjusted odds ratio 1·8 [95% CI 1·2-2·9]; p=0·009), social risk factors (4·4 [2·0-9·4]; p<0·0001), and be part of a local transmission cluster (4·8 [1·6-14·8]; p=0·006). INTERPRETATION: Although inward migration has contributed to the overall tuberculosis incidence, our findings suggest that most patients born in high-incidence countries reactivate latent infection acquired abroad and are not involved in local onward transmission. Systematic screening of new entrants could further improve tuberculosis control, but it is important that health care remains accessible to all individuals, especially high-risk groups, if tuberculosis control is not to be jeopardised. FUNDING: UK Clinical Research Collaboration (Wellcome Trust, Medical Research Council, National Institute for Health Research [NIHR]), and NIHR Oxford Biomedical Research Centre.

Walker TM, Monk P, Smith EG, Peto TE. 2013. Contact investigations for outbreaks of Mycobacterium tuberculosis: advances through whole genome sequencing. Clin. Microbiol. Infect., 19 (9), pp. 796-802.

The control of tuberculosis depends on the identification and treatment of infectious patients and their contacts, who are currently identified through a combined approach of genotyping and epidemiological investigation. However, epidemiological data are often challenging to obtain, and genotyping data are difficult to interpret without them. Whole genome sequencing (WGS) technology is increasingly affordable, and offers the prospect of identifying plausible transmission events between patients without prior recourse to epidemiological data. We discuss the current approaches to tuberculosis control, and how WGS might advance public health efforts in the future.