Improving viral genomic data analyses to gain new insights into HIV transmission and evolution

Project Overview

Huge progress has been made in the prevention of HIV/AIDS. Nonetheless, rates of new infection remain stubbornly high, so there is still an urgent need for new science, tools and technologies. Next-generation sequencing (NGS) has revolutionised disease surveillance, resulting in large databases of viral genetic diversity collected from the most affected populations. Interpreting NGS data poses challenges unique to HIV. Each infection consists of a swarm of related viruses: NGS characterises this swarm through millions of small fragments of thousands of closely related viral genomes. The primary aim of this DPhil will be to develop novel bioinformatic and statistical methods to improve the interpretation of these data. The student will have access to three large databases of HIV NGS data generated in projects led by the Fraser group (N>25,000 samples). The new methods will build on and improve existing algorithms for interpreting within-host diversity (see, for example, references [1,2]). The ultimate aim will be to reconstruct whole-genome haplotypes and associated recombination events. There will be extensive opportunities to link the findings of the project to clinical and epidemiological data, and to interact with wide multidisciplinary teams. Findings (data and software) will be shared with the consortia and beyond, so that the public health utility of the work can be maximised.

  1. Wymant C et al. PHYLOSCANNER: Inferring Transmission from Within- and Between-Host Pathogen Genetic Diversity. Mol Biol Evol. 2017.
  2. Palmer DS et al. Mapping the drivers of within-host pathogen evolution using massive data sets. bioRxiv. 2017.

Training Opportunities

The primary research area of this project will be bioinformatics combined with statistical genetics. The project will be co-supervised by Prof Christophe Fraser, Prof Gil McVean and Dr Tanya Golubchik, providing broad expertise in the core topics of the DPhil. The student will be based in the new Big Data Institute, and will have access to the new training in data science that will be offered as part of the creation of a new Doctoral Training Centre. The student will also be eligible to attend relevant courses in other departments, e.g. statistics and zoology. The student will be expected to interact regularly with postdoctoral scientists in the Fraser and McVean groups, and with members of the wider consortia that generated the data. The student will be exposed to a multidisciplinary environment including epidemiology, public health, clinical science, phylodynamics and mathematical modelling. As a result, the student will acquire the key communication skills needed for working in a multidisciplinary data science project.


Genetics & Genomics and Immunology & Infectious Disease


Project reference number: 1011

Funding and admissions information


Name | Department | Institution | Country | Email
Professor Christophe Fraser | Big Data Institute | Oxford University, Henry Wellcome Building of Genomic Medicine | GBR |
Professor Gil McVean FRS FMedSci | Big Data Institute | Oxford University, Henry Wellcome Building of Genomic Medicine | GBR |

There are no publications listed for this DPhil project.