A comprehensive genomics solution for HIV surveillance and clinical monitoring in a global health setting
Bonsall D.*, Golubchik T.*, de Cesare M., Limbada M., Kosloff B., MacIntyre-Cockett G., Hall M., Wymant C., Ansari A., Abeler-Dörner L., Schaap A., Brown A., Barnes E., Piwowar-Manning E., Wilson E., Emel L., Hayes R., Fidler S., Ayles H., Bowden R., Fraser C.
High-throughput viral genetic sequencing is needed to monitor the spread of drug resistance, to direct optimal antiretroviral regimens, and to identify transmission dynamics in generalised HIV epidemics. Public health efforts to sequence HIV genomes at scale face three major technical challenges: (i) minimising assay cost and protocol complexity, (ii) maximising sensitivity, and (iii) recovering accurate and unbiased sequences of both the genome consensus and the within-host viral diversity. Here we present a novel, high-throughput, virus-enriched sequencing method and computational pipeline tailored specifically to HIV (veSEQ-HIV), which addresses all three technical challenges and can be used directly on leftover blood drawn for routine CD4 testing. We demonstrate its performance on 1,620 plasma samples collected from consenting individuals attending 10 large urban clinics in Zambia that are partners of HPTN 071 (PopART). We show that veSEQ-HIV consistently recovers complete HIV genomes from the majority of samples, across different subtypes, and that it is also quantitative: the number of HIV reads per sample obtained by veSEQ-HIV estimates viral load without the need for additional testing. Quantitative performance and sensitivity were both assessed on a subset of 126 samples with clinically measured viral loads, and with standardised quantification controls (viral load 100 to 5,000,000 RNA copies/ml). Complete HIV genomes were recovered from 93% (85/91) of samples with viral loads above 1,000 copies/ml. The quantitative nature of the assay implies that variant frequencies estimated with veSEQ-HIV are representative of true variant frequencies in the sample. Detection of minority variants can be exploited for epidemiological analysis of transmission and drug resistance, and we show how the information contained in individual reads of a veSEQ-HIV sample can be used to detect linkage between multiple mutations associated with resistance to antiretroviral therapy.
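The abstract does not state how HIV read counts are converted into a viral load estimate. One common approach for quantitative assays, assumed here purely for illustration, is a log-linear calibration: fit log10(HIV reads) against log10(viral load) for quantification standards of known concentration, then invert the fitted line for new samples. The minimal sketch below does this with ordinary least squares in plain Python. The function names `fit_log_linear` and `estimate_viral_load` are hypothetical and are not part of the veSEQ-HIV pipeline, and a real workflow would likely also normalise for sequencing depth, which is omitted here.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(viral_loads: Sequence[float],
                   read_counts: Sequence[int]) -> Tuple[float, float]:
    """Hypothetical calibration: fit log10(reads) = slope * log10(VL) + intercept.

    Uses ordinary least squares on paired standards of known viral load
    (copies/ml) and their observed HIV read counts.
    """
    if len(viral_loads) != len(read_counts) or len(viral_loads) < 2:
        raise ValueError("need at least two paired calibration points")
    # Work on the log scale, since standards span several orders of magnitude.
    xs = [math.log10(v) for v in viral_loads]
    ys = [math.log10(r) for r in read_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("calibration viral loads must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_viral_load(read_count: int, slope: float, intercept: float) -> float:
    """Invert the fitted calibration line to predict copies/ml from a read count."""
    if read_count <= 0:
        raise ValueError("read count must be positive to take a logarithm")
    return 10 ** ((math.log10(read_count) - intercept) / slope)


if __name__ == "__main__":
    # Invented calibration points for illustration only (not veSEQ-HIV data).
    standards_vl = [100, 1_000, 10_000, 100_000, 1_000_000]
    standards_reads = [20, 200, 2_000, 20_000, 200_000]
    slope, intercept = fit_log_linear(standards_vl, standards_reads)
    print(round(estimate_viral_load(5_000, slope, intercept)))
```

With these invented standards the fitted slope is 1 and a sample yielding 5,000 HIV reads is estimated at about 25,000 copies/ml. Fitting on the log scale keeps the highest-concentration standards from dominating the fit; a slope well below 1 would indicate that read counts saturate or lose proportionality at high viral loads.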
Fewer than 2% of reads obtained by veSEQ-HIV were identified in silico as contamination events using phyloscanner clean, an update to the phyloscanner software that we show to be 95% sensitive and 99% specific at 'decontaminating' next-generation sequencing (NGS) data. The cost of the assay, approximately 45 USD per sample, compares favourably with existing viral load and HIV genotyping tests, and provides the additional value of viral load quantification and inference of drug resistance with a single test. veSEQ-HIV is well suited to large public health efforts and is being applied to all ~9,000 samples collected for the HPTN 071-2 (PopART Phylogenetics) study.
* David Bonsall and Tanya Golubchik contributed equally.