Using minor variant genomes and machine learning to study the genome biology of SARS-CoV-2 over time.

Dong X., Matthews DA., Gallo G., Darby A., Donovan-Banfield I., Goldswain H., MacGill T., Myers T., Orr R., Bailey D., Carroll MW., Hiscox JA.

In infected individuals, viruses are present as a population consisting of dominant and minor variant genomes. Most databases contain information on the dominant genome sequence. Since the emergence of SARS-CoV-2 in late 2019, variants have been selected that are more transmissible and capable of partial immune escape. Currently, models for projecting the evolution of SARS-CoV-2 are based on using dominant genome sequences to forecast whether a known mutation will be prevalent in the future. However, novel variants of SARS-CoV-2 (and other viruses) are driven by evolutionary pressure acting on minor variant genomes, which then become dominant and form a potential next wave of infection. In this study, sequencing data from 96 209 patients, sampled over a 3-year period, were used to analyse patterns of minor variant genomes. These data were used to develop unsupervised machine learning clusters to identify amino acids that had a greater potential for mutation than others in the Spike protein. Being able to identify amino acids that may be present in future variants would better inform the design of longer-lived medical countermeasures and allow a risk-based evaluation of viral properties, including assessment of transmissibility and immune escape, thus providing candidates with early warning signals for when a new variant of SARS-CoV-2 emerges.
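The abstract does not say which features or which clustering algorithm were used, so the following is only an illustrative stand-in, not the authors' pipeline. It is a minimal sketch that assumes per-sample allele frequencies at each Spike position are already available. Under that assumption, it counts a non-consensus allele as a minor variant if its frequency is at or above a hypothetical 2% cut-off and below 50%. Each position is then summarised by two values: the fraction of samples carrying a minor variant and the mean minor-variant frequency. Finally, positions are grouped with plain k-means (k = 2). All names, thresholds and data values are invented for illustration.

```python
from statistics import mean

# Hypothetical cut-off: a non-consensus allele counts as a minor variant
# if its frequency is at or above this value and below 0.5.
MIN_FREQ = 0.02


def position_features(freqs_by_pos):
    """Map each position to (fraction of samples with a minor variant, mean minor frequency)."""
    features = {}
    for pos, freqs in freqs_by_pos.items():
        minor = [f for f in freqs if MIN_FREQ <= f < 0.5]
        features[pos] = (len(minor) / len(freqs), mean(minor) if minor else 0.0)
    return features


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, iters=20):
    """Plain Lloyd's k-means on tuples (k >= 2), with deterministic initialisation."""
    # Initialise centroids with points spread evenly across the sorted order.
    ordered = sorted(points)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(mean(m[d] for m in members) for d in range(len(members[0])))
    return labels, centroids


# Toy, invented minor-variant frequencies for four positions across five samples.
toy = {
    100: [0.00, 0.03, 0.00, 0.01, 0.00],
    200: [0.05, 0.12, 0.08, 0.00, 0.20],
    300: [0.00, 0.00, 0.01, 0.00, 0.00],
    400: [0.10, 0.04, 0.30, 0.06, 0.00],
}
feats = position_features(toy)
positions = list(feats)
labels, centroids = kmeans([feats[p] for p in positions])
# Treat the cluster whose centroid has the higher detection rate as "more variable".
hot = max(range(2), key=lambda c: centroids[c][0])
print(sorted(p for p, lab in zip(positions, labels) if lab == hot))  # → [200, 400]
```

In a real analysis one would likely use more samples, more features and a library implementation such as scikit-learn's `KMeans`. The hand-written version here only keeps the example dependency-free.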

DOI

10.1093/nar/gkaf077

Type

Journal article

Publication Date

2025-02-01T00:00:00+00:00

Addresses

Institute of Infection, Veterinary and Ecological Sciences, Faculty of Health and Life Sciences, University of Liverpool, Liverpool, L3 5RF, United Kingdom.

Keywords

Humans, Evolution, Molecular, Mutation, Genome, Viral, Spike Glycoprotein, Coronavirus, Machine Learning, COVID-19, SARS-CoV-2

Cookies on this website