A clash of ideas: the varying uses of the 'species' term in virology and their utility for classifying viruses in metagenomic datasets.
Species definitions of viruses are frequently descriptive, with assignments often based on disease manifestations, host range, geographical distribution and transmission routes. This method of categorizing viruses has recently been challenged by technological advances such as high-throughput sequencing, which have revealed a diversity of viruses in the wider environment that dwarfs the current catalogue of viruses classified by the International Committee on Taxonomy of Viruses (ICTV). However, because such viruses are known only from their sequences, without phenotypic information, it is unclear how they might be classified consistently with much of the existing taxonomic framework. This difficulty exposes deeper incompatibilities in how species are conceptualized. The original species assignments based on disease or other biological attributes were primarily descriptive, similar to the principles used elsewhere in biology for species taxonomies. In contrast, purely sequence-based classifications rely on genetic metrics, such as divergence thresholds, that include or exclude viruses from individual species categories. These approaches bring different preconceptions about the nature of a virus species: descriptively defined species are more easily conceptualized as categories with a part/whole relationship between individuals and species, whereas species defined by divergence thresholds or other genetic metrics are essentially logically defined groups with specific inclusion and exclusion criteria. While descriptive species definitions match our intuitive division of viruses into natural kinds, rules-based genetic classifications are required for viruses known from sequence alone, whose incorporation into the ICTV taxonomy is essential if it is to represent the true diversity of viruses in nature.
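To make the contrast concrete, a rules-based species assignment can be sketched as a simple inclusion/exclusion test on sequence divergence. The sketch below is illustrative only and rests on several assumptions not taken from the text: sequences are already aligned to equal length, divergence is measured as uncorrected p-distance (gap positions skipped), a query joins the species whose closest reference member lies within a single cut-off, and the default threshold of 0.1 is an arbitrary placeholder rather than an ICTV demarcation criterion (real thresholds differ between virus families and genes). The function names `p_distance` and `assign_species` are hypothetical.

```python
from typing import Dict, List, Optional


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # ignore positions where either sequence has an alignment gap
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def assign_species(
    query: str,
    references: Dict[str, List[str]],
    threshold: float = 0.1,  # hypothetical cut-off, not an ICTV value
) -> Optional[str]:
    """Return the species whose closest member is within the threshold, else None.

    Membership follows purely from the rule: no phenotype is consulted.
    If several species qualify, the one with the smallest distance wins.
    """
    best_species: Optional[str] = None
    best_distance: Optional[float] = None
    for species, members in references.items():
        d = min(p_distance(query, m) for m in members)
        if d <= threshold and (best_distance is None or d < best_distance):
            best_species, best_distance = species, d
    return best_species  # None means the query falls outside every existing species


refs = {"species_A": ["ACGTACGTAC"], "species_B": ["TTGTACGAAC"]}
print(assign_species("ACGTACGTAA", refs))  # species_A (1 of 10 sites differ)
print(assign_species("GGGGGGGGGG", refs))  # None (outside both species)
```

A query that returns `None` would, under such a rule, be a candidate for a new species; where the boundary falls depends entirely on the chosen metric and threshold, which is the sense in which such species are logically defined groups rather than natural kinds.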