Multi Locus View : An Extensible Web Based Tool for the Analysis of Genomic Data
Sergeant MJ., Hughes JR., Hentges L., Lunter G., Downes DJ., Taylor S.
AbstractMotivationTracking and understanding data quality, analysis and reproducibility are critical concerns in the biological sciences. This is especially true in genomics where Next Generation Sequencing (NGS) based technologies such as ChIP-seq, RNA-seq and ATAC-seq are generating a flood of genome-scale data. These data-types are extremely high level and complex with single experiments capable of mapping ten to hundreds of thousands of biologically meaningful events across the genome. However, such data are usually processed with automated tools and pipelines, generating tabular outputs and static visualizations. These are difficult to interact with and require substantial bioinformatic skills to manipulate and query. Similarly, interpretation is normally made at a high level without the ability to visualise the underlying data in detail and so the complexity and quality of the real underlying biological signal is lost. Also genomics datasets require integration with other genomics datasets to be properly interpreted and this integration with multiple tracks again requires substantial bioinformatics skills and is difficult to visualise across multiple pertinent datasets. Conventional genome browsers do allow for the detailed visualisation of multiple tracks but are limited to browsing single locations and do not allow for interactions with the dataset as a whole. MLV has been developed to allow users to fluidly interact with genomics datasets at multiple scales, from complete metadata labelled and clustered populations to detailed representations of individual elements. It has inbuilt tools to integrate signals across multiple datasets and to perform dimensionality reduction and clustering analysis based on the extracted signal, allowing for the high-level analysis of complex datasets while maintaining visualisation of the fine grain structure of the data. MLV’s ability to visualise clustering within the data combined with efficient tools for large-scale tagging of individual elements makes it a unique tool for the generation of annotated datasets for modern machine learning approaches.ResultsMulti Locus View (MLV) is a web based tool for the visualisation, analysis and annotation of Next Generation Sequencing data sets. The user is able to browse the raw data, cluster, and combine the data with other analysis. Intuitive filtering and visualisation then enables the user to quickly locate and annotate regions of interest. User datasets can then be shared with other users or made public for quick assessment from the academic community. MLV is publically available at https://mlv.molbiol.ox.ac.uk and the source code is available at https://github.com/Hughes-Genome-Group/mlv