Fast randomization of large genomic datasets while preserving alteration counts.

Gobbi A.; Iorio F.; Dawson KJ.; Wedge DC.; Tamborero D.; Alexandrov LB.; Lopez-Bigas N.; Garnett MJ.; Jurman G.; Saez-Rodriguez J.

MOTIVATION: Studying combinatorial patterns in cancer genomic datasets has recently emerged as a tool for identifying novel cancer driver networks. Approaches have been devised to quantify, for example, the tendency of a set of genes to be mutated in a 'mutually exclusive' manner. The significance of the proposed metrics is usually evaluated by computing P-values under appropriate null models. To this end, a Monte Carlo method (the switching-algorithm) is used to sample simulated datasets under a null model that preserves patient- and gene-wise mutation rates. In this method, a genomic dataset is represented as a bipartite network, to which Markov chain updates (switching-steps) are applied. These steps modify the network topology, and a minimal number of them must be executed to draw simulated datasets independently under the null model. This number has previously been deduced empirically to be a linear function of the total number of variants, making this process computationally expensive.

RESULTS: We present a novel approximate lower bound for the number of switching-steps, derived analytically. Additionally, we have developed the R package BiRewire, including new efficient implementations of the switching-algorithm. We illustrate the performance of BiRewire by applying it to large real cancer genomics datasets. We report vast reductions in time requirements with respect to existing implementations/bounds and equivalent P-value computations. Thus, we propose BiRewire to study statistical properties in genomic datasets, and other data that can be modeled as bipartite networks.

AVAILABILITY AND IMPLEMENTATION: BiRewire is available on BioConductor at http://www.bioconductor.org/packages/2.13/bioc/html/BiRewire.html.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
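To make the switching-step described in the abstract concrete, the following is a minimal Python sketch, not BiRewire's implementation (BiRewire is an R package). It assumes the dataset is a binary matrix whose rows and columns are the two node sets of the bipartite network (for example genes and patients), and that one switching-step picks two edges and swaps their endpoints, rejecting the move if it would create a duplicate edge. The function name `switching_steps` and its parameters are hypothetical, and the number of steps is left to the caller; the analytical lower bound from the paper is not reproduced here.

```python
import random


def switching_steps(matrix, n_steps, seed=0):
    """Apply n_steps attempted switching-steps to a binary matrix.

    Illustrative sketch only: each accepted step rewires two edges of the
    bipartite network, so every row sum and column sum is left unchanged.
    """
    rng = random.Random(seed)
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0

    # Edge list of the bipartite network: one (row, col) pair per 1 entry.
    edges = [(r, c) for r in range(n_rows) for c in range(n_cols) if matrix[r][c]]
    present = set(edges)

    if len(edges) >= 2:
        for _ in range(n_steps):
            i, j = rng.sample(range(len(edges)), 2)
            (a, b), (c, d) = edges[i], edges[j]
            # Proposed swap: (a, b), (c, d) -> (a, d), (c, b).
            # Reject if the edges share a node or a new edge already exists.
            if a == c or b == d or (a, d) in present or (c, b) in present:
                continue
            present.discard((a, b))
            present.discard((c, d))
            present.add((a, d))
            present.add((c, b))
            edges[i], edges[j] = (a, d), (c, b)

    # Rebuild the randomized matrix from the rewired edge set.
    out = [[0] * n_cols for _ in range(n_rows)]
    for r, c in present:
        out[r][c] = 1
    return out


if __name__ == "__main__":
    toy = [
        [1, 0, 1, 0],
        [0, 1, 0, 0],
        [1, 1, 0, 1],
    ]
    randomized = switching_steps(toy, n_steps=100, seed=1)
    print([sum(row) for row in randomized])        # row sums match those of toy
    print([sum(col) for col in zip(*randomized)])  # column sums match those of toy
```

Because every accepted step removes one edge from each of two rows and two columns and adds one back to each, the per-row and per-column alteration counts are preserved, which is the property the null model requires.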

Original publication

DOI

10.1093/bioinformatics/btu474

Type

Journal article

Journal

Bioinformatics

Publication Date

01/09/2014

Volume

Pages

i617 - i623

Keywords

Algorithms, Genomics, Humans, Markov Chains, Monte Carlo Method, Neoplasms, Random Allocation, Software
