Professor Christopher Yau

Research Area: Bioinformatics & Stats (inc. Modelling and Computational Biology)
Technology Exchange: Bioinformatics and Statistical genetics
Scientific Themes: Genetics & Genomics and Cancer Biology
Keywords: Statistics, Computational Biology, Cancer and Bioinformatics

The focus of my group is the development of computational statistical methods for applications in genetics and genomics. The main areas of work include:

  • Translational Computational Oncogenomics. The development of cutting edge computational statistical methods and tools that can be widely used by specialists and non-specialists alike for research and clinical practice in cancer.
  • Single Cell Informatics. Developing novel statistical techniques for single cell genomics.
  • Data-driven Statistics. Developing novel statistical ideas and generic techniques that are inspired by real data analysis problems in genetics and genomics. Developing computational techniques, formulated on sound statistical principles, for the analysis of the very large datasets commonly found in genomics (Big Data) and for learning biologically relevant features (Deep or Representation Learning).

Name | Department | Institution | Country
Professor Chris Holmes | Wellcome Trust Centre for Human Genetics | Oxford University, Henry Wellcome Building of Genomic Medicine | United Kingdom
Professor Ian Tomlinson | Wellcome Trust Centre for Human Genetics | Oxford University, Henry Wellcome Building of Genomic Medicine | United Kingdom
Dr Jean-Baptiste Cazier | Wellcome Trust Centre for Human Genetics | Oxford University, Henry Wellcome Building of Genomic Medicine | United Kingdom
Prof Ahmed Ashour Ahmed | (RDM) Weatherall Institute of Molecular Medicine | Oxford University, Weatherall Institute of Molecular Medicine | United Kingdom
Dr Samantha JL Knight | Wellcome Trust Centre for Human Genetics | Oxford University, Henry Wellcome Building of Genomic Medicine | United Kingdom
Professor Mark McCarthy | | Oxford University, Oxford Centre for Diabetes, Endocrinology & Metabolism | United Kingdom
Prof Anna L Gloyn | (RDM) OCDEM | Oxford University, Oxford Centre for Diabetes, Endocrinology & Metabolism | United Kingdom
Prof Adam Mead MRCP FRCPath | (RDM) Weatherall Institute of Molecular Medicine | Oxford University, Weatherall Institute of Molecular Medicine | United Kingdom
Dr Caleb Webber | Department of Physiology, Anatomy and Genetics | University of Oxford | United Kingdom
Dr Michalis Titsias | Department of Informatics | Athens University of Economics and Business | Greece
Dr Richard Savage | Systems Biology Centre | University of Warwick | United Kingdom

Campbell KR, Yau C. 2017. Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers. Wellcome Open Res, 2 pp. 19.

Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov-Chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of genes driving the bifurcation process. We additionally propose an Empirical-Bayes like extension that deals with the high levels of zero-inflation in single-cell RNA-seq data and quantify when such models are useful. We apply our model to both real and simulated single-cell gene expression data and compare the results to existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context of practical bioinformatics analyses.

Campbell KR, Yau C. 2017. switchde: inference of switch-like differential expression along single-cell trajectories. Bioinformatics, 33 (8), pp. 1241-1242.

Motivation: Pseudotime analyses of single-cell RNA-seq data have become increasingly common. Typically, a latent trajectory corresponding to a biological process of interest, such as differentiation or cell cycle, is discovered. However, relatively little attention has been paid to modelling the differential expression of genes along such trajectories. Results: We present switchde, a statistical framework and accompanying R package for identifying switch-like differential expression of genes along pseudotemporal trajectories. Our method includes fast model fitting that provides interpretable parameter estimates corresponding to how quickly a gene is up or down regulated as well as where in the trajectory such regulation occurs. It also reports a P-value in favour of rejecting a constant-expression model for switch-like differential expression and optionally models the zero-inflation prevalent in single-cell data. Availability and Implementation: The R package switchde is available through the Bioconductor project at https://bioconductor.org/packages/switchde. Contact: kieran.campbell@sjc.ox.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.

Sahasrabudhe R, Lott P, Bohorquez M, Toal T, Estrada AP, Suarez JJ, Brea-Fernández A, Cameselle-Teijeiro J, Pinto C, Ramos I et al. 2017. Germline Mutations in PALB2, BRCA1, and RAD51C, Which Regulate DNA Recombination Repair, in Patients With Gastric Cancer. Gastroenterology, 152 (5), pp. 983-986.e6.

Up to 10% of cases of gastric cancer are familial, but so far, only mutations in CDH1 have been associated with gastric cancer risk. To identify genetic variants that affect risk for gastric cancer, we collected blood samples from 28 patients with hereditary diffuse gastric cancer (HDGC) not associated with mutations in CDH1 and performed whole-exome sequence analysis. We then analyzed sequences of candidate genes in 333 independent HDGC and non-HDGC cases. We identified 11 cases with mutations in PALB2, BRCA1, or RAD51C genes, which regulate homologous DNA recombination. We found these mutations in 2 of 31 patients with HDGC (6.5%) and 9 of 331 patients with sporadic gastric cancer (2.8%). Most of these mutations had been previously associated with other types of tumors and partially co-segregated with gastric cancer in our study. Tumors that developed in patients with these mutations had a mutation signature associated with somatic homologous recombination deficiency. Our findings indicate that defects in homologous recombination increase risk for gastric cancer.

Campbell KR, Yau C. 2016. Order Under Uncertainty: Robust Differential Expression Analysis Using Probabilistic Models for Pseudotime Inference. PLoS Comput Biol, 12 (11), pp. e1005212.

Single cell gene expression profiling can be used to quantify transcriptional dynamics in temporal processes, such as cell differentiation, using computational methods to label each cell with a 'pseudotime' where true time series experimentation is too difficult to perform. However, owing to the high variability in gene expression between individual cells, there is an inherent uncertainty in the precise temporal ordering of the cells. Pre-existing methods for pseudotime estimation have predominantly given point estimates precluding a rigorous analysis of the implications of uncertainty. We use probabilistic modelling techniques to quantify pseudotime uncertainty and propagate this into downstream differential expression analysis. We demonstrate that reliance on a point estimate of pseudotime can lead to inflated false discovery rates and that probabilistic approaches provide greater robustness and measures of the temporal resolution that can be obtained from pseudotime inference.

Miranda F, Mannion D, Liu S, Zheng Y, Mangala LS, Redondo C, Herrero-Gonzalez S, Xu R, Taylor C, Chedom DF et al. 2016. Salt-Inducible Kinase 2 Couples Ovarian Cancer Cell Metabolism with Survival at the Adipocyte-Rich Metastatic Niche. Cancer Cell, 30 (2), pp. 273-289.

The adipocyte-rich microenvironment forms a niche for ovarian cancer metastasis, but the mechanisms driving this process are incompletely understood. Here we show that salt-inducible kinase 2 (SIK2) is overexpressed in adipocyte-rich metastatic deposits compared with ovarian primary lesions. Overexpression of SIK2 in ovarian cancer cells promotes abdominal metastasis while SIK2 depletion prevents metastasis in vivo. Importantly, adipocytes induce calcium-dependent activation and autophosphorylation of SIK2. Activated SIK2 plays a dual role in augmenting AMPK-induced phosphorylation of acetyl-CoA carboxylase and in activating the PI3K/AKT pathway through p85α-S154 phosphorylation. These findings identify SIK2 at the apex of the adipocyte-induced signaling cascades in cancer cells and make a compelling case for targeting SIK2 for therapy in ovarian cancer.

Hellner K, Miranda F, Fotso Chedom D, Herrero-Gonzalez S, Hayden DM, Tearle R, Artibani M, KaramiNejadRanjbar M, Williams R, Gaitskell K et al. 2016. Premalignant SOX2 overexpression in the fallopian tubes of ovarian cancer patients: Discovery and validation studies. EBioMedicine, 10 pp. 137-149.

Current screening methods for ovarian cancer can only detect advanced disease. Earlier detection has proved difficult because the molecular precursors involved in the natural history of the disease are unknown. To identify early driver mutations in ovarian cancer cells, we used dense whole genome sequencing of micrometastases and microscopic residual disease collected at three time points over three years from a single patient during treatment for high-grade serous ovarian cancer (HGSOC). The functional and clinical significance of the identified mutations was examined using a combination of population-based whole genome sequencing, targeted deep sequencing, multi-center analysis of protein expression, loss of function experiments in an in-vivo reporter assay and mammalian models, and gain of function experiments in primary cultured fallopian tube epithelial (FTE) cells. We identified frequent mutations involving a 40kb distal repressor region for the key stem cell differentiation gene SOX2. In the apparently normal FTE, the region was also mutated. This was associated with a profound increase in SOX2 expression (p<2(-16)), which was not found in patients without cancer (n=108). Importantly, we show that SOX2 overexpression in FTE is nearly ubiquitous in patients with HGSOCs (n=100), and common in BRCA1-BRCA2 mutation carriers (n=71) who underwent prophylactic salpingo-oophorectomy. We propose that the finding of SOX2 overexpression in FTE could be exploited to develop biomarkers for detecting disease at a premalignant stage, which would reduce mortality from this devastating disease.

Titsias MK, Holmes CC, Yau C. 2016. Statistical Inference in Hidden Markov Models Using k-Segment Constraints. Journal of the American Statistical Association, 111 (513), pp. 200-215.

Žurauskienė J, Yau C. 2016. pcaReduce: hierarchical clustering of single cell transcriptional profiles. BMC Bioinformatics, 17 (1), pp. 140.

BACKGROUND: Advances in single cell genomics provide a way of routinely generating transcriptomics data at the single cell level. A frequent requirement of single cell expression analysis is the identification of novel patterns of heterogeneity across single cells that might explain complex cellular states or tissue composition. To date, classical statistical analysis tools have been routinely applied, but there is considerable scope for the development of novel statistical approaches that are better adapted to the challenges of inferring cellular hierarchies. RESULTS: We have developed a novel agglomerative clustering method that we call pcaReduce to generate a cell state hierarchy where each cluster branch is associated with a principal component of variation that can be used to differentiate two cell states. Using two real single cell datasets, we compared our approach to other commonly used statistical techniques, such as K-means and hierarchical clustering. We found that pcaReduce was able to give more consistent clustering structures when compared to broad and detailed cell type labels. CONCLUSIONS: Our novel integration of principal components analysis and hierarchical clustering establishes a connection between the representation of the expression data and the number of cell types that can be discovered. In doing so we found that pcaReduce performs better than either technique in isolation in terms of characterising putative cell states. Our methodology is complementary to other single cell clustering techniques and adds to a growing palette of single cell bioinformatics tools for profiling heterogeneous cell populations.

Pierson E, Yau C. 2015. ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis. Genome Biol, 16 (1), pp. 241.

Single-cell RNA-seq data allows insight into normal cellular function and various disease states through molecular characterization of gene expression on the single cell level. Dimensionality reduction of such high-dimensional data sets is essential for visualization and analysis, but single-cell RNA-seq data are challenging for classical dimensionality-reduction methods because of the prevalence of dropout events, which lead to zero-inflated data. Here, we develop a dimensionality-reduction method, (Z)ero (I)nflated (F)actor (A)nalysis (ZIFA), which explicitly models the dropout characteristics, and show that it improves modeling accuracy on simulated and biological data sets.

Taylor JC, Martin HC, Lise S, Broxholme J, Cazier JB, Rimmer A, Kanapin A, Lunter G, Fiddy S, Allan C et al. 2015. Factors influencing success of clinical genome sequencing across a broad spectrum of disorders. Nat Genet, 47 (7), pp. 717-726.

To assess factors influencing the success of whole-genome sequencing for mainstream clinical diagnosis, we sequenced 217 individuals from 156 independent cases or families across a broad spectrum of disorders in whom previous screening had identified no pathogenic variants. We quantified the number of candidate variants identified using different strategies for variant calling, filtering, annotation and prioritization. We found that jointly calling variants across samples, filtering against both local and external databases, deploying multiple annotation tools and using familial transmission above biological plausibility contributed to accuracy. Overall, we identified disease-causing variants in 21% of cases, with the proportion increasing to 34% (23/68) for mendelian disorders and 57% (8/14) in family trios. We also discovered 32 potentially clinically actionable variants in 18 genes unrelated to the referral disorder, although only 4 were ultimately considered reportable. Our results demonstrate the value of genome sequencing for routine clinical diagnosis but also highlight many outstanding challenges.

Knight SJL, Clifford R, Robbe P, Ramos SDC, Burns A, Timbs AT, Alsolami R, Weller S, Hamblin A, Mason J et al. 2014. The Identification of Further Minimal Regions of Overlap in Chronic Lymphocytic Leukemia Using High-Resolution SNP Arrays. Blood, 124 (21).

Zhang X, Nott DJ, Yau C, Jasra A. 2014. A Sequential Algorithm for Fast Fitting of Dirichlet Process Mixture Models. Journal of Computational and Graphical Statistics, 23 (4), pp. 1143-1162.

Petousi N, Copley RR, Lappin TR, Haggan SE, Bento CM, Cario H, Percy MJ, WGS Consortium, Ratcliffe PJ, Robbins PA, McMullin MF. 2014. Erythrocytosis associated with a novel missense mutation in the BPGM gene. Haematologica, 99 (10), pp. e201-e204.

Cazier JB, Rao SR, McLean CM, Walker AK, Wright BJ, Jaeger EE, Kartsonaki C, Marsden L, Yau C, Camps C et al. 2014. Whole-genome sequencing of bladder cancers reveals somatic CDKN1A mutations and clinicopathological associations with mutation burden. Nat Commun, 5 pp. 3756.

Bladder cancers are a leading cause of death from malignancy. Molecular markers might predict disease progression and behaviour more accurately than the available prognostic factors. Here we use whole-genome sequencing to identify somatic mutations and chromosomal changes in 14 bladder cancers of different grades and stages. As well as detecting the known bladder cancer driver mutations, we report the identification of recurrent protein-inactivating mutations in CDKN1A and FAT1. The former are not mutually exclusive with TP53 mutations or MDM2 amplification, showing that CDKN1A dysfunction is not simply an alternative mechanism for p53 pathway inactivation. We find strong positive associations between higher tumour stage/grade and greater clonal diversity, the number of somatic mutations and the burden of copy number changes. In principle, the identification of sub-clones with greater diversity and/or mutation burden within early-stage or low-grade tumours could identify lesions with a high risk of invasive progression.

Yau C. 2013. OncoSNP-SEQ: a statistical approach for the identification of somatic copy number alterations from next-generation sequencing of cancer genomes. Bioinformatics, 29 (19), pp. 2482-2484.

Mouradov D, Domingo E, Gibbs P, Jorissen RN, Li S, Soo PY, Lipton L, Desai J, Danielsen HE, Oukrif D et al. 2013. Survival in stage II/III colorectal cancer is independently predicted by chromosomal and microsatellite instability, but not by specific driver mutations. Am J Gastroenterol, 108 (11), pp. 1785-1793.

OBJECTIVES: Microsatellite instability (MSI) is an established marker of good prognosis in colorectal cancer (CRC). Chromosomal instability (CIN) is strongly negatively associated with MSI and has been shown to be a marker of poor prognosis in a small number of studies. However, a substantial group of "double-negative" (MSI-/CIN-) CRCs exists. The prognosis of these patients is unclear. Furthermore, MSI and CIN are each associated with specific molecular changes, such as mutations in KRAS and BRAF, that have been associated with prognosis. It is not known which of MSI, CIN, and the specific gene mutations are primary predictors of survival. METHODS: We evaluated the prognostic value (disease-free survival, DFS) of CIN, MSI, mutations in KRAS, NRAS, BRAF, PIK3CA, FBXW7, and TP53, and chromosome 18q loss-of-heterozygosity (LOH) in 822 patients from the VICTOR trial of stage II/III CRC. We followed up promising associations in an Australian community-based cohort (N=375). RESULTS: In the VICTOR patients, no specific mutation was associated with DFS, but individually MSI and CIN showed significant associations after adjusting for stage, age, gender, tumor location, and therapy. A combined analysis of the VICTOR and community-based cohorts showed that MSI and CIN were independent predictors of DFS (for MSI, hazard ratio (HR)=0.58, 95% confidence interval (CI) 0.36-0.93, and P=0.021; for CIN, HR=1.54, 95% CI 1.14-2.08, and P=0.005), and joint CIN/MSI testing significantly improved the prognostic prediction of MSI alone (P=0.028). Higher levels of CIN were monotonically associated with progressively poorer DFS, and a semi-quantitative measure of CIN was a better predictor of outcome than a simple CIN+/- variable. All measures of CIN predicted DFS better than the recently described Watanabe LOH ratio. CONCLUSIONS: MSI and CIN are independent predictors of DFS for stage II/III CRC. Prognostic molecular tests for CRC relapse should currently use MSI and a quantitative measure of CIN rather than specific gene mutations.

Yau C, Holmes CC. 2013. A decision-theoretic approach for segmental classification. Annals of Applied Statistics, 7 (3), pp. 1814-1835.

Becker J, Yau C, Hancock JM, Holmes CC. 2013. NucleoFinder: a statistical approach for the detection of nucleosome positions. Bioinformatics, 29 (6), pp. 711-716.

MOTIVATION: The identification of nucleosomes along the chromatin is key to understanding their role in the regulation of gene expression and other DNA-related processes. However, current experimental methods (MNase-ChIP, MNase-Seq) sample nucleosome positions from a cell population and contain biases, which makes the precise identification of individual nucleosomes difficult. Recent works have only focused on the first point, where noise reduction approaches have been developed to identify nucleosome positions. RESULTS: In this article, we propose a new approach, termed NucleoFinder, that addresses both the positional heterogeneity across cells and experimental biases by seeking nucleosomes consistently positioned in a cell population and showing a significant enrichment relative to a control sample. Despite the absence of a validated dataset, we show that our approach (i) detects fewer false positives than two other nucleosome calling methods and (ii) identifies two important features of the nucleosome organization (the nucleosome spacing downstream of active promoters and the enrichment/depletion of GC/AT dinucleotides at the centre of in vitro nucleosomes) with equal or greater ability than the other two methods.

Sengupta N, Yau C, Sakthianandeswaren A, Mouradov D, Gibbs P, Suraweera N, Cazier JB, Polanco-Echeverry G, Ghosh A, Thaha M et al. 2013. Analysis of colorectal cancers in British Bangladeshi identifies early onset, frequent mucinous histotype and a high prevalence of RBFOX1 deletion. Mol Cancer, 12 (1), pp. 1.

BACKGROUND: Prevalence of colorectal cancer (CRC) in the British Bangladeshi population (BAN) is low compared to British Caucasians (CAU). Genetic background may influence mutations and disease features. METHODS: We characterized the clinicopathological features of BAN CRCs and interrogated their genomes using mutation profiling and high-density single nucleotide polymorphism (SNP) arrays and compared findings to CAU CRCs. RESULTS: Age of onset of BAN CRC was significantly lower than for CAU patients (p=3.0 × 10⁻⁵) and this difference was not due to Lynch syndrome or the polyposis syndromes. KRAS mutations in BAN microsatellite stable (MSS) CRCs were comparatively rare (5.4%) compared to CAU MSS CRCs (25%; p=0.04), which correlates with the high percentage of mucinous histotype observed (31%) in the BAN samples. No BRAF mutations were seen in our BAN MSS CRCs (CAU CRCs, 12%; p=0.08). Array data revealed similar patterns of gains (chromosome 7 and 8q), losses (8p, 17p and 18q) and LOH (4q, 17p and 18q) in BAN and CAU CRCs. A small deletion on chromosome 16p13.2 involving the alternative splicing factor RBFOX1 only was found in significantly more BAN (50%) than CAU CRCs (15%) cases (p=0.04). Focal deletions targeting the 5' end of the gene were also identified. Novel RBFOX1 mutations were found in CRC cell lines and tumours; mRNA and protein expression was reduced in tumours. CONCLUSIONS: KRAS mutations were rare in BAN MSS CRC and a mucinous histotype common. Loss of RBFOX1 may explain the anomalous splicing activity associated with CRC.

Knight SJ, Yau C, Clifford R, Timbs AT, Sadighi Akha E, Dréau HM, Burns A, Ciria C, Oscier DG, Pettitt AR et al. 2012. Quantification of subclonal distributions of recurrent genomic aberrations in paired pre-treatment and relapse samples from patients with B-cell chronic lymphocytic leukemia. Leukemia, 26 (7), pp. 1564-1575.

Genome-wide array approaches and sequencing analyses are powerful tools for identifying genetic aberrations in cancers, including leukemias and lymphomas. However, the clinical and biological significance of such aberrations and their subclonal distribution are poorly understood. Here, we present the first genome-wide array based study of pre-treatment and relapse samples from patients with B-cell chronic lymphocytic leukemia (B-CLL) that uses the computational statistical tool OncoSNP. We show that quantification of the proportion of copy number alterations (CNAs) and copy neutral loss of heterozygosity regions (cnLOHs) in each sample is feasible. Furthermore, we (i) reveal complex changes in the subclonal architecture of paired samples at relapse compared with pre-treatment, (ii) provide evidence supporting an association between increased genomic complexity and poor clinical outcome, and (iii) report previously undefined, recurrent CNA/cnLOH regions that expand or newly occur at relapse and therefore might harbor candidate driver genes of relapse and/or chemotherapy resistance. Our findings are likely to impact on future therapeutic strategies aimed towards selecting effective and individually tailored targeted therapies.

Yau C, Holmes C. 2011. Hierarchical Bayesian nonparametric mixture models for clustering with variable relevance determination. Bayesian Anal, 6 (2), pp. 329-352.

We propose a hierarchical Bayesian nonparametric mixture model for clustering when some of the covariates are assumed to be of varying relevance to the clustering problem. This can be thought of as an issue in variable selection for unsupervised learning. We demonstrate that by defining a hierarchical population based nonparametric prior on the cluster locations scaled by the inverse covariance matrices of the likelihood we arrive at a 'sparsity prior' representation which admits a conditionally conjugate prior. This allows us to perform full Gibbs sampling to obtain posterior distributions over parameters of interest including an explicit measure of each covariate's relevance and a distribution over the number of potential clusters present in the data. This also allows for individual cluster specific variable selection. We demonstrate improved inference on a number of canonical problems.

Yau C, Papaspiliopoulos O, Roberts GO, Holmes C. 2011. Bayesian Nonparametric Hidden Markov Models with application to the analysis of copy-number-variation in mammalian genomes. J R Stat Soc Series B Stat Methodol, 73 (1), pp. 37-57.

We consider the development of Bayesian Nonparametric methods for product partition models such as Hidden Markov Models and change point models. Our approach uses a Mixture of Dirichlet Process (MDP) model for the unknown sampling distribution (likelihood) for the observations arising in each state and a computationally efficient data augmentation scheme to aid inference. The method uses novel MCMC methodology which combines recent retrospective sampling methods with the use of slice sampler variables. The methodology is computationally efficient, both in terms of MCMC mixing properties, and robustness to the length of the time series being investigated. Moreover, the method is easy to implement requiring little or no user-interaction. We apply our methodology to the analysis of genomic copy number variation.

McGuinness L, Taylor C, Taylor RD, Yau C, Langenhan T, Hart ML, Christian H, Tynan PW, Donnelly P, Emptage NJ. 2010. Presynaptic NMDARs in the hippocampus facilitate transmitter release at theta frequency. Neuron, 68 (6), pp. 1109-1127.

A rise in [Ca²⁺]ᵢ provides the trigger for neurotransmitter release at neuronal boutons. We have used confocal microscopy and Ca²⁺ sensitive dyes to directly measure the action potential-evoked [Ca²⁺]ᵢ in the boutons of Schaffer collaterals. This reveals that the trial-by-trial amplitude of the evoked Ca²⁺ transient is bimodally distributed. We demonstrate that "large" Ca²⁺ transients occur when presynaptic NMDA receptors are activated following transmitter release. Presynaptic NMDA receptor activation proves critical in producing facilitation of transmission at theta frequencies. Because large Ca²⁺ transients "report" transmitter release, their frequency on a trial-by-trial basis can be used to estimate the probability of release, pᵣ. We use this novel estimator to show that pᵣ increases following the induction of long-term potentiation.

Lee A, Yau C, Giles MB, Doucet A, Holmes CC. 2010. On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. J Comput Graph Stat, 19 (4), pp. 769-789.

We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design.

Wellcome Trust Case Control Consortium, Craddock N, Hurles ME, Cardin N, Pearson RD, Plagnol V, Robson S, Vukcevic D, Barnes C, Conrad DF et al. 2010. Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature, 464 (7289), pp. 713-720.

Copy number variants (CNVs) account for a major proportion of human genetic polymorphism and have been predicted to have an important role in genetic susceptibility to common disease. To address this we undertook a large, direct genome-wide study of association between CNVs and eight common human diseases. Using a purpose-designed array we typed approximately 19,000 individuals into distinct copy-number classes at 3,432 polymorphic CNVs, including an estimated 50% of all common CNVs larger than 500 base pairs. We identified several biological artefacts that lead to false-positive associations, including systematic CNV differences between DNAs derived from blood and cell lines. Association testing and follow-up replication analyses confirmed three loci where CNVs were associated with disease (IRGM for Crohn's disease, HLA for Crohn's disease, rheumatoid arthritis and type 1 diabetes, and TSPAN8 for type 2 diabetes), although in each case the locus had previously been identified in single nucleotide polymorphism (SNP)-based studies, reflecting our observation that most common CNVs that are well-typed on our array are well tagged by SNPs and so have been indirectly explored through SNP studies. We conclude that common CNVs that can be typed on existing platforms are unlikely to contribute greatly to the genetic basis of common human diseases.

Yau C, Mouradov D, Jorissen RN, Colella S, Mirza G, Steers G, Harris A, Ragoussis J, Sieber O, Holmes CC. 2010. A statistical approach for detecting genomic aberrations in heterogeneous tumor samples from single nucleotide polymorphism genotyping data. Genome Biol, 11 (9), pp. R92.

We describe a statistical method for the characterization of genomic aberrations in single nucleotide polymorphism microarray data acquired from cancer genomes. Our approach allows us to model the joint effect of polyploidy, normal DNA contamination and intra-tumour heterogeneity within a single unified Bayesian framework. We demonstrate the efficacy of our method on numerous datasets including laboratory generated mixtures of normal-cancer cell lines and real primary tumours.
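As a rough illustration of why normal DNA contamination matters, the sketch below computes the expected log R ratio and B allele frequency at a germline-heterozygous SNP when tumour and normal cells are mixed. It is not the model from the paper: the function name `expected_signals`, the diploid baseline and the assumption that every tumour cell carries the same aberration are simplifications introduced here for illustration.

```python
import math

def expected_signals(tumour_copies, tumour_b_copies, normal_fraction):
    """Expected (log R ratio, B allele frequency) at a heterozygous SNP.

    Assumptions (illustrative only): normal cells carry 2 copies, one of
    which is the B allele; the log R ratio is measured against a diploid
    baseline; all tumour cells share the same copy number state.
    """
    if not 0.0 <= normal_fraction <= 1.0:
        raise ValueError("normal_fraction must lie in [0, 1]")
    tumour_fraction = 1.0 - normal_fraction
    # Average number of copies per cell across the mixed sample
    total = normal_fraction * 2 + tumour_fraction * tumour_copies
    # Average number of B-allele copies per cell across the mixed sample
    b_total = normal_fraction * 1 + tumour_fraction * tumour_b_copies
    if total == 0:
        raise ValueError("no DNA at this locus: signals are undefined")
    return math.log2(total / 2), b_total / total

# A one-copy deletion (the B allele is lost) at three contamination levels
for normal_fraction in (0.0, 0.5, 0.9):
    lrr, baf = expected_signals(1, 0, normal_fraction)
    print(f"normal fraction {normal_fraction:.1f}: LRR {lrr:+.3f}, BAF {baf:.3f}")
```

With no contamination the deletion gives a B allele frequency of 0, but at 50% normal cells the same deletion gives 1/3, which is why a method that ignores contamination can misread the copy number state.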

Winchester L, Yau C, Ragoussis J. 2009. Comparing CNV detection methods for SNP arrays. Brief Funct Genomic Proteomic, 8 (5), pp. 353-366.

Data from whole genome association studies can now be used for dual purposes, genotyping and copy number detection. In this review we discuss some of the methods for using SNP data to detect copy number events. We examine a number of algorithms designed to detect copy number changes through the use of signal-intensity data and consider methods to evaluate the changes found. We describe the use of several statistical models in copy number detection in germline samples. We also present a comparison of data using these methods to assess accuracy of prediction and detection of changes in copy number.

Giannoulatou E, Yau C, Colella S, Ragoussis J, Holmes CC. 2008. GenoSNP: a variational Bayes within-sample SNP genotyping algorithm that does not require a reference population. Bioinformatics, 24 (19), pp. 2209-2214.

Current genotyping algorithms typically call genotypes by clustering allele-specific intensity data on a single nucleotide polymorphism (SNP) by SNP basis. This approach assumes the availability of a large number of control samples that have been sampled on the same array and platform. We have developed a SNP genotyping algorithm for the Illumina Infinium SNP genotyping assay that is entirely within-sample and requires neither a population of control samples nor parameters derived from such a population. Our algorithm exhibits high concordance with current methods and >99% call accuracy on HapMap samples. The ability to call genotypes using only within-sample information makes the method computationally light and practical for studies involving small sample sizes and provides a valuable independent quality control metric for other population-based approaches. Availability: http://www.stats.ox.ac.uk/~giannoul/GenoSNP/.

Buttrick GJ, Beaumont LM, Leitch J, Yau C, Hughes JR, Wakefield JG. 2008. Akt regulates centrosome migration and spindle orientation in the early Drosophila melanogaster embryo. J Cell Biol, 180 (3), pp. 537-548.

Correct positioning and morphology of the mitotic spindle is achieved through regulating the interaction between microtubules (MTs) and cortical actin. Here we find that, in the Drosophila melanogaster early embryo, reduced levels of the protein kinase Akt result in incomplete centrosome migration around cortical nuclei, bent mitotic spindles, and loss of nuclei into the interior of the embryo. We show that Akt is enriched at the embryonic cortex and is required for phosphorylation of the glycogen synthase kinase-3beta homologue Zeste-white 3 kinase (Zw3) and for the cortical localizations of the adenomatosis polyposis coli (APC)-related protein APC2/E-APC and the MT + Tip protein EB1. We also show that reduced levels of Akt result in mislocalization of APC2 in postcellularized embryonic mitoses and misorientation of epithelial mitotic spindles. Together, our results suggest that Akt regulates a complex containing Zw3, Armadillo, APC2, and EB1 and that this complex has a role in stabilizing MT-cortex interactions, facilitating both centrosome separation and mitotic spindle orientation.

Yau C, Holmes CC. 2008. CNV discovery using SNP genotyping arrays. Cytogenet Genome Res, 123 (1-4), pp. 307-312.

Genome-wide single nucleotide polymorphism (SNP) genotyping platforms have made an important contribution to population genetics and genetic epidemiology. Recently there has been a realisation that these SNP platforms can also be used for typing copy number variants (CNVs). This allows for 'generalised' genotyping of both SNPs and CNVs simultaneously on a common sample set, with advantages in terms of cost and unified analysis. In this article we review various statistical approaches to calling CNVs from SNP data. We highlight three tiers of algorithms depending on the level of information used.

Colella S, Yau C, Taylor JM, Mirza G, Butler H, Clouston P, Bassett AS, Seller A, Holmes CC, Ragoussis J. 2007. QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res, 35 (6), pp. 2013-2025.

Array-based technologies have been used to detect chromosomal copy number changes (aneuploidies) in the human genome. Recent studies identified numerous copy number variants (CNV) and some are common polymorphisms that may contribute to disease susceptibility. We developed, and experimentally validated, a novel computational framework (QuantiSNP) for detecting regions of copy number variation from BeadArray SNP genotyping data using an Objective Bayes Hidden-Markov Model (OB-HMM). Objective Bayes measures are used to set certain hyperparameters in the priors using a novel re-sampling framework to calibrate the model to a fixed Type I (false positive) error rate. Other parameters are set via maximum marginal likelihood to prior training data of known structure. QuantiSNP provides probabilistic quantification of state classifications and significantly improves the accuracy of segmental aneuploidy identification and mapping, relative to existing analytical tools (Beadstudio, Illumina), as demonstrated by validation of breakpoint boundaries. QuantiSNP identified both novel and validated CNVs. QuantiSNP was developed using BeadArray SNP data but it can be adapted to other platforms and we believe that the OB-HMM framework has widespread applicability in genomic research. In conclusion, QuantiSNP is a novel algorithm for high-resolution CNV/aneuploidy detection with application to clinical genetics, cancer and disease association studies.
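To make the hidden Markov model idea concrete, the following is a minimal sketch of Viterbi decoding over copy number states from a sequence of log R ratios. It is not QuantiSNP's OB-HMM: the Gaussian emissions, the shared `sigma`, the single `stay_prob` transition parameter and the function name `viterbi_copy_number` are all assumptions made here for illustration, and nothing is calibrated to an error rate.

```python
import math

def viterbi_copy_number(lrr, state_means, sigma=0.2, stay_prob=0.99):
    """Most probable state path for a Gaussian-emission HMM (illustrative)."""
    n = len(state_means)
    if n < 2 or len(lrr) == 0:
        raise ValueError("need at least two states and one observation")
    log_stay = math.log(stay_prob)
    # Probability of leaving a state is split evenly among the other states
    log_switch = math.log((1.0 - stay_prob) / (n - 1))

    def log_emit(x, k):
        # Gaussian log density up to a constant shared by all states
        return -0.5 * ((x - state_means[k]) / sigma) ** 2

    def log_trans(i, j):
        return log_stay if i == j else log_switch

    # Uniform initial distribution over states
    score = [-math.log(n) + log_emit(lrr[0], k) for k in range(n)]
    back = []  # back[t - 1][j] = best predecessor of state j at position t
    for x in lrr[1:]:
        pointers, new_score = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log_trans(i, j))
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans(best_i, j) + log_emit(x, j))
        back.append(pointers)
        score = new_score

    # Trace back from the best final state
    path = [max(range(n), key=lambda k: score[k])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical state means: 1 copy, 2 copies, 3 copies (log2 of copies / 2)
means = [-1.0, 0.0, math.log2(1.5)]
signal = [0.05, -0.1, 0.0, -0.9, -1.1, -1.0, -0.95, 0.1, 0.0]
print(viterbi_copy_number(signal, means))
```

The high `stay_prob` is what lets the model ignore isolated noisy probes while still calling a run of consistently low values as a deletion.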

Yau C. 2013. OncoSNP-SEQ: a statistical approach for the identification of somatic copy number alterations from next-generation sequencing of cancer genomes. Bioinformatics, 29 (19), pp. 2482-2484.

Knight SJ, Yau C, Clifford R, Timbs AT, Sadighi Akha E, Dréau HM, Burns A, Ciria C, Oscier DG, Pettitt AR et al. 2012. Quantification of subclonal distributions of recurrent genomic aberrations in paired pre-treatment and relapse samples from patients with B-cell chronic lymphocytic leukemia. Leukemia, 26 (7), pp. 1564-1575.

Genome-wide array approaches and sequencing analyses are powerful tools for identifying genetic aberrations in cancers, including leukemias and lymphomas. However, the clinical and biological significance of such aberrations and their subclonal distribution are poorly understood. Here, we present the first genome-wide array based study of pre-treatment and relapse samples from patients with B-cell chronic lymphocytic leukemia (B-CLL) that uses the computational statistical tool OncoSNP. We show that quantification of the proportion of copy number alterations (CNAs) and copy neutral loss of heterozygosity regions (cnLOHs) in each sample is feasible. Furthermore, we (i) reveal complex changes in the subclonal architecture of paired samples at relapse compared with pre-treatment, (ii) provide evidence supporting an association between increased genomic complexity and poor clinical outcome, and (iii) report previously undefined, recurrent CNA/cnLOH regions that expand or newly occur at relapse and therefore might harbor candidate driver genes of relapse and/or chemotherapy resistance. Our findings are likely to impact on future therapeutic strategies aimed towards selecting effective and individually tailored targeted therapies.

Yau C, Papaspiliopoulos O, Roberts GO, Holmes C. 2011. Bayesian Nonparametric Hidden Markov Models with application to the analysis of copy-number-variation in mammalian genomes. J R Stat Soc Series B Stat Methodol, 73 (1), pp. 37-57.

We consider the development of Bayesian Nonparametric methods for product partition models such as Hidden Markov Models and change point models. Our approach uses a Mixture of Dirichlet Process (MDP) model for the unknown sampling distribution (likelihood) for the observations arising in each state and a computationally efficient data augmentation scheme to aid inference. The method uses novel MCMC methodology which combines recent retrospective sampling methods with the use of slice sampler variables. The methodology is computationally efficient, both in terms of MCMC mixing properties, and robustness to the length of the time series being investigated. Moreover, the method is easy to implement requiring little or no user-interaction. We apply our methodology to the analysis of genomic copy number variation.

Lee A, Yau C, Giles MB, Doucet A, Holmes CC. 2010. On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. J Comput Graph Stat, 19 (4), pp. 769-789.

We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design.
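The reason population-based methods suit many-core hardware can be seen in a small self-normalised importance sampler, the weighting step that Sequential Monte Carlo builds on. The sketch below is plain sequential Python, not GPU code and not taken from the paper; the target, the proposal and the name `importance_estimate` are assumptions chosen for illustration. The point is the structure: the weight of each particle depends only on that particle, so the per-particle step could be assigned to separate threads, leaving only the final sums as a shared reduction.

```python
import math
import random

def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)

def importance_estimate(f, n_particles, proposal_sd=2.0, seed=0):
    """Estimate E[f(X)] for X ~ N(0, 1) using draws from N(0, proposal_sd^2)."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, proposal_sd) for _ in range(n_particles)]

    # Per-particle step: each log weight uses only its own particle,
    # so this loop has no dependencies between iterations.
    log_w = [
        normal_logpdf(x, 0.0, 1.0) - normal_logpdf(x, 0.0, proposal_sd)
        for x in particles
    ]

    # Reduction step: normalise the weights and form the weighted average.
    # Subtracting the maximum log weight avoids numerical underflow.
    shift = max(log_w)
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(wi * f(x) for wi, x in zip(w, particles)) / sum(w)

# The second moment of a standard normal is 1; the estimate should be close.
print(importance_estimate(lambda x: x * x, 20000))
```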
