Over the past decade, data-driven science has produced enormous sets of data. The convergence of statistics and computer science, in the field known as machine learning, provides the means to understand these large datasets. Ultimately, machine learning algorithms will be developed into clinical decision making support systems.

Q: Can you explain what big data is?

Christopher Yau: Big data is a phenomenon that has arisen out of a long period of data-driven science. Over the last decade, we have seen the cost of genomic technologies rapidly reduce; at the same time, the number of samples and people that we are analysing has rapidly increased. In big data we are looking at the massive data repositories that we have generated over the last ten years and will be generating in years to come, and trying to extract knowledge from these data sets to answer both specific and multiple scientific questions.

Q: How is big data helping you explain differences in cancer?

CY: My group works with the ovarian cancer laboratory in Oxford, and in one of our studies we have been able to take 40 tumour samples from a single ovarian cancer patient. These tumour samples have come from different parts of the body where the disease has spread and at different times during the patient's diagnosis. We have also taken samples before and after chemotherapy. We have sequenced each of these 40 tumour samples and the process has generated a massive 40 terabytes of raw data for this individual patient. What we can learn from this data are critical insights into how the cancer arose in this particular individual, how it evolved and spread, and how this tumour reacted to chemotherapy. This has been really important, as this comprehensive profiling has allowed us to gain insights into the tumour evolution in this patient, which we couldn't have got from just a single tumour sample. What is exciting in the next few years is that we will be applying the same technique to further ovarian cancer patients, and putting together a massive comprehensive profile of ovarian cancers in different patients.

Q: Why is it important to understand these differences in tumours?

CY: Patients respond differently to cancer treatments, and at the moment we don't have a full understanding or the ability to predict exactly how they will respond. By looking at the genetic differences between the cancers and genetic differences between cells within the same tumour, we can learn about the mechanisms of drug resistance and also of radiotherapy resistance in these patients. By relating the genetic changes to how they respond we hope to produce better and more effective treatment plans in the future.

Q: What are the most important lines of research that have arisen in the last five or ten years?

CY: What has been really exciting in the last ten years has been the convergence of two different fields. On the one hand we have statistics, which is traditionally a mathematical discipline, and on the other hand we have computer science, which is more technologically driven. As these two fields have converged, in the field known as machine learning, what we have seen is an explosion of new ideas for analysing data and developing smarter, more efficient computational algorithms. This has been really important, because in parallel, in genetics we have seen an explosion in the amount of data we can collect and generate; without these new ideas coming from machine learning for interpreting this data, we would have the means of generating lots of data but no means of understanding it.

Q: Why does your research matter and why should we put money into it?

CY: One of the most interesting things that has occurred this year has been the announcement of the Genomics England 100,000 Genomes Project. This project is going to be very important, as it will sequence 100,000 genomes and provide another resource of data for us to study. It will also help the NHS prepare for the challenge of integrating genomic technologies into modern healthcare. However, whilst it's easy to buy more sequencers to meet the capacity challenge of integrating genome sequencing into healthcare, it isn't quite so easy to hire and train new data analysts. So, a lot of my work is concerned with developing machine learning algorithms that make the task of processing and analysing complex genomic data sets much easier and much faster. It is important also to consider that machine learning in a biomedical context is a lot different to machine learning in other applications. Most people are probably familiar with the use of automatic face tagging in social networking sites, or speech recognition software on their phones. In these applications, if you make an error it is generally not a terrible thing and might in fact be quite amusing; but in a clinical setting we can't afford to make errors, and we need to engineer machine learning algorithms to respect much more stringent specifications for robustness and reliability.

Q: How does your research fit into translational medicine within the Department?

CY: Ultimately what we would like to do is turn the machine learning algorithms that we develop from research tools into clinical decision making support systems. For example, we have been working with the Biomedical Research Centre here in Oxford to develop a diagnostic system for leukaemia, which allows us to translate complex genomic data coming from a particular type of technology into a simple 1-2 page report that can be used by clinicians to develop a patient treatment plan. In the future this automation is going to become more important because we will see multiple genomic technologies being used, and also the need to integrate these genomic technologies with imaging technologies and other types of information that are being gathered about patients.

Christopher Yau

Computational statistics

The focus of Professor Christopher Yau is the development of computational statistical methods for applications in genetics and genomics. His aim is to develop methods and tools that can be widely used by specialists and non-specialists alike for research and clinical practice in cancer.

More podcasts related to Genetics

Anna Gloyn: Genetics and Diabetes

Predictions suggest that by 2030, 366 million people worldwide will be affected by diabetes, a disease which already uses 10% of the NHS budget. Continued breakthroughs in the area of genetics related to different types of diabetes enable better diagnosis and treatment for patients and identify novel pathways that can be targeted for therapeutic interventions.

Erika Mancini: Chromatin Remodelling

Chromatin plays an important role in the regulation of gene expression. The movement of nucleosomes, packing and unpacking DNA, is governed by chromatin remodelling ATPases. Malfunctions in the regulation of chromatin structure often lead to complex multi-system diseases and cancer, notably leukaemia.

Diabetes and Genomics by Mark McCarthy

Diabetes and obesity are both major challenges for global healthcare, with the social, health and economic costs over the next fifty years being in the ‘trillions’ of dollars. Genetics is one of the more important tools for developing a systematic understanding of the disease and how best to treat it in different patients.

Silvia Paracchini: Dyslexia and Genetics

Dyslexia is an impairment in learning to read that affects up to 10% of children; it can have profound effects on an individual's life. Dyslexia has an important genetic component; candidate genes control important stages during foetal brain development. Understanding the biology of dyslexia could help us design more effective diagnostic criteria and treatment plans.

Claire Palles: Gastrointestinal cancers

The gastrointestinal tract is responsible for more cancers than any other system. A condition called Barrett's oesophagus, characterised by a change in the cells lining the oesophagus, can lead to oesophageal adenocarcinoma. Only a few people with Barrett's oesophagus will go on to develop cancer, and genome sequencing studies aim to identify genetic risk factors and therefore better target high-risk patients.

Antonio Velayos-Baeza: Rare neurological disorders

ChAc is a rare progressive neurological disorder caused by mutations in a very complex gene. A better understanding of the biology underlying this disease helps develop better diagnostic tools, and opens up the possibility of discovering targets for possible future treatments.

Zamin Iqbal: Computation and genetics

Resistance to drugs in bacteria can be acquired by swapping genes between individual bacteria. Computer programs developed by Dr Iqbal enable doctors to predict which antibiotics will be met with drug resistance, enabling the selection of the right drug. His work also enables the tracking of an infection from patient to patient, as well as the tracking of the spread of an infection within a hospital.

Gerton Lunter: The evolution of the genome

Computational and statistical methods help us understand evolution as well as genetic disease. Looking at our genomes opens up clinical possibilities, for example in cancer, allowing more genes to be looked at more quickly and more cheaply, which can affect prognosis and treatment selection.

Catherine Green: DNA replication and Cancer

The process of DNA replication is complex, and mistakes can lead to genome instability. Surveillance systems are not always successful, which results in mutations that have the potential to inactivate genes or change their activity. This can lead to cancer, and many chemotherapeutic drugs are designed to disrupt DNA replication. A better understanding of these mechanisms can help us develop new drugs with reduced side effects.

Peter Donnelly: Human Genetics

Professor Donnelly tries to understand the genetic basis of common human diseases. Information about the genetic variants can give us clues into the biology of the diseases. We can then use that information to develop new drugs, to find new drug targets and develop new therapies. Changes in clinical practice are already happening, and we expect genetics to play an important role in translational medicine over the next ten or twenty years.

Translational Medicine

From Bench to Bedside

Ultimately, medical research must translate into improved treatments for patients. At the Nuffield Department of Medicine, our researchers collaborate to develop better health care, improved quality of life, and enhanced preventative measures for all patients. Our findings in the laboratory are translated into changes in clinical practice, from bench to bedside.