Cancer And Clinical Trials: The Role Of Big Data In Personalizing The Health Experience

This article was written in collaboration with Ellen M. Martin and Tobi Skotnes. Dr. Feldman delivered a webinar on this topic on September 18, 2013, and spoke about it at the Strata Rx conference.

Big Data and analytics are the foundation of personalized medicine

Despite considerable progress in prevention and treatment, cancer remains the second leading cause of death in the United States. Even though pharmaceutical companies spend $50 billion on research and development every year, any given cancer drug is ineffective in 75% of the patients who receive it. Typically, oncologists start patients on the cheapest likely chemotherapy (or the one their formulary lists first) and, when it fails, as it does 75% of the time, move on to increasingly expensive drugs until they find one that works or the patient dies. This process is inefficient and expensive, exposes patients to unnecessary side effects, and costs them precious time in their fight against a progressive disease. The goal is to enable oncologists to prescribe the right drug the first time: one that kills the target cancer cells with the least collateral damage to the patient.
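
To make the cost of that trial-and-error loop concrete, here is a deliberately simplified model in Python. It assumes, purely for illustration, that every successive drug has the same, independent 25% chance of working (the complement of the 75% figure above). Real responses are correlated and resistance accumulates, so treat the numbers as intuition, not a clinical estimate. Under that assumption the number of drugs tried until one works follows a geometric distribution with mean 1/p; the function names are hypothetical.

```python
def expected_drugs_until_response(response_rate: float) -> float:
    """Mean number of drugs tried until one works (geometric distribution).

    Assumes each drug independently has the same chance of working, which
    real tumors violate (correlated responses, acquired resistance).
    """
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return 1 / response_rate


def prob_response_within(k: int, response_rate: float) -> float:
    """Probability that at least one of the first k drugs works."""
    return 1 - (1 - response_rate) ** k


rate = 0.25  # complement of the 75% non-response figure in the text
print(expected_drugs_until_response(rate))      # 4.0
print(round(prob_response_within(3, rate), 4))  # 0.5781
```

Even in this optimistic model, the average patient would go through four drugs, and after three attempts roughly four in ten would still have no effective treatment.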

How data can improve cancer treatment

Big data is enabling a new understanding of the molecular biology of cancer. Over the last 20 years, the focus has shifted from the tumor's location in the body (e.g., breast, colon, or blood) to the effect of the individual's genetics, especially the genetics of that individual's cancer cells, on their response to treatment and sensitivity to side effects. For example, researchers have so far identified four distinct breast cancer genotypes; identifying a patient's cancer genotype allows the oncologist to prescribe the most effective available drug first.

Herceptin, the first drug developed to target a particular cancer genotype (HER2), quickly demonstrated both the promise and the limitations of this approach. (Among the limitations: HER2 is only one of four known, and many unknown, breast cancer genotypes, and treatment selects for resistant cancer cells, so the cancer can return in a more virulent form.)

How data can improve clinical trials

As with treatment, progress in developing better cancer drugs has been hindered by a lack of genomic and metabolic understanding. Historically, cancer drug trials have recruited uncharacterized subjects, with no genomic, metabolic, or other profiling that might predict their response to the candidate treatment, to test one-size-fits-all drugs. Given what we now know about cancer genotypes and individual responses to drugs, it's remarkable that any drugs showed statistically significant efficacy and reached the market.
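
Why does an uncharacterized trial population make significance so hard to reach? A back-of-the-envelope sample-size calculation shows the dilution effect. The scenario is hypothetical: a 10% background response rate, a drug that raises response to 50% but only in a biomarker-positive subgroup making up 20% of patients, and no benefit for anyone else. The sketch uses the standard normal-approximation formula for a two-sided, two-proportion z-test at 5% significance and 80% power; it is an illustration, not any particular trial's design.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control: float, p_treated: float,
              alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate subjects per arm for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p_control + p_treated) / 2
    spread = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * math.sqrt(p_control * (1 - p_control)
                                   + p_treated * (1 - p_treated)))
    return math.ceil(spread ** 2 / (p_treated - p_control) ** 2)


# Hypothetical scenario (illustrative numbers, not from any real trial).
p_control = 0.10    # response rate without the drug
p_responder = 0.50  # response rate with the drug, biomarker-positive only
prevalence = 0.20   # share of patients who are biomarker-positive

# Unselected trial: the benefit is diluted across the whole population.
p_unselected = prevalence * p_responder + (1 - prevalence) * p_control

unselected = n_per_arm(p_control, p_unselected)  # 295 per arm
enriched = n_per_arm(p_control, p_responder)     # 20 per arm
print(unselected, enriched)
```

Under these assumptions the unselected trial needs about 295 subjects per arm, while a trial enrolling only biomarker-positive patients needs about 20. Enrichment is not free: at 20% prevalence, roughly five patients must be screened for each one enrolled, but that is still far fewer than the unselected design requires.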

Using personal medical and population genomics data, clinicians now have tools to design more targeted clinical trials: matching the drug candidate to cancer cell types and individual metabolic response, recruiting subjects who are more likely to respond, and excluding those likely to experience treatment-limiting side effects.
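
In software, that matching step reduces to an inclusion/exclusion filter over candidate profiles. The sketch below is hypothetical throughout: the `Candidate` fields, the marker labels, and the toxicity-risk score (imagined as the output of some upstream pharmacogenomic model) stand in for whatever criteria a real protocol would define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical profile fields; a real protocol defines its own criteria.
    patient_id: str
    tumor_markers: frozenset  # e.g. frozenset({"HER2+"})
    toxicity_risk: float      # 0.0 to 1.0, from an assumed upstream model


def is_eligible(c: Candidate, required_markers: frozenset,
                max_toxicity_risk: float) -> bool:
    # Include: the tumor carries every marker the drug targets.
    # Exclude: predicted risk of treatment-limiting side effects is too high.
    return (required_markers <= c.tumor_markers
            and c.toxicity_risk <= max_toxicity_risk)


def screen(candidates, required_markers, max_toxicity_risk):
    """Return the IDs of candidates who pass both criteria, in input order."""
    return [c.patient_id for c in candidates
            if is_eligible(c, required_markers, max_toxicity_risk)]


pool = [
    Candidate("P1", frozenset({"HER2+"}), 0.05),  # matches, low risk
    Candidate("P2", frozenset({"HER2+"}), 0.40),  # matches, high risk
    Candidate("P3", frozenset(), 0.02),           # lacks the target marker
]
print(screen(pool, frozenset({"HER2+"}), max_toxicity_risk=0.2))  # ['P1']
```

Keeping inclusion and exclusion in one predicate makes it easy to audit exactly why each candidate was accepted or rejected.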

Using new data to address unmet medical needs

In addition to cancer, there are many diseases with wide individual variability and a dearth of effective treatments: e.g., Alzheimer's, depression, diabetes, asthma, and arthritis. A flood of new data streams in health care (from digitized medical records, genomics, pharmaceutical data, and trackers and sensors) may enable clinicians to make better diagnoses and prognoses, giving patients better prevention and treatment choices. Aggregated health data can also help researchers determine which patients are good candidates for particular clinical trials or treatment protocols. With sensors, at-home monitors, and smartphone trackers, clinicians can capture clinical data in real time and monitor patients' progress outside the hospital between visits. Combining these data sources with improved data management and analytics makes it possible to move toward more effective treatments and, ultimately, personalized medicine.
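
One simple way to turn a stream of at-home readings into something a clinician can act on between visits is to compare each new value with the patient's own recent baseline rather than a population-wide cutoff. The sketch below is an illustrative assumption, not a clinical algorithm: the function name, the seven-reading window, the three-standard-deviation threshold, and the resting-heart-rate series are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def flag_readings(readings, window=7, k=3.0):
    """Return indices of readings more than k standard deviations away from
    the mean of the patient's previous `window` readings (hypothetical rule)."""
    history = deque(maxlen=window)  # rolling personal baseline
    flagged = []
    for i, value in enumerate(readings):
        if len(history) == window:
            baseline, spread = mean(history), stdev(history)
            if spread > 0 and abs(value - baseline) > k * spread:
                flagged.append(i)
        history.append(value)
    return flagged


# Hypothetical daily resting heart rate (beats per minute) from a home tracker.
resting_hr = [72, 74, 71, 73, 72, 75, 73, 74, 96, 73]
print(flag_readings(resting_hr))  # [8]: the 96 bpm reading
```

Because the baseline is personal, the same reading might be unremarkable for one patient and flagged for another. Note that a flagged spike also enters the baseline, widening the threshold until it rolls out of the window.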

New ways to analyze medical data: GNS, Ayasdi, Explorys

Companies are applying both established and new methods to analyze the multidimensional data sets collected in cancer and clinical trial research. GNS Healthcare, Ayasdi, and Explorys are three such companies, using causal models, topology, and massively parallel processing, respectively, to analyze and visualize the data.

  • GNS Healthcare uses machine learning and statistics to build software models that let users predict the outcomes of “what if” scenarios. Its next-generation REFS machine-learning engine, running on a cloud platform, extracts these models directly from multiple data sources to assess comparative effectiveness and run simulations across an entire patient population as well as for individual patients. This can help determine which treatment or course of action will be best for individuals and for the health system as a whole. For example, GNS recently announced that it is using EMR and genomic data to build a computer model that predicts which pregnant women are at risk of preterm labor.
  • Ayasdi uses a more esoteric topological analysis, a “math of shapes,” on its Iris platform to visualize data in a multidimensional graphic that readily shows outliers as well as high- and low-response groups, even without pre-specifying the characteristics of those clusters. The outliers can represent unknown biomarkers, or subgroups of patients who would be well (or poorly) suited to a clinical trial of a particular drug. Other clusters in the visualization can reveal structure that is invisible to other analytic methods and warrants further analysis. Ayasdi has found a number of novel biomarkers; the first was a previously unrecognized subset of “triple negative” breast cancer survivors with elevated expression of genes involved in the immune system.
  • Explorys focuses on aggregating, storing, and analyzing multiple data sources, including all clinical, financial, and operational data related to patient care. Massively parallel processing lets Explorys examine multiple data sets from multiple angles simultaneously, processing the data in real time to deliver real-time results.
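
The topological idea in the Ayasdi entry can be illustrated with a toy version of Mapper, a standard topological data analysis algorithm: project the data through a “lens” function, cover the lens range with overlapping intervals, cluster the points that fall in each interval, and connect clusters that share points. This is a minimal sketch, not Ayasdi's Iris implementation; the lens, interval count, overlap, distance threshold, and two-branch data set are all illustrative assumptions.

```python
import math
from itertools import combinations


def single_linkage(points, indices, eps):
    """Split indices into groups chained together by distances <= eps."""
    clusters, unvisited = [], set(indices)
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper(points, lens, n_intervals=4, overlap=0.3, eps=1.0):
    """Return (nodes, edges): nodes are point-index sets, edges join nodes sharing points."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0  # guard against a constant lens
    nodes = []
    for k in range(n_intervals):
        start = lo + k * width - overlap * width
        end = lo + (k + 1) * width + overlap * width
        members = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(single_linkage(points, members, eps))
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges


def count_components(n_nodes, edges):
    """Count connected components of the Mapper graph with a small union-find."""
    parent = list(range(n_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(i) for i in range(n_nodes)})


# Two hypothetical patient groups laid out as parallel branches in 2-D.
xs = [0.5 * i for i in range(9)]  # 0.0, 0.5, ..., 4.0
data = [(x, 0.0) for x in xs] + [(x, 5.0) for x in xs]

nodes, edges = mapper(data, lens=lambda p: p[0])
print(len(nodes), len(edges), count_components(len(nodes), edges))  # 8 6 2
```

The two branches come out as two disconnected pieces of the graph even though nothing in the code was told that two groups exist; that is the sense in which topological methods can surface subgroups without pre-specifying them. Real data would need a meaningful lens (for example, a density or eccentricity measure) and a more robust clustering step.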

Why We Care: The Future of Medicine

These trends in clinical trials and cancer research represent the dawn of a new age of personalized medicine.

For pharma researchers, designing more precise clinical trials can reduce drug development failures and costs. For clinicians, matching drug treatments to patients could mean improved response and lower costs. For patients, reduced side effects and an end to trial-and-error dosing improve quality of life. Altogether, this has the potential to save the healthcare system $300 billion a year, and hundreds of thousands of lives.

For data scientists, new research into disease mechanisms depends on aggregating vast stores of data from many different sources. That presents a major data management and analysis challenge as paper medical records are digitized, growing sets of genomic and pharmaceutical data become available, and mobile data streams come online from millions of people.

This article was first published in O'Reilly's Strata and it is reprinted here under the terms of the Creative Commons license.