Crowdsourcing a Better Prostate Cancer Prediction Tool

Press Release | University of Colorado Anschutz Medical Campus | November 15, 2016

Knowing the likely course of cancer can influence treatment decisions. Now a new prediction model published today in The Lancet Oncology offers a more accurate prognosis for patients with metastatic castration-resistant prostate cancer. The approach was as novel as the result. While researchers commonly work in small groups, intentionally isolating their data, the current study embraced the call in Joe Biden's "Cancer Moonshot" to open both the question and the data, collecting previously published clinical trial data and calling for worldwide collaboration to evaluate its predictive power. That is, researchers crowdsourced the question of prostate cancer prognosis, eventually involving over 550 international researchers and resulting in 50 computational models from 50 different teams. The approach was intentionally controversial.

"Scientists like me who mine open data have been called 'research parasites'. While not the most flattering name, the idea of leveraging existing data to gain new insights is a very important part of modern biomedical research. This project shows the power of the parasites," says James Costello, PhD, senior author of the paper, investigator at the University of Colorado Cancer Center, assistant professor in the Department of Pharmacology at the CU School of Medicine, and director of Computational and Systems Biology Challenges within the Sage Bionetworks/DREAM organization.

The project was overseen as a collaborative effort among 16 institutions, led by academic research institutions including CU Cancer Center; open-data initiatives including Project Data Sphere, Sage Bionetworks, and the National Cancer Institute's DREAM Challenges; and industry and research partners including Sanofi, AstraZeneca, and the Prostate Cancer Foundation. Challenge organizers made available the results from five completed clinical trials. Teams were challenged to connect a deep set of clinical measurements to overall patient survival, organizing their insights into novel computational models to better predict patient survival based on clinical data.

"The idea is that if a patient comes into the clinic and has these measurements and test results, can we put this data in a model to say if this patient will progress slowly or quickly. If we know the features of patients at the greatest risk, we can know who should receive standard treatment and who might benefit more from a clinical trial," Costello says.

The most successful of the 50 models was submitted by a team led by Tero Aittokallio, PhD, from the Institute for Molecular Medicine Finland, FIMM, at University of Helsinki, and professor in the Department of Mathematics and Statistics at University of Turku, Finland.

"My group has a long-term expertise in developing multivariate machine learning models for various biomedical applications, but this Challenge provided the unique opportunity to work on clinical trial data, with the eventual aim to help patients with metastatic castration-resistant prostate cancer," Aittokallio says.

The winning model relied not only on individual patient measurements to predict outcomes, but also on identifying which interactions between measurements were most predictive. For example, data describing a patient's blood system composition and immune function were only weakly predictive of survival on their own, but when combined became an important part of the winning model. The model used a computational learning strategy technically referred to as an ensemble of penalized Cox regression models, hence the model's name, ePCR. This model then competed with 49 other entries, submitted by other teams working independently around the world.
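For readers curious what "an ensemble of penalized Cox regression models" can look like in practice, the following is a minimal sketch in plain Python, not the winning team's implementation. It assumes a ridge penalty, plain gradient descent, one model per hypothetical cohort, and a simple average of the resulting risk scores; the published ePCR method may differ in its penalty, its tuning, and how its ensemble members are built. All function names and the toy data are invented for illustration.

```python
import math

def fit_penalized_cox(times, events, X, penalty=0.1, lr=0.05, steps=500):
    """Gradient descent on the ridge-penalized negative log partial likelihood.

    Objective (an assumption of this sketch):
        -(1/n) * log partial likelihood + (penalty / 2) * ||beta||^2
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(steps):
        # Relative hazard exp(x . beta) for every patient.
        hazard = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [0.0] * p
        for i in range(n):
            if not events[i]:
                continue  # censored patients only appear in risk sets
            at_risk = [j for j in range(n) if times[j] >= times[i]]
            denom = sum(hazard[j] for j in at_risk)
            for k in range(p):
                weighted_mean = sum(hazard[j] * X[j][k] for j in at_risk) / denom
                grad[k] -= X[i][k] - weighted_mean
        beta = [b - lr * (g / n + penalty * b) for b, g in zip(beta, grad)]
    return beta

def fit_ensemble(cohorts, penalty=0.1):
    """Fit one penalized Cox model per cohort (hypothetical ensemble design)."""
    return [fit_penalized_cox(t, e, X, penalty) for (t, e, X) in cohorts]

def ensemble_risk(models, row):
    """Average the linear risk scores of all ensemble members."""
    scores = [sum(b * x for b, x in zip(beta, row)) for beta in models]
    return sum(scores) / len(scores)

# Invented toy data: (survival times, event flags, covariate rows).
# A larger first covariate goes with shorter survival in both cohorts.
cohort_a = ([2, 4, 6, 8, 10, 12], [1, 1, 1, 1, 0, 1],
            [[2.0, 0.1], [1.5, -0.3], [1.0, 0.2],
             [0.5, 0.0], [0.2, -0.1], [0.0, 0.3]])
cohort_b = ([1, 3, 5, 7, 9], [1, 0, 1, 1, 1],
            [[1.8, 0.2], [1.2, -0.2], [0.9, 0.1], [0.4, -0.1], [0.1, 0.0]])

models = fit_ensemble([cohort_a, cohort_b])
high = ensemble_risk(models, [2.0, 0.0])  # patient with a high first covariate
low = ensemble_risk(models, [0.0, 0.0])   # patient with a low first covariate
print("higher risk score for the first patient:", high > low)
```

The penalty shrinks coefficients toward zero, which keeps the fit stable when many clinical measurements are correlated, and averaging several models reduces dependence on any single training cohort. A higher score here means a higher predicted risk relative to other patients, not a survival time.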

"Having 50 independent models allowed us to do two very important things. First when a single clinical feature known to be predictive of patient survival is picked out by 40 of the 50 teams, this greatly strengthens our overall confidence. Second, we were able to discover important clinical features we hadn't fully appreciated before," Costello says.

In this case, many models found that, in addition to factors like prostate-specific antigen (PSA) and lactate dehydrogenase (LDH) that have long been known to predict prostate cancer outcomes, blood levels of an enzyme called aspartate aminotransferase (AST) are an important predictor of patient survival. AST is an indirect measure of liver function, and the fact that disturbed levels of AST are associated with poor patient outcomes suggests that future studies could evaluate the role of AST in prostate cancer.

"The benefits of a DREAM Challenge are the ability to attract talented individuals and teams from around the world, and a rigorous framework for the assessment of methods. These two ingredients came together for our Challenge, leading to a new benchmark in metastatic prostate cancer," says paper first author, Justin Guinney, PhD, director of Computational Oncology for Sage Bionetworks located at Fred Hutchinson Cancer Research Center.

"A goal of the Project Data Sphere initiative is to spark innovation - to unlock the potential of valuable data by generating new insights and opening up a new world of research possibilities. Prostate Cancer DREAM Challenge did just that. To witness cancer clinical trial data from Project Data Sphere be used in research collaboration and ultimately help improve patient care in the future is extremely rewarding!" says Liz Zhou, MD, MS, director of Global Health Outcome Research at Sanofi.

The goal now is to make the ePCR model publicly accessible through an online tool with an eye towards clinical application. In fact, the National Cancer Institute (NCI) has contracted the winning team to do exactly this. Soon, when patients face difficult decisions about the best treatment for metastatic castration-resistant prostate cancer, the ePCR tool could be an important piece of the decision-making process.

Challenge winners and results can be found on the Prostate Cancer DREAM Challenge homepage. The clinical trial data can be found at Project Data Sphere. The research article describing this work can be found at The Lancet Oncology. Additional papers that describe individual team methods can be found in the DREAM Channel at F1000Research.

About University of Colorado Cancer Center

The University of Colorado Cancer Center, located at the Anschutz Medical Campus, is Colorado's only National Cancer Institute-designated comprehensive cancer center, a distinction recognizing its outstanding contributions to research, clinical trials, prevention and cancer control. CU Cancer Center's clinical partner University of Colorado Hospital is ranked 15th by US News and World Report for Cancer and the CU Cancer Center is a member of the prestigious National Comprehensive Cancer Network®, an alliance of the nation's leading cancer centers working to establish and deliver the gold standard in cancer clinical guidelines. CU Cancer Center is a consortium of more than 400 researchers and physicians at three state universities and three institutions, all working toward one goal: Translating science into life. For more information visit Coloradocancercenter.org and follow CU Cancer Center on Facebook and Twitter.

About the DREAM Challenges Initiative

Founded in 2006 by A. Califano (Columbia University) and Gustavo Stolovitzky (IBM Research), the Dialogue on Reverse Engineering Assessment and Methods (DREAM) Challenges Initiative poses fundamental questions about systems biology and translational medicine. Designed and run by a community of researchers from a variety of organizations, the DREAM challenges invite participants to propose solutions, fostering collaboration and building communities in the process. Expertise and institutional support are provided by Sage Bionetworks, along with the infrastructure to host challenges via their Synapse platform. Together, the leaders of the DREAM Challenges Initiative share a vision allowing individuals and groups to collaborate openly so that the "wisdom of the crowd" provides the greatest impact on science and human health. More information is available at: http://dreamchallenges.org/.

About the Project Data Sphere Initiative

Project Data Sphere, LLC, an independent, not-for-profit initiative of the CEO Roundtable on Cancer's Life Sciences Consortium (LSC), operates the Project Data Sphere® platform. Launched in April 2014, the Project Data Sphere platform provides one place where the cancer community can broadly share, integrate, analyze and discuss historical patient-level comparator arm data sets (historical patient-level cancer phase III) from multiple providers, with the goal of advancing research. With its broad-access approach, the initiative brings diverse minds and technology together to help unleash the full potential of existing clinical trial data and speed innovation by generating collective insights that may lead to improved trial design, disease modeling and beyond. The platform currently contains 27,600 patient lives of data; 9,400 of those are across a wide spectrum of prostate cancer populations. In order to ensure that researchers can realize the full potential of this data, PDS teamed with CEO Roundtable on Cancer Member, SAS Institute Inc. SAS, a leader in data and health analytics, developed and hosts the site and provides free state-of-the-art analytic tools to authorized users within the Project Data Sphere environment.

About Sage Bionetworks

Sage Bionetworks is a nonprofit biomedical research organization, founded in 2009, with a vision to promote innovations in personalized medicine by enabling a community-based approach to scientific inquiries and discoveries. Sage Bionetworks strives to activate patients and to incentivize scientists, funders and researchers to work in fundamentally new ways in order to shape research, accelerate access to knowledge and transform human health. It is located on the campus of the Fred Hutchinson Cancer Research Center in Seattle, Washington and is supported through a portfolio of philanthropic donations, competitive research grants, and commercial partnerships. More information is available at http://www.sagebase.org.