Clinical trials are lengthy, costly and generate masses of data. Alan Bell, Life Sciences Director at Tessella, discusses how that data can be shared and mined long after a trial has ended for future insight and commercial gain
For pharma companies, one could argue that data is their primary product. Fortunes are spent trialling, recording and interrogating information about different permutations of molecules and their effects. The result is a vast portfolio of information that represents the opportunity to license a drug. Historically this data has been used very inefficiently. Pharma R&D leaders are very good at using advanced analytics during the development and licensing process to acquire information, achieve the best return on investment and improve productivity. But once the process is complete, the data tends to be filed away and ignored.
This is a shame, as that data could be hugely useful for further innovation. Using data more effectively, as well as sharing it across non-competing organisations, presents opportunities to spot new breakthroughs, avoid duplications, and save time and money.
Post-trial analysis of data can offer insight into improving efficacy, or identify why a drug failed clinical trials. It could mean spotting new breakthroughs by finding alternative uses for existing or failed drugs. As new data sources become more readily available, from processes such as genome sequencing or even health records, there is more opportunity to cross-reference data sets to reveal new insights and spot new opportunities.
Doing so is not easy. The sheer volume and complexity of data available to pharma companies bring many legitimate challenges that hold back its full exploitation. But the prize of overcoming these is worth the effort.
The first challenge is accessing data. Many companies and even departments are reluctant to share. There is a mindset of ‘it’s my data and I don’t trust others with it’. This isn’t just protectionism: data analytics is complex and there are legitimate concerns about untrained eyes drawing wrong conclusions from specialist data.
Linked to this is the worry about trusting data. In areas like retail, it may be enough simply to spot correlations. If 70% of people who like Product A also like Product B, it is reasonable to recommend Product B to anyone who buys Product A. This isn’t good enough in life sciences; we need to go beyond spotting correlations and understand the underlying causal relationships, including eliminating bias.
Such correlation spotting is neither unheard of nor entirely unexpected. Companies have been known – not unreasonably – to assign statisticians to a problem, who find a correlation between a disease and drug response. They report it to biologists, who come up with a plausible reason for the correlation. The company then spends a fortune trying to validate it, only to find the correlation was a statistical anomaly that has since disappeared.
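This failure mode is easy to reproduce. The toy sketch below (all values are synthetic and purely illustrative, not from any real trial) screens 1,000 random ‘biomarkers’ against a drug response that is, by construction, unrelated to all of them. With that many candidates, one will almost always look convincingly correlated – and the ‘signal’ vanishes when re-measured in an independent cohort.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_biomarkers = 50, 1000

# Purely random "biomarker" measurements, and a drug response
# that by construction is unrelated to every one of them
biomarkers = rng.normal(size=(n_patients, n_biomarkers))
response = rng.normal(size=n_patients)

# Screen every biomarker for correlation with the response
corrs = np.array([np.corrcoef(biomarkers[:, j], response)[0, 1]
                  for j in range(n_biomarkers)])
best = int(np.argmax(np.abs(corrs)))
print(f"best-looking biomarker: r = {corrs[best]:+.2f}")  # looks convincing

# 'Validate' in an independent cohort: the signal is gone
new_measurements = rng.normal(size=n_patients)
new_response = rng.normal(size=n_patients)
replication_r = np.corrcoef(new_measurements, new_response)[0, 1]
print(f"replication cohort:     r = {replication_r:+.2f}")
```

The screening step reliably finds an apparently strong correlation in pure noise; the replication step is the coin coming up tails again.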
Using data usefully in the complex life sciences arena is a big challenge. A vast variety of skills must be brought to bear on diverse and complex problems – modelling, genetic algorithms, machine learning, neural networks, etc. Even multinational pharma companies can’t justify having full-time employees in all these areas – and even if they could, they would struggle to find people to fill the roles.
So, here are two opportunities to create much greater innovation in the pharma industry through better use of data: 1) make it more available and 2) use it more intelligently.
First, making data more available applies to sharing both internally and externally. In both cases we need to create some form of neutral ground to ensure confidence in sharing. Internally, encouraging staff to share their own data and to embrace publicly available data is a good start, but a proactive, data-centric strategy is more likely to succeed.
Even where there is corporate desire to share, there are barriers. Data sets are often broad and complex. Different organisations and even different departments may use different nomenclatures and distribute data in many different formats. As a result, it is difficult to integrate data sets into a single resource and it is difficult for non-technical scientists to handle, understand and query.
It is therefore important to have a process that actively collates, interprets and presents data in a usable format, and incentives for your organisation’s data custodians to contribute it to this central resource. For example, H3 Biomedicine, a cancer drug discovery company, has taken an active policy of establishing a data-centric culture. H3 utilises the large volume of pharmacogenomic data that has been amassed in the public and private sectors.
Large data sets have been generated that characterise cell-line, patient and in-vivo models, based on a variety of molecular features and at varying depths of characterisation. Large scale screening experiments have generated arrays of pharmacology data that can be correlated to molecular features to better understand the links between these features, the cause of disease and potential treatment options. These data sets can be used to explore new hypotheses, identify potential biomarkers and validate internally generated results.
The H3 bioinformatics team was assigned to integrate the data sets the company used with many publicly available data sets, and to support scientists in exploring them. However, this process was laborious, created bottlenecks, and was rarely the best use of the team’s expertise. To address this, H3, together with Tessella, developed the Translational Informatics Platform (TIP). This allowed them to integrate multiple pharmacogenomic data sources and provided a framework within which to query and explore that data, presented through a simple user interface. As the user interacts with the system, TIP dynamically provides additional filters and options based on what the user has selected and what data is available in the system.
None of this happened by itself. It works because someone took charge of ensuring data sharing was a focus, brought in the relevant expertise to collate and interpret it, and created software to ensure it could be delivered in a useable way.
Sharing internally is part of the battle. But individual companies don’t always have the body of data necessary to really understand big issues; it needs the weight of multiple companies, combined with academic and healthcare data to make the biggest breakthroughs.
Progress is being made. Recently there has been a drive towards open pharma data, with companies such as GSK and AstraZeneca making some exciting moves. This is also driving companies to do their own post-trial analysis: they want to understand their own data before others do. But there is still a gold mine of data going unused.
Success is about creating momentum. As companies get more used to sharing, and benefiting from shared data internally, they will be more likely to see the value in sharing externally. Once a few companies lead the way and standard systems are developed to safely and practically share and interrogate data, we hope things will snowball, creating many new opportunities.
Making the data available is only part of the challenge; we then need to use it intelligently. This is about exploring data in a more scientifically rigorous way. We need to avoid ‘correlation hunting’ and post-rationalising. Instead we should start with an idea of what we’re looking for, to avoid spotting things that aren’t there. Building predictive models that can be properly tested gives likely outcomes that can help focus research. We can’t just rely on the data; we also need to appreciate the context. Science has to deal with causality – understanding the complexities of bias and false correlations to deliver real insights.
For example, a key part of Phase II trials is selecting the dose. Often an interim analysis will reject underperforming doses, but a statistically underperforming dose could still turn out to be the most suitable – just as a fair coin can land heads 10 times in a row. A better approach is not to reject doses outright but to increase the allocation to those performing best, and to build a dose-response model. A dose-response model produces sample points along a continuous curve rather than a string of isolated data points. This gives better insight into the ideal dose, highlighting whether underperforming doses are likely to recover, and whether doses between those you are testing would be preferable.
This still allows you to bias tests towards doses that are working, but does not throw away others that may recover as the trials continue. Building models to get the most out of existing data maximises the chance of success in Phase III.
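As a sketch of the dose-response idea, the snippet below fits a standard Emax curve to hypothetical Phase II responses at four tested doses (the numbers are invented purely for illustration), using a simple least-squares grid search. The fitted curve is continuous, so it can be read off at doses that were never tested.

```python
import numpy as np

# Hypothetical Phase II data: mean response at four tested doses (mg).
# All numbers here are invented for illustration only.
doses = np.array([10.0, 25.0, 50.0, 100.0])
responses = np.array([0.18, 0.35, 0.52, 0.60])

def emax(dose, e0, em, ed50):
    # Standard Emax dose-response curve: baseline e0, maximum effect em,
    # ed50 = dose producing half the maximum effect
    return e0 + em * dose / (ed50 + dose)

# Simple least-squares grid search over plausible parameter ranges
best_params, best_sse = None, np.inf
for e0 in np.linspace(0.0, 0.2, 21):
    for em in np.linspace(0.2, 1.0, 41):
        for ed50 in np.linspace(5.0, 80.0, 76):
            sse = float(np.sum((emax(doses, e0, em, ed50) - responses) ** 2))
            if sse < best_sse:
                best_params, best_sse = (e0, em, ed50), sse

# The fitted curve can be interrogated between the tested doses
for d in (10, 40, 75):
    print(f"dose {d:3d} mg -> predicted response {emax(d, *best_params):.2f}")
```

In practice a proper nonlinear least-squares fit with uncertainty estimates would replace the grid search, but the point stands: four isolated data points become a continuous curve that can suggest whether an intermediate dose would be preferable.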
Data mining processes can also have models built into them. For example, if it is known that a protein is created at a certain rate but not what that rate is, data can be used to train the model, which can then make predictions in areas where there is no data.
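A minimal sketch of that idea, with an assumed first-order production model and invented measurements: the functional form is known, only the rate k is not, so the sparse data is used to estimate k and the trained model then predicts levels at time points where nothing was measured.

```python
import numpy as np

C_MAX = 3.3  # assumed known plateau level; only the rate k is unknown

def protein_level(t, k):
    # Assumed mechanism: first-order production towards a plateau
    return C_MAX * (1.0 - np.exp(-k * t))

# Sparse, invented measurements at a few time points (hours)
t_obs = np.array([0.5, 1.0, 2.0, 4.0])
y_obs = np.array([0.9, 1.6, 2.5, 3.1])

# Train the model: scan candidate rates, keep the best least-squares fit
ks = np.linspace(0.01, 3.0, 300)
sse = [float(np.sum((protein_level(t_obs, k) - y_obs) ** 2)) for k in ks]
k_hat = float(ks[int(np.argmin(sse))])

# Predict where there is no data
print(f"estimated rate k = {k_hat:.2f} per hour")
print(f"predicted level at t = 8 h: {protein_level(8.0, k_hat):.2f}")
```

Because the model encodes the known mechanism rather than just the observed points, its extrapolation to t = 8 h is constrained by the physics of the process, not merely by curve shape.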
In applying these approaches, technology platforms also need to be fully appreciated. A lot of companies have bought analytics platforms, assuming they have all the answers. But the problems are too complex to automate, and they need people to ensure these technologies deliver the promised value. Companies need to recruit and cultivate ‘translators’ – people who can talk to scientists with problems and make the technology do what it needs to do to solve them.
Applying this scientifically rigorous approach to data requires a considerable range of skills – covering an understanding of analytics and modelling techniques as well as the issues in life sciences.
The benefits of getting it right are tangible. Through data analytics, companies have identified and understood anomalies that might otherwise have cost them a licence. One company found that widespread use of high-dose aspirin in the US meant a drug was less effective there, but worked well elsewhere. Uncovering this was a huge data analytics task, but the upshot was a clear licensing case to take to the regulators and avoid a blanket failure.
Bringing all sources of data together and bringing all analytics expertise and models to bear on problems offers huge opportunities for more innovation in the pharma industry. Better use of data will encourage others to open up their data, creating ever more opportunities and benefiting everyone.