Following the decoding of the human genome, the pharma industry should harness the power of cloud computing to tackle the large amounts of data generated by the development of cancer drugs, says GlobalData’s healthcare industry dynamics analyst, Adam Dion.
Arguably, two of the most noteworthy advances in medicine and technology in the past decade have been genomics research and cloud computing. Until now, these two developments have been on separate pathways, each having little to do with the other. But today, medical research institutions are collaborating with large software and hardware vendors to use cloud computing to tackle some of the most challenging issues – in particular, cancer treatment.
Genomic studies and DNA sequencing are massive computational challenges. These clinical analysis techniques require extremely high levels of data throughput in the form of powerful storage and computing platforms. For instance, the first genome sequence took up about 750 gigabytes of data that had to be analysed, backed up and archived for long-term storage. Today, scientists and researchers engaged in genome studies often work with calculations so complex that it could take years for individual computers to complete them. However, these types of studies, in which hundreds of human genomes are sequenced, are now becoming feasible thanks to cloud computing.
Global business intelligence provider GlobalData believes that by leveraging the power of cloud computing, researchers will have the processing capabilities needed to make sense of the incredible volumes of data produced by genomic studies, and to deliver actionable insights to improve cancer treatment.
While cloud computing, which builds on the older idea of grid computing, is an emerging field of computer science, consumers and multinational corporations have been benefiting from it for a number of years through services that store music, photos and documents remotely and stream them to a user’s device. Similarly, geneticists and oncology researchers can now send the data and calculations from genome sequencing to the cloud for processing. The cloud system taps into the processing power of all available computers on a vendor’s back-end, significantly speeding up the calculations.
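The principle of spreading one large calculation across many machines can be sketched on a single computer. The following is only an illustration under stated assumptions, not any vendor’s actual system: a thread pool stands in for the vendor’s back-end computers, and the GC-content calculation and the function names are hypothetical choices made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_count(chunk: str) -> int:
    """Count the G and C bases in one chunk of sequence."""
    return sum(1 for base in chunk if base in "GC")


def split_into_chunks(sequence: str, chunk_size: int) -> list[str]:
    """Cut the sequence into pieces that can be processed independently."""
    return [sequence[i:i + chunk_size] for i in range(0, len(sequence), chunk_size)]


def parallel_gc_fraction(sequence: str, chunk_size: int = 4, workers: int = 4) -> float:
    """Fraction of G/C bases, computed chunk by chunk on a pool of workers."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    chunks = split_into_chunks(sequence, chunk_size)
    # Each worker stands in for one back-end machine (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(gc_count, chunks))
    # Combine the partial results into the final answer.
    return sum(partial_counts) / len(sequence)


print(parallel_gc_fraction("ATGCGCGTATTAGCCG"))
```

The same split, process and combine pattern applies whether the workers are threads on a laptop or thousands of servers. In Python, threads do not speed up this kind of calculation by themselves, so the sketch shows the structure of the approach rather than its performance.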
Beyond the research implications, the use of cloud computing could also benefit patient care. In cancer treatment, specific gene-level changes in cancer cells drive critical treatment decisions. In the past, doctors’ offices had to send patients’ DNA samples to a contract lab for sequencing and analysis, which could take valuable weeks that could otherwise be spent on treatment. However, with disease-specific and patient data in the cloud, a hospital’s onsite medical staff would be able to analyse the data in a matter of minutes, allowing them to tailor a patient’s treatment to his or her specific tumour. Cloud-based analysis would also help detect the genetic conditions that make some people susceptible to certain types of cancer.
One vendor illustrating the convergence of cloud computing and genomics research is Knome, a life sciences company specialising in developing software tools for genomic sequencing. Knome has released kGAP 2.0, a cloud-based genome informatics software engine built to automate the process of finding genetic variants that influence the risk of developing cancer, its progression and its response to treatment. By leveraging Knome’s data cloud, kGAP 2.0 can simultaneously annotate and interpret hundreds of genomes from multiple sequencing platforms and diverse data sources, combining them into a single genomic study and completing in a day what would otherwise require months to compile.
Elsewhere, Amazon Web Services (AWS) and the US National Institutes of Health (NIH) have made the complete 1000 Genomes Project data available on AWS as a public data set. The data set is stored and accessible on the company’s Simple Storage Service (S3) and Elastic Block Store (EBS) public cloud services.
The 1000 Genomes Project is an international research effort in which 75 companies and research institutions have come together to establish the most detailed catalogue of human genetic material. The project, which started in 2008, has grown to 200 terabytes of anonymous genomic data, including DNA sequenced from more than 1,700 individuals. This makes it one of the largest collections of human genetics data publicly available, allowing researchers worldwide to collaborate in studying diseases such as cancer.
Before the data were put in the AWS cloud, researchers who wanted access to these public data sets had to download them from government data centres to their own systems, or have the data physically shipped to them on discs. This was impractical: the 200 terabytes of genetic data are comparable to 16 million filing cabinets of text or more than 30,000 standard DVDs, and most research labs do not have the storage and computing capacity to hold the data and run meaningful analyses in-house.
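A rough calculation shows the scale involved. The sketch below takes the 200-terabyte figure from the project and adds two assumptions that are not from the project itself: a single-layer DVD holds 4.7 gigabytes, and a lab has a sustained 100 megabit-per-second connection.

```python
DATASET_BYTES = 200 * 10**12          # 200 terabytes (decimal units)
DVD_BYTES = 4_700_000_000             # assumed: 4.7 GB single-layer DVD
LINK_BITS_PER_SECOND = 100 * 10**6    # assumed: sustained 100 Mbit/s link
SECONDS_PER_DAY = 86_400


def dvds_needed(dataset_bytes: int, dvd_bytes: int) -> int:
    """Number of discs required, rounding up to a whole disc."""
    return -(-dataset_bytes // dvd_bytes)


def transfer_days(dataset_bytes: int, bits_per_second: int) -> float:
    """Days needed to download the data set at a constant rate."""
    return dataset_bytes * 8 / bits_per_second / SECONDS_PER_DAY


print(dvds_needed(DATASET_BYTES, DVD_BYTES))                       # 42554
print(round(transfer_days(DATASET_BYTES, LINK_BITS_PER_SECOND), 1))  # 185.2
```

Under these assumptions, the data set would fill roughly 42,500 discs or take about six months to download, before any analysis could begin.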
Now, with the public data sets placed in the AWS cloud, researchers and labs of all sizes and budgets have a simple way to access the 1000 Genomes Project data and can start analysing subsets of the data without the added financial investment that would normally be required in provisioning the hardware necessary for such cancer studies. ‘We believe that cloud-computing will accelerate the pace of new genomic discoveries by creating an open ecosystem allowing more investigators access to these important data sets,’ says Dion.
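For readers who want to see what working with a subset of the data might look like, here is a minimal sketch using the `boto3` library with anonymous (unsigned) requests. The bucket name `1000genomes` and the `phase3/` prefix are assumptions that should be checked against AWS’s current listing for the data set, and the helper names are hypothetical.

```python
BUCKET = "1000genomes"   # assumed public bucket name; verify before use
PREFIX = "phase3/"       # assumed prefix for one subset of the data


def anonymous_s3_client():
    """Build an S3 client that sends unsigned requests (no AWS account needed)."""
    # Imported here so the rest of the sketch can be read and tested without boto3.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    return boto3.client(
        "s3",
        region_name="us-east-1",
        config=Config(signature_version=UNSIGNED),
    )


def list_keys(client, bucket: str, prefix: str, max_keys: int = 10) -> list[str]:
    """Return up to max_keys object names stored under the given prefix."""
    response = client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=max_keys)
    # "Contents" is absent when nothing matches the prefix.
    return [item["Key"] for item in response.get("Contents", [])]


# Example use (requires boto3 and network access):
#   client = anonymous_s3_client()
#   for key in list_keys(client, BUCKET, PREFIX):
#       print(key)
```

Listing by prefix lets a lab inspect and fetch only the files it needs, rather than copying the whole collection.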
Large hardware vendors are also working with research institutions to improve cancer treatment. For example, Dell has donated its high-performance cloud computing resources, powered by Dell PowerEdge Blade Servers, PowerVault Storage Arrays, and Dell Force10 Network infrastructure, to research treatments for paediatric cancer. The company has gifted its computing muscle and data exchange capabilities to the Neuroblastoma and Medulloblastoma Translational Research Consortium (NMTRC) and the Translational Genomics Research Institute (TGen). The goal of the partnership is to provide cancer researchers with the computational resources to perform complex analysis on cancer patient genomics to develop treatments faster, so that personalised therapies can be better targeted to individuals in the hope of improving clinical outcomes and increasing survival rates. These institutions will use the donated assets to conduct the world’s first personalised medicine clinical trial for neuroblastoma, a rare but deadly cancer affecting children.
The neuroblastoma clinical trial is designed to determine the ability of genetic testing to identify the most effective treatment for paediatric neuroblastoma patients. As a result of the donated resources, the processing capacity of TGen’s existing gene sequencing system increased by 1,200% – reducing sequencing time from many weeks to a few days. This technology will allow investigators to scan a huge amount of clinical experience in a timely fashion to assist with the identification of targeted treatment for neuroblastoma.
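The relationship between the capacity figure and the shorter turnaround can be checked with simple arithmetic. The sketch below assumes that an increase of 1,200% means thirteen times the original capacity, that processing time falls in direct proportion to capacity, and that a run previously took six weeks; the six-week figure is hypothetical and chosen only for illustration.

```python
def capacity_multiplier(percent_increase: float) -> float:
    """An increase of N% means (1 + N/100) times the original capacity."""
    return 1 + percent_increase / 100


def new_duration_days(old_duration_days: float, percent_increase: float) -> float:
    """Assumes run time is inversely proportional to processing capacity."""
    return old_duration_days / capacity_multiplier(percent_increase)


old_days = 6 * 7  # hypothetical: a sequencing run that used to take six weeks
print(capacity_multiplier(1200))                     # 13.0
print(round(new_duration_days(old_days, 1200), 2))   # 3.23
```

On these assumptions, a six-week job would finish in a little over three days, which is in line with the reported drop from many weeks to a few days.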
Moving forward, Dell’s donated cloud architecture will provide TGen, and the paediatric oncology community as a whole, with a collaborative platform for accessing the clinical trial data globally. The platform is scalable to tens of thousands of patients and can be replicated across multiple clinical trial sites for a number of paediatric cancers.