Cancer records database now available

Published: 30-Sep-2005

A repository of data on more than 22,000 cancer cases, the first major output of the Clinical e-Science Framework (CLEF), an e-Science project funded by the Medical Research Council (MRC), is now available to clinical and medical researchers. Dr Catalina Hallett demonstrated the new database at the e-Science All Hands meeting in Nottingham, UK on 20 September.


Patient records contain information that could be of use to medical research. To make it accessible to researchers, CLEF has had to develop techniques to extract the relevant data from written text and present it so that it can be compared with data from scientific and other databases. To ensure secure and ethical access to the databank and to protect against accidental disclosure, the project has also implemented stringent access control, authentication and secure transmission protocols that use encryption.

A team led by Professor David Ingram at University College London built the repository using a new method for importing and structuring data so that users can run population queries over longitudinal data sets. The repository supports the large-scale analysis of patient records in a Grid environment, and can 'handle complex queries while retaining the critical semantic, structural and medico-legal integrity of the data'.

The process was developed in part by a team led by Professor Alan Rector, CLEF's director, at the University of Manchester. It structures the source data in multiple steps, enabling users to put 'complex clinical questions' to the repository: data is structured first in a longitudinal format, then by clinical context, and finally by type of data. Previously, retrieving similarly complex data would have required more time-consuming manual searching and analysis. Using the work of Professor Rob Gaizauskas' team at the University of Sheffield, the system is able to extract key medical information from clinical records written in a narrative format, such as medical letters, discharge summaries and radiology reports.

'Once fully deployed, the repository will lead to previously unthinkable, rapid advances in healthcare research by enabling researchers to analyse data stored in a wide range of geographically-spread databases, online,' said Rector.

Furthermore, a generic WYSIWYM ('What you see is what you mean') interface, developed by Professor Donia Scott's team at The Open University, enables users to pose 'complex clinical queries in natural language and receive answers in plain English text or simple tables and graphs'.

CLEF's future work includes extending its database and 'refining its use of knowledge resources to help both patients and professionals to access the right information and interpret scientific data'.
