Delving into the archives

Published: 1-Sep-2005

Knowledge is power, but only when it's stored somewhere secure and retrievable. Steve Tongish, director of marketing for Plasmon, looks at the pharmaceutical sector's growing requirement for archival storage.


Businesses worldwide are beginning to recognise the importance of a secure long-term archival storage strategy for their valuable records. While industries such as insurance, financial services and medical imaging have been managing electronic archives for many years, fundamental market changes are now causing a much broader segment of the medical industry to consider how to manage the growing volume of research, patient and administrative records. These changes are being driven by three main factors: compliance, risk management and competitive advantage.

Compliance with government and corporate regulations on data retention is mandatory, and more regulations are being put in place to define the length of record retention and a method of data storage that ensures electronic records are authentic and quickly accessible. There are many country-specific regulations that address the storage of legal records, including the Health Insurance Portability and Accountability Act of 1996 (HIPAA) in the US, NHS best practice guidelines in the UK and Interchange of Data (IDA) in the EU. Additional FDA and EU regulations impose controls on the storage of pharmaceutical and medical research information, and there are more general regulations, such as Data Protection and Freedom of Information.

litigation risk

Risk management is another critical consideration for the healthcare industry, given its exposure to potential litigation. It is vital that there is full access to a wide range of information. Failure to produce authentic records within the discovery period defined by a court can mean the difference between winning and losing a judgement. The loss of a high visibility court case can cost millions and can have catastrophic political consequences.

There is a competitive advantage awaiting those manufacturers who can effectively exploit their information. The healthcare industry spends vast sums of money creating its information assets, but seldom utilises them to maximum benefit. A professional data archive that provides access to historic information can slash administration costs, decrease time-to-market for new products and improve service quality. Organisations can realise a tremendous business gain by implementing a sound archival storage strategy.

The lifecycle of data involves several stages, each with unique technology and application requirements. Most data begins life in an 'active' state. While active, it is being created, modified or frequently accessed. At some point in the data lifecycle it moves into a static or 'archive' state, defined by time or triggered by a specific event. Archive data differs from active data in that the authenticity of a given record in the archive is often critical. As a result, archives must protect data from modification in order to guarantee its authenticity.

The length of retention also characterises archive data: the longer the required retention period, the greater the need to develop an archive strategy that provides long-term record authenticity in a cost-effective manner.
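The move from active to archive state described above can be sketched in a few lines of Python. This is an illustration only: the Record fields, the closed_on event and the one-year idle limit are assumptions made for the example, not values drawn from any regulation or product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Record:
    name: str
    last_modified: date
    # Hypothetical event trigger, e.g. the date a study or case was closed.
    closed_on: Optional[date] = None


def lifecycle_state(record: Record, today: date,
                    idle_limit: timedelta = timedelta(days=365)) -> str:
    """Return 'archive' once an event or time trigger has fired, else 'active'."""
    # Event trigger: the record was explicitly closed on or before today.
    if record.closed_on is not None and record.closed_on <= today:
        return "archive"
    # Time trigger: the record has not been modified for idle_limit or longer.
    if today - record.last_modified >= idle_limit:
        return "archive"
    return "active"
```

In practice the triggers would be set by the retention policy that applies to each class of record.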

There is often confusion between a data archive and a back-up. A classic back-up application takes periodic images of active data in order to provide a method of recovering records that have been deleted or destroyed. Most back-ups are retained for only a few days or weeks, as later back-up images supersede previous versions. Since the archived records are static, there is no reason to include them in periodic back-ups.

Active, back-up and archive environments are complementary, but each has different requirements since they service data in different ways at different stages in the lifecycle. The interaction and technology options for these three stages are summarised in table 1.

data management

Because active data is constantly in use, the critical priorities are performance and availability. For many companies, downtime or slow read/write access is simply not acceptable. The only real technology choice for active data is magnetic disk, normally configured as a RAID (redundant array of independent disks) system to deliver the required performance and availability.

Performance is also an important factor for back-up, but since most back-up operations involve large data sets the ability to stream information quickly to and from the back-up media is a first priority. Fast random access to small data sets during restore operations is typically less important.

Because back-up is essentially an insurance policy, it is also necessary to minimise its expense by reducing the cost of each stored record. The medium of choice for back-up and disaster recovery applications has traditionally been magnetic tape, since it satisfies the performance and cost criteria of most organisations.

In contrast, archival storage requirements are quite different; media longevity and data authenticity feature much more prominently in archive environments. The storage media used within an archive should have a very stable, long life to avoid frequent data migration over decades of storage. In order to comply with corporate and industry regulations on data authenticity, it is crucial that information be protected from modification.
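The idea of protecting archived records from modification can be illustrated in software. The sketch below is a hypothetical in-memory model, not how optical media or any commercial archive product enforces write-once behaviour: the WriteOnceArchive class and its method names are invented for the example. It refuses to overwrite an existing record and uses a SHA-256 digest to confirm that what is read back matches what was stored.

```python
import hashlib


class WriteOnceArchive:
    """Hypothetical in-memory model of a write-once record store."""

    def __init__(self) -> None:
        # Maps record_id -> (data, SHA-256 digest taken at store time).
        self._records = {}

    def store(self, record_id: str, data: bytes) -> str:
        # Write once: an existing record can never be replaced.
        if record_id in self._records:
            raise PermissionError(f"record {record_id!r} is already archived")
        digest = hashlib.sha256(data).hexdigest()
        self._records[record_id] = (bytes(data), digest)
        return digest

    def retrieve(self, record_id: str) -> bytes:
        data, digest = self._records[record_id]
        # Authenticity check: the content must still match its original digest.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"record {record_id!r} failed its integrity check")
        return data
```

A second attempt to store a record under the same identifier raises an error rather than silently replacing the original, which is the behaviour an archive needs.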

Unlike back-ups, the performance bottleneck for an archive is not read/write streaming, but the ability to provide fast access to millions of records requested by thousands of users. For data archives, fast random access is typically the most critical performance consideration. Where active and back-up expenses are analysed using short-term acquisition and media costs, it is more appropriate to consider the long-term total cost of ownership for an archive environment.

cost analysis

Analysing costs over the full life of an archive gives a much more representative view of what it will really cost than the purchase price alone. Many regulations specify the use of optical storage for data archives because it provides true Write Once recording with fast random access and unmatched media longevity.

While there are traditional roles for magnetic disk, tape and optical technologies, all of these have been evolving in an effort to expand into new markets. Today there are magnetic disk products targeted at both the back-up and archive market; tape products have always played a role in archive environments; and optical storage is gaining broader industry acceptance.

One thing is clear: no single technology can fully satisfy all data storage requirements at all stages of the data lifecycle. The matrix in table 2 is designed to provide a general comparison of different storage technologies when applied specifically to an archive environment. Eight key archival storage attributes are contrasted against the characteristics of RAID, tape and three optical formats: Digital Versatile Disk (DVD), Magneto Optical (MO) and Ultra Density Optical (UDO). Entries shown in bold indicate where the technology provides a good match with the defined attributes.

RAID, tape and optical could all be used for long-term archival storage, but there are trade-offs. For example, RAID offers unmatched performance and high capacity, but is fundamentally a rewritable medium with a life of several years. It cannot be taken off-line for secure vaulting and the long-term operating cost for storage is very high.

long-term options

In contrast, tape is removable, has extremely high capacity and offers fast read/write streaming at a very affordable price. Where tape struggles in some archives is random access performance. Like RAID, it is a rewritable medium and requires additional maintenance if used for a long-term archive.

DVD provides an interesting contrast to tape and RAID. It is a true Write Once medium with a long life at an affordable cost, but unlike disk and tape it is a consumer grade product with low capacity (9.4GB) and modest performance. Older generation MO meets most of the archive requirements, but its current capacity of 9.1GB means that overall system costs are too high for many environments. The one technology that addresses all the archive attributes is 30GB UDO. This is not surprising since it is the only contemporary technology designed for long-term archival storage. While no technology will fit the needs of every organisation, the attributes of UDO provide an archive solution for a wide range of applications and industries.

One archival storage attribute worth considering in more detail is the total cost of ownership (TCO). For many organisations, financial considerations are just as important as technical attributes. Companies will not invest in new product solutions unless they meet their technical requirements in a cost-effective way.
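As a rough illustration of how a TCO figure is built up, the sketch below adds one-off acquisition and media costs to recurring maintenance and operating costs over a number of years. The cost breakdown is a simplifying assumption and the figures are invented placeholders in arbitrary units; they are not taken from the Plasmon analysis.

```python
def total_cost_of_ownership(acquisition: float, media: float,
                            annual_maintenance: float,
                            annual_operating: float, years: int) -> float:
    """One-off costs plus recurring costs over the period (simplified model)."""
    return acquisition + media + years * (annual_maintenance + annual_operating)


# Invented placeholder figures in arbitrary cost units, for illustration only.
option_a = total_cost_of_ownership(acquisition=100, media=20,
                                   annual_maintenance=10, annual_operating=5,
                                   years=3)   # 100 + 20 + 3 * 15 = 165
option_b = total_cost_of_ownership(acquisition=60, media=10,
                                   annual_maintenance=40, annual_operating=20,
                                   years=3)   # 60 + 10 + 3 * 60 = 250
```

With these placeholder numbers the option that is cheaper to buy turns out to be the more expensive to own, which is why the comparison needs to be made over the life of the archive.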

cost comparison

The bar chart in figure 1 provides a summary of a detailed TCO analysis that compares a range of common archival storage technologies. In order to develop a representative model for the analysis, an actual case scenario was used. The requirement was for a 12TB archive measured over three years of operation. The archive products selected for comparison were magnetic tape (AIT-3), magnetic disk (Centera) and three optical storage technologies: DVD, MO and UDO.

The results of the analysis show that the TCO figures for AIT, DVD and UDO are very similar and are far lower than those for MO and Centera. Contemporary optical technologies such as DVD and UDO remain very price competitive with tape storage and are much less expensive than the Centera and MO options. The results also reveal that the annual maintenance and operating costs of RAID systems such as Centera are dramatically greater than those of any tape or optical library configuration. The full version of the Archival Storage TCO Analysis can be downloaded from the website: www.plasmon.co.uk.

The archival storage market is seeing rapid growth as both external compliance and internal policies drive organisations to archive more information for longer periods of time. This pressure is also forcing the medical industry to consider carefully how it will position and manage corporate records within a complete data lifecycle.

Archives call for a strategy that enables regulatory compliance, data authenticity, media longevity, quick random access and low TCO. A range of storage technologies can be applied to this challenge, each with its own strengths and weaknesses. While no single technology can satisfy every organisation, those products specially designed for long-term data storage, such as UDO, are best equipped to meet the needs of a professional archive.
