Piql White Paper - Magnetic Tape Technology in a Multi-node Archival Storage System

The Use of Magnetic Tape Technology in a Multi-node Archival Storage System Whitepaper



This paper explores storage media selection for multi-node storage systems used for the long-term archiving and preservation of digital content. Digital preservation, or the steps taken to keep digital content accessible, trustworthy and usable into the future, is crucial if we are to retain the ever-growing body of digital knowledge and cultural heritage for future generations. Corporations, governments and institutions of all kinds are now retaining every type of digital record, including images, video, documents, records, research data and social media. Once digital content is lost, it is gone forever, and it is generally unknown at the time of archiving which parts of a broad collection will have the most value in the future. The typical answer to the difficult question of assessment is therefore to save as much as possible and to protect that content against loss, corruption and external attack.

The Digital Preservation Coalition’s Digital Preservation Handbook provides guidance on the principles that should be employed when designing or selecting storage systems for preservation (see footnote 1):

PRINCIPLES FOR USING IT STORAGE SYSTEMS FOR DIGITAL PRESERVATION

1. Redundancy and diversity

• Make multiple independent copies of digital material and store these in different geographic locations.
• Use a combination of online storage systems and offline media.
• Use different types of storage technology to spread risk and achieve a balance of data safety, easy access and manageable cost.

2. Fixity, monitoring, repair

• Use fixity measures such as checksums to record and regularly monitor the integrity of each copy of the digital material.
• If corruption or loss is detected, then use one of the other copies to create a replacement.
• Store fixity information alongside the digital materials and also in separate databases or systems.

3. Technology and vendor watch, risk assessment, and proactive migrations

• Understand that storage technologies, products and services all have a short lifetime.
• Use technology watch to assess when migrations might be needed.
• Keep an eye on the viability of storage vendors or classes of storage solution.
• Be proactive in migrating storage before digital material becomes at risk.

4. Consolidation, simplicity, documentation, provenance and audit trails

• Minimise the proliferation of legacy media types and storage systems used for preservation.
• Consolidate digital materials onto the minimum number of preservation storage systems (subject to the redundancy requirements above).
• Document how digital materials have been acquired and transferred into the storage systems as well as how the storage systems are set-up and operated.
• Use this to provide audit information on data authenticity.
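The fixity principle (point 2 above) can be illustrated with a minimal sketch. It assumes SHA-256 as the checksum and plain files as the copies; the function names sha256_of, verify_copies and find_repair_source are hypothetical and chosen for illustration only. The handbook does not prescribe a particular algorithm or tool.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(recorded_checksum, copies):
    """Map each copy's path to True if it still matches the recorded checksum."""
    return {str(path): sha256_of(path) == recorded_checksum for path in copies}


def find_repair_source(recorded_checksum, copies):
    """Return the first intact copy, usable to replace a corrupted one, or None."""
    for path in copies:
        if sha256_of(path) == recorded_checksum:
            return path
    return None
```

In such a scheme the recorded checksum would be stored both alongside the material and in a separate system, and verify_copies would be run on a regular schedule against every node.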

1 Digital Preservation Handbook, 2nd Edition, http://handbook.dpconline.org/ Digital Preservation Coalition © 2015 licensed under the Open Government Licence v3.0.


The DPC do not provide any guidance on which IT storage media types (e.g. tape, disk RAID arrays, disk object storage) to use in a multi-node architecture. It is therefore necessary to make one’s own decisions based on the principles and on the information available. Any evaluation will of course also weigh factors such as total cost, reliability, space and environmental impact.

Tape technology is used extensively within digital preservation systems because of its exceptional long-term cost per GB stored, footprint, power efficiency, reliability and portability. Where tape performs less well than disk-based storage systems is on time to data (latency), so for many archival systems where access times are key, the first or primary node of the architecture will be disk based.

Tape technology will continue to be an essential element of many multi-node digital preservation architectures, providing a low-cost, reliable secondary node. However, a more heterogeneous set of characteristics may be advantageous in the overall selection of storage technologies for long-term digital preservation, and in particular for the crucial final node.

This paper considers the current suitability of tape technology as one or more nodes of a multi-node preservation storage system. It specifically considers three areas which relate to the principles and factors described above:

1. Tape technology vendors and technologies

2. Tape technology reliability and the environmental factors that affect this

3. The effective lifetime of tape media


TAPE TECHNOLOGIES AND SUPPLIERS

The long-term decline of the digital tape market for secondary storage has been countered over the last decade by a steady increase in the use of tape for archiving, due to its low environmental footprint, high level of data integrity and low cost per GB. Linear Tape Open (LTO) tape technology dominates a market in which there are in reality only two players today:

• IBM, who manufacture all LTO-8 (current generation) tape drives and their own proprietary “Enterprise” TS1150 and TS1155 tape drives
• Oracle, who manufacture their own proprietary “Enterprise” T10000 series tape drives.

Over the past 30 years there have been numerous digital tape formats. Nearly all have now been retired, leaving three: Linear Tape Open (LTO), which is an open standard, and the proprietary IBM TS11xx and Oracle T10000 formats, which are marketed as “Enterprise” tape drives. In the past, differences in performance, capacity and reliability earned the proprietary formats this name, but today it can be argued that this is no longer true.

Since the initial release of LTO-1 in 2000, LTO drives and media have continued year after year to dominate the tape market, due to the efforts of the LTO Consortium (Quantum, IBM and Hewlett Packard Enterprise). For the past several years, the LTO format has accounted for over 96% of all tape cartridge shipments. This market dominance has made it more difficult each year for IBM and Oracle to continue to develop and deliver next-generation formats and technology for their proprietary drives.

Rumour and speculation abound about the long-term viability of Oracle’s tape business, the cessation of development of the T10000 drive series at the T10000D and the scaling down of Oracle’s own resources. It is a widely held opinion that IBM will be the sole manufacturer of enterprise-class tape drives and media in the years to come, and in fact IBM has been the sole manufacturer of drives for the open LTO format for several years. IBM supply all LTO-8 drives to other tape library manufacturers (Oracle, SpectraLogic) and to members of the LTO Consortium itself (HPE, Quantum). There are currently two manufacturers of current-generation LTO-8 media: FUJIFILM and Sony. Media for the proprietary drives from IBM and Oracle is single sourced.


The long-term roadmap for LTO is very healthy (see footnote 2):

LTO ULTRIUM ROADMAP

Generation    Native capacity    Compressed capacity
GEN 5         1.5 TB             3 TB
GEN 6         2.5 TB             6.25 TB
GEN 7         6 TB               15 TB
GEN 8         12 TB              30 TB
GEN 9         Up to 24 TB        60 TB
GEN 10        Up to 48 TB        120 TB
GEN 11        Up to 96 TB        240 TB
GEN 12        Up to 192 TB       480 TB

PARTITIONING ENABLING LTFS | ENCRYPTION | WORM

Note: Compressed capacity for generation 5 assumes 2:1 compression. Compressed capacities for generations 6-12 assume 2.5:1 compression (achieved with a larger compression history buffer). Source: The LTO Program. The LTO Ultrium roadmap is subject to change without notice and represents goals and objectives only. Linear Tape-Open, LTO, the LTO logo, Ultrium and the Ultrium logo are registered trademarks of Hewlett Packard Enterprise, IBM and Quantum in the US and other countries. Fortuna Data - https://www.fortunadata.com
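The compressed capacities on the roadmap follow directly from the native capacities and the compression ratios given in the note. The short sketch below recomputes them; the names NATIVE_TB, compression_ratio and compressed_tb are illustrative, and the native figures are those listed on the roadmap.

```python
# Native capacities in TB per LTO generation, as listed on the roadmap.
NATIVE_TB = {5: 1.5, 6: 2.5, 7: 6, 8: 12, 9: 24, 10: 48, 11: 96, 12: 192}


def compression_ratio(generation):
    """Ratio from the roadmap note: 2:1 for generation 5, 2.5:1 for generations 6-12."""
    if generation == 5:
        return 2.0
    if 6 <= generation <= 12:
        return 2.5
    raise ValueError(f"no compression ratio stated for generation {generation}")


def compressed_tb(generation):
    """Compressed capacity in TB: native capacity multiplied by the stated ratio."""
    return NATIVE_TB[generation] * compression_ratio(generation)


if __name__ == "__main__":
    for gen in sorted(NATIVE_TB):
        print(f"GEN {gen}: native {NATIVE_TB[gen]} TB, compressed {compressed_tb(gen)} TB")
```

Real-world compression depends heavily on the content; already compressed material such as video or JPEG images will typically achieve far less than the stated ratios, so native capacity is the safer planning figure for archives.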

We can expect that LTO tape will continue to provide excellent and ever-improving cost per GB, reliability and performance for digital archives. What we cannot expect is that tape will provide diversity in its supply and technology base.

2 http://www.ltoultrium.com/lto-ultrium-roadmap/


TAPE TECHNOLOGY RELIABILITY

The reliability of tape technology, as measured by the uncorrectable bit error rate (UBER), is superior to that of current disk technologies and continues to improve generation by generation. The UBER of LTO-8 is now stated as 1 in 10^19, an order of magnitude better than IBM’s TS1155 at 1 in 10^18 and equivalent to the current Oracle T10000D.

Bit error rates are not the whole story, as disk systems provide much greater overall levels of data integrity through technologies such as RAID (Redundant Array of Inexpensive Disks) or erasure coding. Despite attempts, RAIT (Redundant Array of Inexpensive Tapes) has never been widely employed, so the reliability of a tape system is generally the native reliability of the technology. This should be more than adequate: at 1 error in 10^19 bits, a drive would need to run for around 130 years at 300 MB/sec to encounter an error event. This reliability specification is however theoretical and assumes a perfect working environment, proper maintenance of tape drives and media, and use within specified duty cycles.

A paper from CERN describes how, after their tape-based archive system had collected over 70 petabytes of data during the first run of the Large Hadron Collider (LHC) experiment, the planned shutdown period that followed was used for migrating the complete data archive to higher-density tape media. The CERN physics archive comprised over 50,000 tape cartridges of 4 different types from 2 generations of tape drives: 5 TB and 1 TB cartridges from Oracle, and 4 TB and 1 TB cartridges from IBM. As new drive generations were arriving on the market and 1 TB media was becoming obsolete, the shutdown period offered an opportunity to migrate or re-pack tape media. Re-packing is a function of “Enterprise” tape technologies whereby older tape media can be reformatted in newer generation drives to achieve higher storage capacities.

TWO SIGNIFICANT OBSERVATIONS COME OUT OF THE CERN REPORT:

1. During the repack exercise, 13 tape cartridges were identified on which a significant portion of the data could not be read due to environmental contamination. Considering that CERN use the highest quality tape media and libraries and maintain high standards in their datacentres, this must be a significant risk to any tape archive. CERN subsequently put in place environmental quality measuring devices and today actively monitor a number of factors including air particulates, temperature and humidity.

2. An observed data integrity figure of 1 in 10^16 annually. This had been 1 in 10^14 in 2009 and was improved upon by a comprehensive series of improvements in operational and maintenance procedures. A UBER of 1 in 10^14 relates to a data loss rate of around 90,000 in 100 million files written annually, or near to 0.1%. This is quite a different picture to the theoretical UBER figures, and the improvement, as can be seen by reading the paper, required considerable effort.

Tape is not infallible, but its positive characteristics outweigh the risks if these are mitigated. For reasons of cost, CERN are not able to keep more than one copy of most of their data. As such they must mitigate risk by whatever procedural methods they can in order to achieve the maximum native reliability from their tape infrastructure. It is beyond most organisations to take the precautions described, but most will follow a multi-node storage architecture in order to mitigate the risk. It would seem prudent then to also follow the multi-technology principle by employing different technologies in each storage node.
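The reliability figures quoted in this section can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, throughput is taken in decimal megabytes, and the per-file calculation assumes independent bit errors and that a single uncorrectable bit error loses the whole file. The 10 GB file size in the usage note is a hypothetical value, not a figure from the CERN paper.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_expected_error(bits_per_error, throughput_mb_per_s):
    """Years of continuous streaming before one uncorrectable bit error is expected.

    bits_per_error is the N in a "1 in N" UBER figure, e.g. 1e19 for LTO-8.
    throughput_mb_per_s is the sustained rate in decimal megabytes per second.
    """
    bits_per_second = throughput_mb_per_s * 1e6 * 8
    return bits_per_error / bits_per_second / SECONDS_PER_YEAR


def file_loss_fraction(bits_per_error, file_size_bytes):
    """Probability that a file contains at least one uncorrectable bit error.

    Assumes independent bit errors (an assumption of this sketch).
    Computes 1 - (1 - p)**bits in a numerically stable way for very small p.
    """
    bits = file_size_bytes * 8
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))
```

Calling years_to_expected_error(1e19, 300) gives roughly 132 years, in line with the figure of around 130 years quoted above. With a hypothetical 10 GB file, file_loss_fraction(1e14, 10e9) gives about 0.08%, the same order as the loss rate of nearly 0.1% reported for 2009, while the same file at 1 in 10^16 gives a fraction one hundred times smaller.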


TAPE MEDIA LIFETIME

Tape media for most tape technologies has a similar published theoretical lifetime of around 30 years within defined constraints of environment, duty cycle (reads and writes) and cartridge re-packing cycles (a regular tape drive mount, unpack and pack). It is however worth considering the long-term availability of the drives needed to read the tape cartridges.

The LTO roadmap shows that a new generation of technology is released roughly on a 2 to 3 year cycle. LTO generations up to LTO-7 provided read compatibility back two generations, so an LTO-7 drive could read LTO-5 cartridges. In order to carry out a media migration, a user will need to have tape drives available (and preferably under support) that can read the oldest media in the archive. Our own experience shows that manufacturers only provide concurrent support for two generations of tape drive; it is in their interest to get customers to buy the latest drives and media and hence to migrate off old technology.

With two generations of backwards read compatibility, the maximum period from first availability of media to retirement of the compatible drive technology would be 12 years. However, most users will want to start their migration earlier, so effective life is more realistically 9 years. For LTO-8, with only a single generation of backwards read capability, this has been reduced to 6 years.

It is also interesting to note that CERN keep their tape cartridges for 6-8 years.

It is prudent in a multi-node archival storage architecture to consider using different tape generations at different nodes. A more robust strategy would be to consider, if available, a non-tape-based technology for the final or tertiary storage node.

CONCLUSIONS

This paper has examined the current state of the digital tape technology market, suppliers and technologies. It has specifically looked at the use of tape within a multi-node archival storage or digital preservation architecture. It has considered issues of single supply, reliability and media lifetime as factors relevant in making a choice of tape for one or multiple nodes within a storage architecture.

Principles and guidelines for building a digital preservation storage system suggest that it should have at least three nodes, in three different locations and employing three different technologies. The use of flash storage for archiving is currently cost prohibitive, so the choice appears to be between variations of disk technology and digital tape, both of which suffer from a lack of broad diversity in supply and technology. Both technologies require environmental care and frequent media migration, and have inherent reliability risks.

Ideally a multi-node digital archive could utilise the advantages of disk and tape-based storage technologies and have access to a ‘third node technology’ that should employ a fundamentally different technology base, have a long life-cycle, be robust and require little maintenance.
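The drive-availability figures given in the TAPE MEDIA LIFETIME section (12, 9 and 6 years) can be reproduced with a small calculation. This is one possible reading of the argument rather than the paper's stated formula: it assumes a 3 year release cycle (the upper end of the 2 to 3 year range), concurrent support for two drive generations, and an effective life that ends one release cycle before the last compatible drive is retired. The function name media_lifetime_years is hypothetical.

```python
def media_lifetime_years(read_back_generations, release_cycle_years=3,
                         supported_generations=2):
    """Return (maximum, effective) years for which media stays readable on supported drives.

    The newest drive able to read the media appears read_back_generations
    release cycles after the media itself, and remains under support for
    supported_generations further cycles (assumption of this sketch).
    """
    maximum = (read_back_generations + supported_generations) * release_cycle_years
    # Assumption: migration starts one release cycle before support ends.
    effective = maximum - release_cycle_years
    return maximum, effective


if __name__ == "__main__":
    print("Two generations read-back:", media_lifetime_years(2))
    print("One generation read-back:", media_lifetime_years(1))
```

With two generations of read-back the sketch returns a maximum of 12 years and an effective life of 9 years; with a single generation, as for LTO-8, the effective life falls to 6 years. A shorter release cycle reduces all of these figures proportionally.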

Rev. 01-20 | Piql AS © 2020 | All rights reserved.
