New-Tech Europe | Sep 2017 | Digital Edition

the simple rule of data centre

power management – actions have

consequences and consequences

require action.

The BA example demonstrates

again that misunderstanding of power

is a common problem. Two-thirds of

data centre professionals in Eaton’s

research weren’t fully confident in

power, and until organisations get

to grips with power management

we can expect to see more power-

related outages. There is profound

concern around skills availability:

it is hard to acquire and retain

the relevant expertise or talent,

whether it’s designing for energy

efficiency, managing consumption

on an ongoing basis, or dealing with

power-related failures quickly and

effectively to avoid and mitigate

outages.

Have you tried switching it off and on again?

Should a full power outage occur,

it is imperative to

have a disaster recovery process in

place that clearly defines the steps

to be taken when re-energising the

data centre, detailing which systems

must be brought back online first. In

a full outage situation where people

are in a state of panic and under

pressure to resume normal services,

staggering the re-energisation of

the systems in your data centre may

seem counterintuitive, as the goal

is to get back online as quickly as

possible, but such a process helps

to avoid further extension of the

outage. Restoring a data centre

after it has gone black needs to be

done gently and in a clearly defined,

methodical fashion. Simply trying to

get everything back up in a hasty

and unplanned way will only cause

in-rush currents, which could trigger

more outages, quickly crippling the data

centre again. Power management

is all about understanding the

dependencies between the different

parts of the power system and the

IT load and having appropriate

levels of resilience in the hardware,

software and processes.
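
To illustrate the point about dependencies, a restart order can be derived from a written-down dependency map rather than from memory under pressure. The following is a minimal sketch in Python, not a description of any real site: the system names and the `DEPENDS_ON` map are hypothetical, and a real plan would also have to account for electrical limits.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each system lists what must be up before it.
DEPENDS_ON = {
    "cooling": {"ups"},
    "core_network": {"ups", "cooling"},
    "storage": {"core_network"},
    "database": {"storage"},
    "applications": {"database", "core_network"},
}


def restart_waves(depends_on):
    """Return successive groups of systems whose dependencies are already up."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as restored, unblocking the next
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(restart_waves(DEPENDS_ON), start=1):
        print(f"Wave {number}: {', '.join(wave)}")
```

Each wave contains only systems whose prerequisites were restored in an earlier wave, which is one way of capturing "which systems must be brought back online first" in a form that can be checked before an outage happens.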

Recovering from an outage requires

patience and a systematic process

– two things that were seemingly

missing according to reports on BA’s

outage. No data centre professional

has ever asked ‘have you tried

switching it off and on again?’ The

skill is to pace oneself and follow

each step in turn, controlling and

monitoring a phased restart so

that batches of systems are only

brought online when it’s safe

to do so and one is sure of the

correct phase balancing and loads.

Skipping any steps in the rush to

get back online can create a power

surge, overloading circuits, tripping

breakers and, to put it mildly, causing

chaos.
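
The phased restart described above can be sketched as a simple batching calculation. This is a minimal illustration under stated assumptions, not a definitive procedure: the `Load` fields, the example current figures, the 80% headroom factor and the single shared breaker are all hypothetical, and phase balancing across a three-phase supply is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Load:
    name: str
    priority: int        # lower number = restore earlier (assumed convention)
    inrush_amps: float   # assumed peak current at switch-on
    steady_amps: float   # assumed current once the load has settled


# Hypothetical loads sharing one hypothetical 100 A breaker.
EXAMPLE_LOADS = [
    Load("network", 1, 30.0, 10.0),
    Load("storage", 2, 40.0, 15.0),
    Load("database", 3, 35.0, 12.0),
    Load("web", 4, 25.0, 8.0),
]


def plan_batches(loads, breaker_amps, headroom=0.8):
    """Group loads into restart batches that stay within a current budget.

    A batch is only allowed if the steady current of everything already
    running, plus the combined inrush of the batch, stays under the budget.
    """
    budget = breaker_amps * headroom
    running = 0.0          # steady current of batches already restored
    batches = []
    batch, batch_inrush = [], 0.0
    for load in sorted(loads, key=lambda item: item.priority):
        # Close the current batch if adding this load would exceed the budget.
        if batch and running + batch_inrush + load.inrush_amps > budget:
            batches.append([item.name for item in batch])
            running += sum(item.steady_amps for item in batch)
            batch, batch_inrush = [], 0.0
        # Even on its own, the load must fit on top of what is running.
        if running + load.inrush_amps > budget:
            raise ValueError(f"{load.name} cannot be started within the budget")
        batch.append(load)
        batch_inrush += load.inrush_amps
    if batch:
        batches.append([item.name for item in batch])
    return batches


if __name__ == "__main__":
    for number, names in enumerate(plan_batches(EXAMPLE_LOADS, 100.0), start=1):
        print(f"Batch {number}: {', '.join(names)}")
```

With the example figures, the network and storage loads start together, while the database and web loads each wait for the previous batch to settle. In practice an operator would confirm measured loads between batches before proceeding, as the article advises.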

Resilience and infrastructure upgrades

Alongside skills and power processes,

the facilities infrastructure itself

often needs upgrading to meet

today’s efficiency, reliability and

flexibility expectations. Around

half of respondents in Eaton’s

survey report that their core IT

infrastructure needs strengthening,

and this number is closer to two-

thirds when it comes to facilities

such as power and cooling.

Power management is increasingly

becoming a software-defined activity.

Given the skills gap, software can

play an important role in bridging

the divide between IT and power

by presenting power management

options in dashboard styles that are

familiar to an IT audience, making

them easier to understand, and even

by automating management of power

infrastructure. This could have

prevented the outage that faced BA

as the automated processes would

have brought systems back online in

a controlled and monitored fashion.

We’ve moved towards more

virtualised environments in data

centres, and IT and data centre

professionals are familiar with using

virtualisation to maintain hardware,

so the question is: why not use

the same principles in power? It is

important that all power distribution

designs, and associated resiliency

software tools, are compatible with

all the major virtualisation vendors

to ensure future-proofing of the

infrastructure. This approach will

enable data centre professionals

to carry out concurrent maintenance,

mitigating the risks of infrastructure

maintenance and upgrades.

Learning lessons

While we may never fully understand

what happened within BA’s data

centre, it’s near guaranteed that it

won’t be an isolated incident across

the wider data centre industry, even

if it’s unlikely we’ll see anything on

the same scale for a long time. The

issue comes down to either poor

preparation or implementation of

disaster recovery. Better preparation

of the data centre disaster recovery

process would have seen it

designed with resilience in mind,

meaning firstly the DR site should

have kicked in to cover the demand

during the outage and, secondly,

when restarting the hardware and

applications, it should have been

done in a far more controlled

manner. Reintroducing power

to systems in a slow and phased

manner would have allowed for a

smooth and steady recovery. We, as a data

centre industry, need to make sure

that we all learn lessons from BA’s

high-profile outage and take actions

to ensure that effective power

management is a ‘must have’ and

not a ‘nice to have’.
