Functional Safety 2016

FSA4; Validating Failure Rates and Managing End of Useful Life

Harvey T. Dearden BSc CEng FIET FInstMC FIMechE FIChemE

InstMC Registered Functional Safety Engineer 16/003

Director

SISSuite Ltd

There is a requirement in BS EN 61508 for ‘…assessing whether the demand rates and failure rates

during operation and maintenance are in accordance with assumptions made during the design of the

system.’ (Part 1, Clause 6.2.12 c, see also Figure 8). This requirement propagates into BS EN 61511 as

‘Discrepancies between expected behaviour and actual behaviour of the SIS shall be analysed and,

where necessary, modifications made such that the required safety is maintained.’ (Part 1, Clause

16.2.9). Easy to say, but how may we do this on a practicable basis in a real process plant operation?

In principle it is a straightforward matter; you identify the relevant equipment populations and

monitor the failure rates. In practice, the identification of the populations and the analysis of failures

may be less than straightforward.

The point of the monitoring is to identify where a failure rate is higher than anticipated, perhaps

because the assumed intrinsic device failure rate was optimistic or because the specifics of the

equipment deployment increase the failure rate, or the equipment is entering the wear-out phase and

approaching its end of life.
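
As a minimal sketch of the kind of check described above, the following compares an observed failure count for an equipment population against the count expected from the assumed failure rate. It assumes a constant failure rate (so the count is treated as Poisson distributed), which will not hold once wear-out begins. The function names, the alarm threshold and all the numbers are hypothetical illustrations, not values taken from the standards.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Sum P(X = 0) .. P(X = observed - 1), then take the complement.
    term = math.exp(-expected)
    cdf = term
    for k in range(1, observed):
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def check_failure_rate(n_items: int,
                       hours_each: float,
                       assumed_rate_per_hour: float,
                       observed_failures: int,
                       alarm_p: float = 0.01) -> dict:
    """Compare observed failures with the design assumption.

    alarm_p is a hypothetical threshold: a small tail probability
    suggests a gross departure worth investigating, not a proof
    that the assumed rate is wrong.
    """
    device_hours = n_items * hours_each
    expected = assumed_rate_per_hour * device_hours
    p_value = poisson_upper_tail(observed_failures, expected)
    return {
        "expected_failures": expected,
        "observed_rate_per_hour": observed_failures / device_hours,
        "p_value": p_value,
        "flag": p_value < alarm_p,
    }


# Illustrative numbers only: 200 similar devices observed for one
# year each, against an assumed total failure rate of 1e-6 per hour.
result = check_failure_rate(200, 8760.0, 1e-6, observed_failures=7)
print(result)
```

With these illustrative numbers the expected count is under two failures, so seven observed failures would be flagged for investigation, whereas two would not.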

Any failure of equipment within a SIL rated function should be thoroughly analysed to identify the

cause of the failure and the possible implications for other similar equipment items deployed on SIL

rated duties. It would be unrealistic however to expect the same degree of analysis to extend to

equipment not deployed on SIL rated duties, particularly the typically much larger set of equipment

deployed on control and monitoring (rather than protection) duties. Much of this equipment will be

‘repaired-by-replacement’ and detailed analysis of any failure is likely to be an unrealistic ambition.

But much of the equipment deployed on control and monitoring may well be the same as that

deployed on SIL rated duties. This wider population set, in providing a broader sample, will be

potentially useful in identifying equipment failure rates.

From the perspective of integrity, it is the undetected dangerous failure rate that matters, and in

principle it would be possible to estimate and monitor the dangerous failure count. Since the split of

safe/dangerous failures in the non-SIL population is unlikely to be available, we might estimate the

number of dangerous failures on the basis of the total number of failures and an estimate of the

Safe Failure Fraction (SFF). This estimate might be combined with any explicitly identified dangerous

failures in the SIL population to identify an estimate of the total number of dangerous failures within

the wider population. There are potential difficulties here however; undetected dangerous failures

are only revealed by proof tests, and some proof tests may be at extended intervals, and a given

element type might be subject to a range of test intervals. Note also that the classification as a

safe/dangerous failure may depend on the duty, e.g. a high or low trip function. Since the SIF population

would typically be a small subset of the broader population, there would likely be only marginal

advantage over a simple total failure count. If there is a population of items exclusively dedicated to

SIF duties there may be some advantage in assessing the dangerous failure rate rather than the total

failure rate. Generally it will be much simpler to monitor a count of all failures (whether safe or

dangerous), with the working assumption that this will correlate with the dangerous failure rate. It

should be recognised that broad uncertainties are inevitable; we are not looking for marginal

refinement of failure rates here, rather we are looking for gross departures from expected

performance that should prompt investigation and possible revision of the claimed failure rates.
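
Where an estimate of dangerous failures is nevertheless wanted, the SFF-based approach described above might be sketched as follows. The sketch assumes the usual definition of SFF as the fraction of failures that are either safe or dangerous-detected, so that (1 - SFF) of the total failures are taken to be dangerous undetected. The function name and the numbers are hypothetical, and the difficulties noted above (mixed proof test intervals, duty-dependent classification) are ignored.

```python
def estimate_dangerous_failures(non_sil_total_failures: int,
                                assumed_sff: float,
                                sil_dangerous_identified: int) -> float:
    """Estimate dangerous failures across the wider population.

    non_sil_total_failures: all failures recorded in the non-SIL
        population, where the safe/dangerous split is not known.
    assumed_sff: assumed Safe Failure Fraction for the device type
        (an assumption, typically taken from the device data).
    sil_dangerous_identified: dangerous failures explicitly
        identified in the SIL population.
    """
    if not 0.0 <= assumed_sff <= 1.0:
        raise ValueError("assumed_sff must lie between 0 and 1")
    # Failures outside the safe fraction are treated as dangerous.
    estimated_non_sil = non_sil_total_failures * (1.0 - assumed_sff)
    return estimated_non_sil + sil_dangerous_identified


# Illustrative numbers only: 40 failures in the non-SIL population,
# an assumed SFF of 0.75, and 1 dangerous failure found on SIL duty.
print(estimate_dangerous_failures(40, 0.75, 1))
```

The result is only as good as the assumed SFF, which is one reason a simple count of all failures will usually be the more practical measure.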