FSA4: Validating Failure Rates and Managing End of Useful Life
Harvey T. Dearden BSc CEng FIET FInstMC FIMechE FIChemE
InstMC Registered Functional Safety Engineer 16/003
Director
SISSuite Ltd
There is a requirement in BS EN 61508 for ‘…assessing whether the demand rates and failure rates
during operation and maintenance are in accordance with assumptions made during the design of the
system.’ (Part 1, Clause 6.2.12 c, see also Figure 8). This requirement propagates into BS EN 61511 as
‘Discrepancies between expected behaviour and actual behaviour of the SIS shall be analysed and,
where necessary, modifications made such that the required safety is maintained.’ (Part 1, Clause
16.2.9). Easy to say, but how may we do this on a practicable basis in a real process plant operation?
In principle it is a straightforward matter: you identify the relevant equipment populations and
monitor the failure rates. In practice, the identification of the populations and the analysis of
failures may be less than straightforward.
The point of the monitoring is to identify where a failure rate is higher than anticipated, perhaps
because the assumed intrinsic device failure rate was optimistic or because the specifics of the
equipment deployment increase the failure rate, or because the equipment is entering the wear-out
phase and approaching its end of life.
Any failure of equipment within a SIL rated function should be thoroughly analysed to identify the
cause of the failure and the possible implications for other similar equipment items deployed on SIL
rated duties. It would be unrealistic, however, to expect the same degree of analysis to extend to
equipment not deployed on SIL rated duties, particularly the typically much larger set of equipment
deployed on control and monitoring (rather than protection) duties. Much of this equipment will be
‘repaired-by-replacement’ and detailed analysis of any failure is likely to be an unrealistic ambition.
But much of the equipment deployed on control and monitoring may well be the same as that
deployed on SIL rated duties. This wider population set, in providing a broader sample, will be
potentially useful in identifying equipment failure rates.
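As an illustration of pooling the SIL and non-SIL populations of the same device type, the following is a minimal sketch, assuming that each population can be summarised by a unit count, a period in service and a count of failures. The function name, the tuple layout and all of the figures are hypothetical and are not taken from any standard or from plant data.

```python
HOURS_PER_YEAR = 8760


def pooled_failure_rate(populations):
    """Return the observed failure rate (failures per unit-hour) across populations.

    populations: iterable of (unit_count, years_in_service, failures) tuples.
    This layout is an assumption made for the sketch.
    """
    total_failures = sum(failures for _, _, failures in populations)
    total_unit_hours = sum(
        units * years * HOURS_PER_YEAR for units, years, _ in populations
    )
    return total_failures / total_unit_hours


# Hypothetical figures: the same transmitter model on two kinds of duty.
sil_duty = (20, 5.0, 1)       # 20 units, 5 years, 1 failure
control_duty = (180, 5.0, 9)  # 180 units, 5 years, 9 failures

rate = pooled_failure_rate([sil_duty, control_duty])
print(f"Observed pooled failure rate: {rate:.2e} per hour")
```

The pooled figure rests on ten failures rather than the single failure seen on SIL rated duties alone, which is the sense in which the wider population provides a broader sample.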
From the perspective of integrity, it is the undetected dangerous failure rate that matters, and in
principle it would be possible to estimate and monitor the dangerous failure count. Since the split of
safe/dangerous failures in the non-SIL population is unlikely to be available, we might estimate the
number of dangerous failures on the basis of the total number of failures and an estimate of the
Safe Failure Fraction (SFF). This estimate might be combined with any explicitly identified dangerous
failures in the SIL population to identify an estimate of the total number of dangerous failures within
the wider population. There are potential difficulties here, however: undetected dangerous failures
are only revealed by proof tests, some proof tests may be at extended intervals, and a given
element type might be subject to a range of test intervals. Note also that the classification of a
failure as safe or dangerous may depend on the duty, e.g. a high or low trip function. Since the SIF
population would typically be a small subset of the broader population, there would likely be only
marginal advantage over a simple total failure count. If there is a population of items exclusively dedicated to
SIF duties there may be some advantage in assessing the dangerous failure rate rather than the total
failure rate. Generally it will be much simpler to monitor a count of all failures (whether safe or
dangerous), with the working assumption that this will correlate with the dangerous failure rate. It
should be recognised that broad uncertainties are inevitable; we are not looking for marginal
refinement of failure rates here, rather we are looking for gross departures from expected
performance that should prompt investigation and possible revision of the claimed failure rates.
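The screening described above can be sketched as follows. This is a minimal illustration and not a prescribed method: it assumes that failures arrive as a Poisson process at the rate claimed in the design, that the fraction of failures that are dangerous and undetected is 1 − SFF (following the IEC 61508 definition of SFF), and that a 1% tail probability is a reasonable trigger for investigation. The function names, the threshold and the figures are all hypothetical.

```python
import math

HOURS_PER_YEAR = 8760


def estimated_dangerous_undetected(total_failures, sff):
    """Estimate dangerous undetected failures from a total count and an assumed SFF."""
    return total_failures * (1.0 - sff)


def poisson_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    cdf = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return max(0.0, 1.0 - cdf)


def gross_departure(observed, assumed_rate_per_hour, unit_count, years,
                    alarm_probability=0.01):
    """Flag a failure count that would be very unlikely if the assumed rate held.

    alarm_probability is a hypothetical screening threshold, not a standard value.
    """
    expected = assumed_rate_per_hour * unit_count * years * HOURS_PER_YEAR
    return poisson_tail(observed, expected) < alarm_probability


# Hypothetical case: 200 units over 5 years with an assumed total rate of 1e-6 per hour,
# giving about 8.8 expected failures.
print(gross_departure(25, 1e-6, 200, 5.0))
print(gross_departure(10, 1e-6, 200, 5.0))
print(estimated_dangerous_undetected(10, 0.6))
```

With these figures, a count of 25 failures would be flagged for investigation whereas a count of 10 would not. The test is deliberately coarse, in keeping with the aim of detecting gross departures rather than refining the claimed failure rate.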