A safety instrumented system (SIS) is intended to reduce the risk of a harmful incident. It combines hardware and software controls implemented on each operating unit, usually as part of a layered approach to protection. Examples of instrumented controls include hardwired trip systems, interlocks and alarms.
Minimising the risk of failure
Containing the residual risk requires each control measure to be effective. During the design phase, teams of engineers and subject matter experts systematically analyse the process to identify every credible hazard and then determine which controls need to be in place. The hazard and operability study (HAZOP) is one such technique.
Whichever method is used, it is worth remembering that the SIS itself can fail. We need to eliminate, as far as possible, the risk of an underlying process failure coinciding with an SIS failure and thereby leading to an incident. There are techniques for quantifying the reliability of an SIS so that the real risk is adequately understood and mitigated. One example is Safety Integrity Level (SIL) analysis.
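To illustrate what SIL analysis quantifies, here is a minimal sketch in Python. It assumes a simple single-channel (1oo1) safety function, the common low-demand approximation PFDavg ≈ λ_DU × TI / 2, and the low-demand SIL bands from IEC 61508; the failure rate and proof test interval below are invented for illustration, not taken from any real device datasheet.

```python
# Illustrative SIL calculation for a simple 1oo1 (single-channel) safety
# function, using the low-demand approximation PFDavg ~= lambda_DU * TI / 2.
# All numbers are assumptions chosen for demonstration only.

def pfd_avg_1oo1(lambda_du_per_hour: float, proof_test_interval_hours: float) -> float:
    """Average probability of failure on demand for a 1oo1 architecture."""
    return lambda_du_per_hour * proof_test_interval_hours / 2

def sil_band(pfd: float) -> str:
    """Map a PFDavg value to its low-demand SIL band (IEC 61508)."""
    if 1e-5 <= pfd < 1e-4:
        return "SIL 4"
    if 1e-4 <= pfd < 1e-3:
        return "SIL 3"
    if 1e-3 <= pfd < 1e-2:
        return "SIL 2"
    if 1e-2 <= pfd < 1e-1:
        return "SIL 1"
    return "outside SIL 1-4 bands"

# Assumed dangerous undetected failure rate of 2e-7 per hour,
# proof tested once a year (8760 hours).
pfd = pfd_avg_1oo1(2e-7, 8760)
print(f"PFDavg = {pfd:.2e} -> {sil_band(pfd)}")  # PFDavg = 8.76e-04 -> SIL 3
```

A real assessment would also account for redundancy (1oo2 or 2oo3 voting), common-cause failures and diagnostic coverage, but the principle is the same: the calculation tells us how reliable the instrumented function must be, and how often it must be proof tested, for the residual risk to be tolerable.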
Engineers tend to focus on physical equipment and not people
As instrument and automation engineers, we are trained to be comfortable with physical systems, but less so with systems involving people. When we review the causes of a significant incident, it is tempting to point to a hardware device as the underlying root cause of the failure. We tend to gloss over the role of humans in the sequence of events that led up to that failure.
The consequences of people getting it wrong
In March 2005, the BP Texas City Refinery experienced a significant safety incident that resulted in 15 fatalities and 180 injuries, after a “geyser of flammable hydrocarbon liquid and vapour erupted from a blowdown stack, creating a huge fire”. Inexperienced operators had continued pumping flammable feedstock into the raffinate splitter tower long after it had begun to overfill.
During the engineering design, the HAZOP and LOPA (layers of protection analysis) should have picked up the scenario in which liquid could be pumped into an operating unit for an extended period without an observable rise in level. Whether or not this possibility had been identified, the protection systems clearly failed: at the time, no alarm alerted the operators to what was happening, and the feed pump did not trip.
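As an aside, LOPA reduces a scenario like this to simple arithmetic: the mitigated event frequency is the initiating event frequency multiplied by the probability of failure on demand (PFD) of each independent protection layer. A minimal sketch follows, with all numbers invented for illustration rather than drawn from the BP investigation:

```python
# Illustrative LOPA arithmetic: mitigated frequency = initiating frequency
# multiplied by the PFD of each independent protection layer (IPL).
# The layer names and values below are assumptions for demonstration only.

from math import prod

initiating_frequency = 0.1  # assumed overfilling events per year

ipl_pfds = {
    "level alarm + operator response": 0.1,
    "high-level trip (SIS)": 0.01,
    "relief/blowdown system": 0.1,
}

mitigated_frequency = initiating_frequency * prod(ipl_pfds.values())
print(f"Mitigated event frequency: {mitigated_frequency:.1e} per year")
# If this exceeds the tolerable target (say 1e-5 per year), the gap is
# closed with a lower-PFD (higher-SIL) trip or another independent layer.
```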
The investigation report made an insightful observation: it is easy to identify the physical device that failed and subsequently led to the incident. Investigators are then prone to single out the person most closely associated with that device, be they operators, maintenance personnel, managers or others. The investigation often ends with a simple technical recommendation: fix the device, add some more SIL-rated hardware, and all will be well.
In the BP refinery incident, the investigation concluded that the underlying problems went beyond the physical safety instrumented system. The issues also lay with poor training and inexperienced people. This, combined with poorly maintained and deteriorating equipment, created a high-risk situation: an accident waiting to happen. In addition, while the plant’s deteriorating condition was understood to be a risk, fixing it would have required an extended shutdown and significant shareholder pain. The record shows that the shutdown did not happen in time.
Is it time to share our lessons learned between IT and OT?
IT managers and CIOs are all too familiar with system failure. Some would argue that this is due to a lack of proper methodology and discipline. However, as with industrial operations, IT projects rarely fail because of a technical issue alone. They are particularly challenging because people must change how they work to take advantage of a new system.
It occurred to me that this hard-earned experience from the world of IT can also be applied in the operations environment. With the convergence of IT and OT, best practices from the respective disciplines can be shared in ways that previously might not have been obvious.
Poor training and inexperience are disastrous in the world of IT projects – even more so when operating a hazardous refinery. Is it not time to get our heads together and come up with a more holistic solution, one that incorporates the physical and engineering aspects as well as the people factors, to keep our plants running safely and reliably?
About Gavin Halse
Gavin Halse is a chemical process engineer who has been involved in the manufacturing sector since the mid-1980s. He founded a software business in 1999 which grew to develop specialised applications for mining, energy and process manufacturing in several countries. Gavin is most interested in the effective use of IT in industrial environments and now consults part time to manufacturing and software companies on using IT to achieve business results.