How root cause analysis helps prevent future problems
- Root cause analysis starts with defining a clear scope.
- After the root causes are clear – decide on effective fixes.
- Consider risk when deciding what kind of resources you want to commit to fixing a problem.
- If the likelihood of a recurrence is high or the consequences are significant, a third-party firm can give you confidence that the real problem has been identified and effectively resolved, and can assure any regulatory authorities involved that the investigation and analysis were objective.
It was a relatively straightforward redesign of some piping to improve the functionality of a metering and pig launching station. The project included designing a pressure equalization runaround – a one-inch line to divert product around a 16-inch valve to allow refilling of the pig trap. We developed the design to meet regulatory requirements, client standards and operations preferences, and provided the drawings to the client’s construction team. We supported the client during construction, helping to resolve a couple of issues and answering several questions along the way. But the runaround posed no problems.
Or so we thought. Months later, a pipe nipple in the runaround failed catastrophically, releasing product to the ground and requiring an immediate and extensive commitment of resources to the cleanup effort.
We were called back in to find out what had gone wrong and to help our client understand how to make sure this would not happen again here or anywhere else – i.e. to perform a root cause analysis and recommend corrective actions to prevent recurrence.
We quickly noticed that the runaround had not been built according to the design we’d provided. First, we’d specified Schedule 160 threaded pipe as per the client’s design standard, but the thinner-walled, less robust Schedule 80 had been used instead. Second, our design for the runaround had kept the maximum unsupported length of the one-inch pipe below the upper limit in the standard, but the installed pipe followed a different routing that resulted in a longer unsupported span. Third, we found evidence that the failed nipple had been overtightened, possibly initiating fractures in the threads. Lab analysis found indications that the failure resulted from reverse bending fatigue due to vibration or pressure cycling.
So what was the cause of the failure? At the direct level, it was a combination of the overtightening of the threaded joint that initiated cracking, the unsupported length of the piping that allowed stress cycling, and the pipe wall thickness that was insufficient to prevent the crack(s) from growing. But what caused those problems to occur? Was it that the approved design hadn’t been followed? Or deeper yet, was it that the client’s internal management of change process wasn’t used when the design wasn’t followed?
Sometimes it can be hard to know when to stop digging – how many layers to peel back – before concluding that the core or fundamental causes of the problem have been found. However, this is the point you need to reach to truly understand what steps can be taken to reduce the chances of the problem recurring. In this post, we’ll dig into root cause analysis (RCA), which can be defined as the process of discovering the root causes of problems in order to identify appropriate solutions.
Our firm’s work in RCA generally occurs in two kinds of situations: after an actual incident, such as the broken runaround pipe and resulting release, or after a near miss, in which an incident almost occurred but was barely prevented. RCA works on the assumption that the incident or near miss was preventable, that something went wrong, and that corrective action can set it to rights.
Here’s how we do it.
Define a clear scope for the root cause analysis
We first work with the client to achieve clarity around what they want us to investigate, what resources they are willing to commit, any boundaries for the corrective actions to be considered, what they expect us to deliver, and how long it will take.
We also secure a commitment from the client that our work can be performed objectively. While collecting and analyzing input from stakeholders is a critical part of the process, we need to be free to draw truth-based conclusions that are not influenced by intracompany or external politics.
Take the investigation as deep – and as wide – as necessary
When applicable, we first determine the extent to which conditions elsewhere in the system are ripe for similar failures, so that quick action can be taken to prevent further incidents.
The objective from there is to reveal the root cause of the problem, which, by definition, involves going below the surface. Don’t be too quick to jump to conclusions. In the runaround story, the early assumption was that the use of Schedule 80 pipe was the root cause, but after further investigation, the evidence showed multiple contributing factors. An investigation that begins with an incident in the field nearly always leads to the office before the roots are exposed. The impact of organizational factors, including management systems, processes, and procedures, as well as human performance issues, must be considered.
Using the multiple tools listed below to dig deeper and wider helps us uncover the branching or layered reasons behind the problem, such as why the approved design wasn’t followed, or why the as-built markups failed to capture the discrepancies.
- Events and Causal Factors Analysis: A timeline-based assessment tool that uses logical dependencies to narrate and map the causes that contributed to a problem
- Fault Tree Analysis: Uses logic gates to depict how systems can fail
- Change Analysis: Compares the non-failure base case with the failure case to understand how changes may have contributed to the failure
- Hazard-Barrier-Target Analysis: Examines which barriers (physical, procedural, or otherwise) stand between a hazard and a sensitive receptor, which of those barriers failed, and why
- Management Oversight and Risk Tree (MORT) Analysis: A comprehensive but scalable tool that interrogates what happened against an ideal safety management system
- Why Staircase Analysis: A structured process that starts with the problem event and asks “Why?” repeatedly, at successively deeper organizational levels, to work back to the root cause
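To make the logic-gate idea behind Fault Tree Analysis concrete, here is a minimal Python sketch that models the direct causes of the runaround failure as a small tree. The event names, the `Gate` class, and the `occurs` function are hypothetical, and the tree is a simplified reading of the story above rather than the analysis we actually performed.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Gate:
    """A fault tree logic gate combining lower-level events."""
    kind: str  # "AND": output occurs only if every input occurs; "OR": if any input occurs
    inputs: List["Node"]


# A node is either a basic event (identified by name) or a gate.
Node = Union[str, Gate]


def occurs(node: Node, basic_events: Dict[str, bool]) -> bool:
    """Return True if the event represented by this node occurs."""
    if isinstance(node, str):
        return basic_events[node]
    results = [occurs(child, basic_events) for child in node.inputs]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate kind: {node.kind!r}")


# Hypothetical, simplified tree for the runaround nipple failure.
stress_cycling = Gate("AND", [
    "long_unsupported_span",
    Gate("OR", ["vibration", "pressure_cycling"]),
])
nipple_failure = Gate("AND", [
    "threads_overtightened",  # initiated cracking
    stress_cycling,           # drove fatigue
    "wall_too_thin",          # could not prevent crack growth
])

as_built = {
    "threads_overtightened": True,
    "long_unsupported_span": True,
    "vibration": True,
    "pressure_cycling": False,
    "wall_too_thin": True,
}
print(occurs(nipple_failure, as_built))  # True

# Removing any single input to the top AND gate prevents the top event.
print(occurs(nipple_failure, {**as_built, "wall_too_thin": False}))  # False
```

The top-level AND gate mirrors the finding that the failure required all three direct causes together. A tree like this stops at the direct causes, though; on its own it does not explain why the approved design was not followed.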
We know we’re at the root when:
- The incident would not have occurred if this cause were not present,
- The cause is not the symptom of any other cause, and
- Fixing the cause will prevent the same problem from happening again.
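The second criterion (the cause is not the symptom of any other cause) lends itself to a simple check if the why-chains from a Why Staircase are recorded as a graph mapping each item to the causes beneath it. The Python sketch below assumes that representation. The function name and the example chains are hypothetical and only loosely based on the runaround story, which raises the design and management-of-change questions without answering them. The first and third criteria still call for engineering judgment and are not checked here.

```python
from typing import Dict, List, Set


def leaf_causes(why: Dict[str, List[str]], problem: str) -> Set[str]:
    """Follow 'why?' links from the problem and return causes with nothing recorded beneath them."""
    leaves: Set[str] = set()
    seen: Set[str] = set()
    stack = [problem]
    while stack:
        item = stack.pop()
        if item in seen:
            continue  # a shared cause reached from several branches is visited once
        seen.add(item)
        causes = why.get(item, [])
        if causes:
            stack.extend(causes)  # keep digging
        elif item != problem:
            leaves.add(item)  # nothing deeper recorded yet
    return leaves


# Hypothetical why-chains loosely based on the runaround story.
why = {
    "nipple failure": ["threads overtightened", "long unsupported span", "Schedule 80 installed"],
    "long unsupported span": ["approved design not followed"],
    "Schedule 80 installed": ["approved design not followed"],
    "approved design not followed": ["MOC process not used"],
}

print(sorted(leaf_causes(why, "nipple failure")))
# ['MOC process not used', 'threads overtightened']
```

Here "threads overtightened" surfaces only because no deeper "why" has been recorded for it, which is a prompt to keep digging rather than a confirmed root cause.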
After the root causes are clear – decide on effective fixes
Rushing into corrective actions is a mistake. Misguided early perceptions of what went wrong can lead to actions that don’t fix the real problems, waste money and time, under- or overreact to the real causes, and leave you vulnerable to recurrence. We don’t move to corrective actions until there is clarity and agreement on the root causes.
As we develop a set of corrective actions to propose to the client, we know we’re ready to share them when they directly and fully address all the root and contributing causes, and when each action has been developed at the highest practicable level on the following list, ranked from highest (6) to lowest (1):
(6) Engineering design for minimum hazard
(5) Safety devices
(4) Safety alarms or warnings
(3) Procedures
(2) Training and awareness
(1) Management acceptance of risk without corrective action
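Selecting the highest practicable level from this list can be sketched for a single cause as follows. This Python sketch assumes each candidate option has already been tagged with its level and with the team’s judgment of whether it is practicable; the class names and the example options are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class ControlLevel(IntEnum):
    """The corrective-action hierarchy above; a higher value is preferred."""
    RISK_ACCEPTANCE = 1      # management acceptance of risk without corrective action
    TRAINING_AWARENESS = 2
    PROCEDURES = 3
    ALARMS_WARNINGS = 4
    SAFETY_DEVICES = 5
    ENGINEERING_DESIGN = 6   # engineering design for minimum hazard


@dataclass
class Option:
    description: str
    level: ControlLevel
    practicable: bool  # a judgment made by the team, not something the code decides


def highest_practicable(options: List[Option]) -> Optional[Option]:
    """Return the practicable option at the highest level, or None if none is practicable."""
    feasible = [o for o in options if o.practicable]
    return max(feasible, key=lambda o: o.level, default=None)


# Hypothetical options for a single cause (an excessive unsupported span).
options = [
    Option("Operator training on vibration awareness", ControlLevel.TRAINING_AWARENESS, True),
    Option("Periodic inspection procedure for the runaround", ControlLevel.PROCEDURES, True),
    Option("Add a pipe support to shorten the unsupported span", ControlLevel.ENGINEERING_DESIGN, True),
]
print(highest_practicable(options).description)
# Add a pipe support to shorten the unsupported span
```

The sketch returns one option per cause; it does not rule out pairing that option with lower-level actions such as procedures or training.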
Without implementation, even an excellent set of recommended corrective actions is 0% effective. We work with our clients to develop an implementation plan using the SMARTER acronym (specific, measurable, achievable, relevant, time-bound, effective, and reviewed) and to ensure that each action is assigned to a responsible individual. We can also help provide accountability by checking in at predetermined intervals to verify that things are on track, or to help get them back on track.
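Some SMARTER elements can be checked mechanically at each scheduled check-in: whether an action has a responsible individual, a measure of success, and a due date, and whether it is overdue. Achievable, relevant, and effective remain matters of judgment. The Python sketch below covers only the checkable parts; the `ActionItem` fields, the example actions, the owner name, and the dates are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ActionItem:
    description: str
    owner: Optional[str]            # responsible individual
    due: Optional[date]             # time-bound
    success_measure: Optional[str]  # measurable
    done: bool = False


def check_in(items: List[ActionItem], today: date) -> List[str]:
    """List items that are unassigned, unmeasurable, undated, or overdue at a check-in."""
    findings = []
    for item in items:
        if not item.owner:
            findings.append(f"{item.description}: no responsible individual assigned")
        if not item.success_measure:
            findings.append(f"{item.description}: no success measure defined")
        if item.due is None:
            findings.append(f"{item.description}: no due date")
        elif not item.done and today > item.due:
            findings.append(f"{item.description}: overdue since {item.due.isoformat()}")
    return findings


# Hypothetical plan; names and dates are placeholders.
plan = [
    ActionItem("Revise threaded-joint make-up procedure", "J. Doe", date(2024, 3, 1),
               "Procedure issued and crews trained", done=True),
    ActionItem("Add support to runaround piping", None, date(2024, 2, 1),
               "Unsupported span within standard limit"),
]
for finding in check_in(plan, today=date(2024, 4, 1)):
    print(finding)
# Add support to runaround piping: no responsible individual assigned
# Add support to runaround piping: overdue since 2024-02-01
```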
Getting the help you need
Consider risk when deciding what kind of resources you want to commit to fixing a problem. If the probability of a repeat failure is low and the consequences are minor, then internal analysis is likely adequate. But if the likelihood of a recurrence is high or the consequences are significant, a third-party firm like HT Engineering can give you confidence that the real problem has been identified and effectively resolved, and can assure any regulatory authorities involved that the investigation and analysis were objective.
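The screening rule in the paragraph above can be expressed as a small function. In this Python sketch, the three-level scale and the middle "judgment call" band are assumptions added for illustration; the text itself addresses only the low-and-minor case and the high-or-significant case. The names are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2  # assumed middle band, not described in the text
    HIGH = 3


def investigation_approach(likelihood: Level, consequence: Level) -> str:
    """Screen an incident or near miss by recurrence likelihood and consequence severity."""
    if likelihood == Level.HIGH or consequence == Level.HIGH:
        return "third-party investigation"
    if likelihood == Level.LOW and consequence == Level.LOW:
        return "internal analysis likely adequate"
    # Assumption: anything in between is left to the organization's judgment.
    return "judgment call: weigh internal versus third-party"


print(investigation_approach(Level.LOW, Level.LOW))   # internal analysis likely adequate
print(investigation_approach(Level.LOW, Level.HIGH))  # third-party investigation
```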
Our training and history of analyses with multiple operators and segments of the industry have given us a strong understanding of the context and best-practice methodologies for RCA.
Over 50 years of multi-faceted experience with oil and gas companies and their pipelines and facilities means we have a deep understanding of the kind of things that can go wrong in their design, engineering, construction, operation, and maintenance, and what it takes to correct them.
With HT Engineering’s assistance over the decades, our clients have consistently succeeded in correctly identifying and effectively addressing the causes of their incidents. In service of public welfare and our industry’s shared goal of zero incidents, we would be honored to discuss any incident or significant near miss for which you are considering a third-party investigation.