Are We Talking the Same Language? | PART III
REFLECTION ON PART II.
In Part II of this series, I focused on RCA as a system, and not simply a task. This led to discussing proactive vs. reactive RCAs, identifying and quantifying chronic failure candidates to make a business case, organizing an ideal team, and developing a data collection strategy that prevents hearsay from flying as fact.
Now, we will complete this series with an understanding of how all the previous building blocks converge to analyze an undesirable outcome while incorporating HOP principles. This is where the pieces of the puzzle come together and tell an evidence-based story.
Click here to read Part 1 of Are We Talking the Same Language.
Click here to read Part 2 of Are We Talking the Same Language.
Part III - RCA Blue Zone - 6 Steps to Holistic RCA Conclusion
RCA BLUE ZONE WARNING:
This blog will appear lengthy, but it’s mostly graphics and tables, so it’s easily digestible 😊.
As a refresher, these are the topics we will discuss:
STEP 5 – EVENT RECONSTRUCTION USING A LOGIC TREE
STEP 6 – GERMINATION OF A ‘FAILURE’
STEP 7 - APPLYING THE PRINCIPLES OF HOP TO AN EFFECTIVE RCA SYSTEM
As discussed in Part I of this series, in my opinion the term ‘RCA’ is useless in today’s market (and I’m in the RCA business). There is no universally accepted definition, so all that does is lend itself to confusion. No matter what people are using to solve problems, they will call it ‘RCA’. This ranges from the 5-Whys, to Brainstorming, to Logic Trees, to handwriting on a bar napkin. It’s impossible to compare the effectiveness of such approaches because they range from trial-and-error to very disciplined, evidence-based approaches.
Thus far we have been discussing Reliability and RCA as holistic systems, representing a methodological approach to analyzing undesirable outcomes. As we get into this next section, we will be discussing tools to execute the methods described.
Here we will summarize the pros and cons of the two most common ‘RCA’ tools.
STEP 5 – EVENT RECONSTRUCTION TOOLS
We will start off with a summary of the most common RCA tools. This will provide some familiarity with the differences in RCA tool capabilities. By far, in my experience, 5-Whys is the most popular RCA tool. Its origins are from Toyota Motor Corporation, and its original intent was to help assembly line workers drill down a few levels deeper than the obvious when looking at undesirable outcomes in their work areas.
EXPLORING THE 5-WHYS
Here is an example of a 5-Whys analysis using a customer complaint case history (see Figure 1). We will first apply the 5-Whys tool, then we will apply the Logic Tree to the same case and compare the differences in outcomes. This RCA approach saved the company more than $2M/year globally via a 50% reduction in customer complaints over a 2-year period (and that is a conservative number).
Summary of 5-Whys Tool: The "5 Whys" technique is a problem-solving method aimed at uncovering the underlying reasons for an issue by repeatedly asking "why" questions.
The concept is rather simple. The first block is the ‘Event’ or what the problem is. After that, we simply ask ‘Why’ that occurred, five times sequentially, and then we have uncovered our proverbial ‘Root Cause’. No doubt the traditional use of the 5-Whys is linear. If one considers the 5-Whys to be a valid RCA approach, then this gives credence to the perception that all RCA is linear. However, I am not one of those who believe the 5-Whys is a valid ‘RCA’ approach, given the way I will express it in this blog. The traditional 5-Whys approach does not possess the technical capability to be effective on serious events with significant outcomes.
Summary of Undesirable Outcome/Customer Complaint: A specialty plastics facility experienced nine customer complaints about small black particles in an Eastman material. The product, used in consumer product applications including radios, telephones, toothbrushes and toys, was sold in the form of small plastic pellets.
I laid the 5-Whys expression down horizontally only to save real estate within this paper. Most prefer it top to bottom, but the orientation has no impact on the flow of the content.
EXPLORING THE LOGIC TREE
Now let’s explore the use of a Logic Tree on the same case. First let’s briefly describe what a Logic Tree is, and its pros and cons.
Summary of the Logic Tree Tool: Logic tree development starts with the known facts about the event (top block) and then works back to identify the cause-and-effect relationships that lead to the occurrence.
Each level, top to bottom, in a logic tree depicts a cause-and-effect relationship by asking, “how can?” In the customer complaint event described earlier in this article, site RCA team members trained in this root cause analysis approach provided the expertise to answer the questions (in the form of hypotheses). They were the Subject Matter Experts (SMEs) in this case, as they should be in all formal RCAs.
Once the hypotheses are developed, each must be supported by hard data, not hearsay. Using this approach, you drill down each leg until you find decisions made that were based on flawed organizational systems (which we will call latent, or systems, root causes), cultural norms, and/or socio-technical factors.
Here are some pros and cons of using a Logic Tree expression.
Of course, looking at this listing, many of the ‘cons’ simply represent things that add discipline to properly executing the RCA, but they also require leadership support. So, let’s see what difference/value all this discipline adds, when compared to a simpler tool like the 5-Whys.
A logic tree is a reconstruction tool, so the evidence is what drives the direction when we drill down. As we use the tree to explore, we initially ask the question ‘How could?’. In this case, ‘How could black specks get into the solvent shipment?’ This shift in questioning is significant. If we ask ‘How could a crime occur?’ versus ‘Why does a crime occur?’, the answers are very different.
In this case, our expert team members hypothesize the only way that can happen is if the specks are coming from 1) storage, 2) tank trucks, 3) loading, and/or 4) the manufacturing process. So, these possibilities have to be proven true or not true with hard evidence and not hearsay. In the Figure 2 expression, all are proven to be true, except for the Loading area. This means that we must continue to drill down on what is known to be true.
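The first level of this tree can be sketched in a few lines of Python. This is only an illustration of the idea, not the software or notation used in the actual analysis: the `Hypothesis` class and the `legs_to_drill` function are names I made up for this sketch, and the true/not true outcomes simply mirror what is described above for Figure 2.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hypothesis:
    """One block on the tree (hypothetical structure for illustration)."""
    label: str
    # None = not yet checked; True/False = proven true/not true with evidence
    verified: Optional[bool] = None
    children: List["Hypothesis"] = field(default_factory=list)


def legs_to_drill(node: Hypothesis) -> List[str]:
    """Return only the hypotheses proven true; these are the legs we keep exploring."""
    return [h.label for h in node.children if h.verified is True]


# Top block (the event) and the four 'how could?' hypotheses from the text
event = Hypothesis(
    "Black particles in product",
    verified=True,
    children=[
        Hypothesis("Storage", verified=True),
        Hypothesis("Tank trucks", verified=True),
        Hypothesis("Loading", verified=False),  # proven not true
        Hypothesis("Manufacturing process", verified=True),
    ],
)

print(legs_to_drill(event))
```

The point of the sketch is that a hypothesis stays on the tree either way; the evidence decides whether we keep drilling down that leg.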
Let’s start with the ‘Storage’ leg of the tree and continue drilling down.
As we continue our ‘how can’ questioning, eventually we will come to a human decision-maker (orange nodes). This is when our questioning shifts to ‘Why did the decision make sense at the time?’. This is where we learn the reasoning people used to make their decisions, and the systems that impacted them.
In this case, even though standards for the cartridges had changed, this was not communicated to purchasing. So, the system-level flaw is that a Management of Change (MOC) process was not in place to catch this crack in the system. We also find the cartridges were not being changed at proper intervals because there was no PM in place requiring that task to be completed.
Continuing with our questioning convention, we find that decisions were made to not purchase these filters in the manufacturing process because of their costs. This decision contributed to specks entering the product via the manufacturing process.
We now wrap up with the specks entering the product from the tank trucks that transport the product. In this case:
There was no check step by the truck cleaning company requiring such a cleaning.
There were planning and scheduling inefficiencies that overwhelmed the inspectors, so they had to rush their inspections due to the backlog.
There was a recent reduction in the number of inspectors due to cost-cutting.
Note that behind each of these nodes on the tree, there are verification methods, verification outcomes, person responsible, due date, and completion date. Now that the systems (or latent) deficiencies have been identified, solutions can be developed to mitigate or eliminate their potential consequences.
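To make the record behind each node concrete, here is a minimal Python sketch. The field names follow the list above, but everything else is an assumption for illustration: the `Verification` class, the `overdue` function, the verification methods, the people, and the dates are all hypothetical and are not taken from the actual case.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Verification:
    """Evidence record kept behind one node (hypothetical layout)."""
    hypothesis: str
    method: str                 # how the hypothesis will be verified
    outcome: Optional[str]      # e.g. "true" / "not true"; None until verified
    responsible: str
    due: date
    completed: Optional[date] = None


def overdue(records: List[Verification], today: date) -> List[str]:
    """List hypotheses whose verification is still open past its due date."""
    return [r.hypothesis for r in records
            if r.completed is None and r.due < today]


# All values below are placeholders, not data from the case history
records = [
    Verification("Storage", "hypothetical: filter inspection", "true",
                 "Person A", date(2024, 1, 15), date(2024, 1, 12)),
    Verification("Loading", "hypothetical: sampling at loading", "not true",
                 "Person B", date(2024, 1, 20), date(2024, 1, 18)),
    Verification("Tank trucks", "hypothetical: cleaning record review", None,
                 "Person C", date(2024, 2, 1)),
]

print(overdue(records, today=date(2024, 3, 1)))
```

Keeping the record in this form makes it easy to see which hypotheses are still resting on hearsay rather than evidence.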
COMPARING RESULTS FROM 5-WHYS AND LOGIC TREE APPROACHES
As we can tell, the discipline of the Logic Tree yields a much more comprehensive outcome than the 5-Whys. It is not fair to compare the 5-Whys to the Logic Tree as equals, because they have differing technical capabilities and were developed for different reasons. They are both useful tools, but we have to know when to use certain tools in our toolbox, depending on the projects we are working on.
STEP 6 - GERMINATION OF A FAILURE
Now that we’ve seen a logic tree in practice, let’s delve deeper into understanding formation of the graphical expression.
Figure 6 is a conceptual view of a Logic Tree, stripped of its technical content, to show its conceptual flow and purpose. Here are some summary descriptions of the terms used.
The long and short of it is that systems influence our decisions, and our systems are imperfect. Depending on our organizations, there will be varying degrees of ‘imperfect’. The more imperfect our systems, the greater the risk of undesirable outcomes.
Figure 7 is the same as Figure 6, except with an overlay in the shape of an hourglass. I call this my RCA ‘hourglass approach’. This makes it easier to digest. Everything above the decision (H=Human), in the yellow zone, is an observable consequence of the decision. This zone typically deals with the physics of failure. This is what traditional RCA or RCFA (Root Cause Failure Analysis) is known for.
Everything below the decision, in the blue zone, deals with the social sciences of reasoning, sense-making, and the like. The reality is that Engineers thrive in the yellow zone as their education was founded on the physical sciences. However, Engineers do not typically fare well in the blue zone, dealing with the ‘soft issues’ (at all). Most are out of their element.
On the flip side, it is the same. Those well-versed in the social sciences typically do not excel in the physical sciences!!
FINISHING THE RCA PUZZLE
As promised at the beginning of PART II, I wanted to now put the pieces of my RCA puzzle together, so we can look at the flow of the entire process we have discussed (see Figure 8).
STEP 7 - APPLYING THE PRINCIPLES OF HOP TO AN EFFECTIVE RCA SYSTEM
As we did in Part I of this series, we will again address how the HOP principles align with ‘RCA’ as described in Parts II and III of this series. They all apply and are critical to this overall Reliability and RCA infrastructure. Let’s bring back Todd Conklin’s 5 HOP Principles and review them.
SERIES CONCLUSION
This brings me full circle to why I was intrigued by the CHOLearning Community. While the blue zone from my hourglass analogy (social sciences) has been a critical part of our Human Reliability and RCA processes since the 70’s, I admittedly could learn much more about this zone from this Community…and I have! My interviewing techniques have been improved considerably due to learning from my colleagues in the CHOLearning Community.
I hope that my bringing this Reliability perspective to my CHOLearning colleagues allows them to see where folks like me from heavy manufacturing, are coming from. The ‘H’ (Human) in the middle is what we all have in common and have unity in purpose. We can certainly help each other out in better understanding the yellow and blue zones of Figure 7.
I know many will feel this was not really a blog, but rather a run-on white paper. For a while I have been wanting to express the Reliability Engineering perspective for this Community, and this was an opportunity for me to do so. Thanks to the CHOLearning Board for allowing me to do so, and more importantly, to YOU for having the patience to listen and the open-mindedness to explore differing perspectives.
Do you want to learn more from Bob on this topic? Enjoy this recording of our March 2024 Webinar.