Are We Talking the Same Language?

Feb 2, 20247 min read

Part 1

As we embark on the 30th Anniversary of the CHOLearning conference (formerly HPRCT) and its theme of IMPACT, I reflect on my journey when I first found out about HPRCT (now CHOLearning). I’ve been so impacted, that when I had the opportunity to join their Board of Directors, I accepted without hesitation. Why was I so confident? Please allow me to tell you, my story.

MY HISTORY WITH CHOLEARNING (FORMERLY HPRCT) AND RELIABILITY ENGINEERING

I had come into HPRCT from the outside world, at the invitation of my long-time friend Bob Nelms. Bob and I came from a field called Reliability Engineering (RE) which my father (Charles J. Latino) played a major role in establishing, in the late 60’s. In 1972, Charles founded and directed the first corporate Reliability Engineering R&D Center for a $16B international chemical company, called Allied Chemical Corporation (today they are more commonly known as Honeywell). His team’s research focused on:

Equipment Reliability
Process Reliability, &
Human Reliability
Maintainability

As a result of his team’s research, they established 3 key principles of Reliability:

Table 1: 3 Key Principles of Reliability Engineering Systems

I mention the above facts because I literally grew up in a Reliability household. In my father’s eyes, Reliability was not a job…it was a way of life!! Some of my chores as a child included doing Preventive Maintenance (PM) tasks on, and around, our house. This would include tasks like changing the A/C filters, changing the batteries in the smoke alarms, changing the oil in the lawn mower, inspecting for loose electrical outlets, and other easy things a kid could do.

I learned that everything about ‘Reliability Engineering’ deals with PROACTION. I was always trained to seek out hazards and risks, to mitigate or eliminate their potential consequences. Over my career, this early indoctrination into Reliability became both an asset, as well as a curse.

I say a curse, because when I would foresee failures occurring due to observable signals (i.e. – people taking short-cuts due to time pressures, using the wrong parts because the right parts were not available, using outdated tools because new technologies weren’t available, following obsolete procedures, etc.), I just couldn’t (and still don’t) understand why everyone wasn’t as frustrated as I was, about acting on these failure precursors. Why are we not paying more attention to the potential risks, rather than focusing on reacting better and faster?

The focus inevitably was always on fixing the failure faster, AFTER it happened. That was sacrilegious to me. But I soon learned I was in the small minority and the task ahead, involving educating people about proaction, was going to be much more challenging than I thought (and it has been!!).

WHEN I FIRST LEARNED WE WEREN'T SPEAKING THE SAME LANGUAGE

I remember coming to my first CHOLearning conference (2016 I think), and what struck me right off the bat was the key differences in dictionaries. I recall listening to Todd Conklin, whom I’d never heard of before, and his consistent use of the term HOP and HPI in his presentation. I had to lean over and ask Bob Nelms what those terms meant, because in my circles, I’d never heard of them.

Figure 1: Todd Conklin and Sidney Dekker Presenting at the 2017 HPRCT Conference

I also picked up on the fact that most of the attendees had a nuke or power generation background, primarily in Safety. So, I got to thinking, Am I in the right place…is this a Safety conference? Are their principles applicable to my field of Reliability Engineering?

Then it was also apparent to me, that what ‘Reliability’ means to me, was not what it meant to this Community. So, there was another big communication gap that I had to learn about, but I was a guest in this Community, so that is to be expected.

Bob Nelms invited me because he was seeking assistance to bring in more Reliability Engineering and Maintenance personnel from my world into the now referred to, CHOLearning community. I also spent over 20 years doing RCA in healthcare, so I wanted to help get patient safety professionals exposed to CHOLearning.

Over the past 8 years, I’ve learned quite a bit about HOP, HPI, HRO and Resilience Engineering, and made countless new friends…however, I still cannot delineate where one of these approaches ends, and where the other begins. To me, there appears to be a great deal of overlap in these principles, as well as between the Reliability principles I noted above. The labels certainly can be confusing. Does this boil down to trying to speak the same language?

CLEARING THE FOG: SEPARATING PRINCIPLES FROM LABELS/ACRONYMS

Oftentimes acronyms can be deceptive because they are interpreted differently by the interpreters. This can result in miscommunication and misunderstandings. I see this with terms like Human and Organizational Performance (HOP), Reliability, and Root Cause Analysis (RCA).

Having been in the RCA business myself for nearly 40 years, I can tell you that if you ask 100 people what ‘RCA’ means, you will get 100 different answers. I expand on why this results in a stigma associated with RCA, in this paper – The Stigma of RCA: What’s in a Name?.

The same goes for HOP. While I don’t purport to be a HOP expert, I can absolutely say that Todd Conklin’s 5 principles of HOP, completely align with my experience with how I see Operationalize Reliability and facilitate RCA.

*But blame is not a pass on accountability either.

I will go as far to say that if an RCA facilitator is NOT applying these principles, they are likely conducting what I call a “Shallow Cause Analysis” or SCA. I can see many complementary parallels between HPI/HOP and our Human Reliability R&D in the 70’s.

WHAT HAS BEEN CHOLEARNING'S ‘IMPACT’ ON ME?

Some may ask, why did I continue coming to CHOLearning Events when there were so many differences I observed from my comfort zone of Reliability/RCA? The question is also my answer! It was my intellectual curiosity that drove me to find out what I could learn from this Community that would enhance my approaches to Reliability/RCA.

I will expand on these in future blogs to demonstrate they all have unity in purpose. However, as a foundation, I want to start out by expressing what an effective Reliability Engineering System looks like. I do not believe that many in the CHOLearning Community are familiar with Reliability Engineering as I know it, so I will start here. From the master process flow that I will describe, I can then associate where the principles of RCA, HOP, HPI, HRO and Resilience Engineering are supported.

So, what does an effective Reliability system look like? I will express this is the form of a process flow diagram. I call this diagram the Prelical SEE Model. ‘SEE’ is an acronym that stands for the primary phases of 1) Strategy, 2) Execution and 3) Evaluation. My emphasis will be on practicality and not on perfection. When models are too rigid in their discipline, they fail. They end up representing ‘Work as Imagined’ versus ‘Work as Done’.

Figure 2: The Strategy Phase of an Operational Reliability Process Flow

Let’s take a cursory look at these elements and summarize.

Table 3: The Strategy Phase of Operational Reliability

Now let’s move to the Execution Phase of the SEE model.

Figure 3: The Execution Phase of an Operational Reliability Process Flow

In this phase we are using our technological systems to properly execute the work from the strategies, in a disciplined manner.

Table 4: The Execution Phase of Operational Reliability

And lastly, we explore the Evaluation Phase of the SEE model. This is when primary or secondary failures occur, and now must be addressed to prevent recurrence. This is where RCA enters the Reliability picture.

For clarity’s sake:

Primary failure is the first signal of an impending failure. These signals are typically picked up by our Predictive Maintenance technicians and include things like elevated temperatures, high vibrations, unacceptable decibel levels, and the like. If caught early enough, further analysis and repairs can be conducted on a scheduled basis and not an emergency basis.

Secondary failure means that we missed the primary signal, and the asset suffered a component-level, functional failure that likely caused an unexpected process interruption. Now it is likely emergency work and much more expensive to recover from.

Figure 4: The Evaluation Phase of an Operational Reliability Process Flow

In this phase, since our strategies and execution will never be flawless, failures will slip through the ‘holes (vulnerabilities) in the proverbial Swiss cheese’ and we will have to react to them. Keep in mind the orange connector line coming in from the left, is from the Execution phase. Now we have either detected signals of impending failure (primary failures) and/or we have experienced a secondary failure or unexpected loss of function.

I will address these in the proper order of sequence in real life. We will first address the primary failures, then the secondary failures.

I would also like to note that we are at higher risk of Safety incidents during unexpected process interruptions, as opposed to steady-state operations. We will expand on that correlation in future blogs and provide supporting data to validate.

Table 5: The Evaluation Phase of Operational Reliability

When we put the pieces of the puzzle together, this is what it looks like as a full process flow diagram. I now want to revisit our 3 Principles of Reliability earlier. If we look at this graphic as if they are columns, we can equate Priority = Criticality. We can equate Strategies = Proaction and lastly, we can equate Focus = Evaluation. I left the slice of Swiss Cheese in as a reminder that none of our systems and strategies are flawless…and therefore will never have American cheese systems 😉.

Figure 5: The Full Prelical SEE Model for Operational Reliability

Before we close on this first blog of my Reliability series, let’s reflect on the HOP principles we mentioned before and see if they align.

While we will again address how HPI, HOP, HRO and Resilience engineering impact RCA, just based on this model, they all apply and are critical to this infrastructure. Let’s bring back Todd Conklin’s 5 HOP Principles and review.

Table 7: Contrasting Conklin’s 5 Principles of HOP to the SEE Reliability Model

In this blog I wanted to lay out what ‘Reliability’ looks like in a manufacturing organization. I did this to provide context as to the big picture. I wanted to show that RCA doesn’t operate in a vacuum, it is a critical element of a much larger picture.

In upcoming blogs:

We will break down what does a true RCA process flow look like (the RED block in our SEE Model)?
Are all RCA approaches created equal (is all RCA the equivalent of the 5-Whys as many would like to have us think)?
How do mechanistic versus adaptive systems play into a Reliability/RCA Approach?
What’s the difference between complex versus complicated in Reliability/RCA?
Is there a correlation between Reliability and Safety?

For now, it is time to go out and make an IMPACT on your organization’s Safety and Reliability, by unifying and applying these principles and call it “(Your Company Name) way of doing business)!!

Whitelist Instructions