Root Cause Analysis – how do you get it to work effectively? Part 4: Bringing it all together
In this four-part series of articles, I’ve explained some of the more well-known brands of root cause analysis and how these differ not just in approach but also in complexity.
Some of the techniques need more expertise to administer, and so lend themselves to being co-ordinated by more specialised job functions. Others are simpler techniques which are easy to learn and use, yet in my experience all too often these simpler techniques still don’t get used effectively, if at all, by the ‘people in the process’.
In this article, then, I’d like to show how the four techniques I’ve discussed relate to each other.
Start with the basics – and use them well
As you may have already gathered, it’s my firmly held belief that it’s the ‘people in the process’ who are key. They are the life blood of any organisation. They are the ones who generate the revenue and they also – almost always – know more about what’s really going on in their processes than the people who manage them.
Yet they don’t always get consulted properly about their knowledge – in fact the finger of blame is often pointed their way instead when things do go wrong. What sort of a crazy situation is that?!! All that’ll cause is disillusionment about what value they really do have for the organisation. This can then lead to a detachment from responsibility – for any organisation that’s something that can easily work against its very existence.
The real dilemma is that, although people in distinct process areas usually do have the best information to do effective process improvement, they can often be unaware of problems occurring in the downstream processes due to a combination of ‘silo’ management and inadequate cross-departmental communication.
And therefore they may not be aware that problems encountered downstream may in fact originate from their own actions. Before a problem can really be solved, the relevant people in the process have to acknowledge where the problem really originated.
So, in order to get the best from our most valued assets we have to educate and involve them to:
- Understand what a problem really is
- Acknowledge a problem can exist in their own area of work
- Share their knowledge of the real process
- Identify root causes of their problems
- Participate in the identification and implementation of the most appropriate changes
- Take ownership in controlling their improved processes
First and foremost, get yourself organised with localised problem solving
Localised problem solving doesn’t just happen by itself:
- Ensure your internal continuous improvement and corrective action procedures are integrated with effective root cause analysis methodologies.
- Inspire and educate your ‘people in the process’ to explore their own localised process improvement opportunities using 5-Why or Cause & Effect techniques.
- Develop a means of managing this process – you can’t afford to have loose cannons in the process.
Remember what the focus is for each of the basic techniques. Use 5 – Why to stay focussed on resolving immediate problems quickly, and Cause & Effect to get an in-depth but broad understanding of potential as well as real problems. Use one or the other according to whether you’re in reactive or preventative mode.
Neither of these tools, used on its own, will give an organisation a completely satisfactory and robust mechanism for continuing to assess its process capabilities, though! And so other, more complex techniques should also be implemented (but managed very carefully).
It’s important to note that the more complex, systemic techniques aren’t intended to be implemented as a safety net to react to current process failure. They were developed – and should really be used – as proactive methods to risk assess and prepare new processes against failure.
Let’s just explore in a bit more detail the relationships between the two most commonly used systemic methods – Failure Mode & Effects Analysis and Fault Tree Analysis.
Fault Tree Analysis
I know this example is a bit flippant but it serves its purpose to explain things. Keep in mind though that this would be just a ‘snippet’ from a more extensive analysis of a complete process.
Let’s look at its main characteristics:
- The top and intermediate events, including their root causes, are all stated as negatives (just like Cause & Effect diagrams)
- It recognises and accounts for all the different types of possible causes that have been identified (just like Cause & Effect diagrams)
- It drills down to root cause through an iterative series of logical steps (just like 5 – Why analysis)
- It also recognises that some causes can be independent, or dependent on other things that have to occur at the same time (note the pairs of simultaneous causes to the right of the two AND statements above). This is characteristically unique to Fault Tree Analysis.
- Every root cause should have a defined countermeasure (or statement of control) assigned to it in order for this to become a method of practical value. No point knowing what can go wrong, and just letting it happen!
So you can see that Fault Tree Analysis has a lot in common with the thought processes needed to build a 5 – Why analysis. If you teach your ‘people in the process’ to become effective users of 5 – Why, it’s only a small step then, to harness that local knowledge during the development of Fault Trees!
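To make the AND/OR logic a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made purely for illustration: the `RootCause` and `Gate` classes, the `occurs` and `uncontrolled` helpers, and the late-shipment tree itself are hypothetical and are not taken from any real analysis. It shows two things: an AND gate only 'fires' when all of its causes are present together, and a simple check can flag any root cause that still has no countermeasure assigned.

```python
from dataclasses import dataclass, field


@dataclass
class RootCause:
    name: str
    countermeasure: str = ""  # empty string means no control defined yet


@dataclass
class Gate:
    name: str   # the top or intermediate event, stated as a negative
    kind: str   # "AND" or "OR"
    inputs: list = field(default_factory=list)


def occurs(node, active):
    """Return True if the event at `node` happens, given the active root-cause names."""
    if isinstance(node, RootCause):
        return node.name in active
    results = [occurs(child, active) for child in node.inputs]
    return all(results) if node.kind == "AND" else any(results)


def uncontrolled(node):
    """List the root causes under `node` that have no countermeasure assigned."""
    if isinstance(node, RootCause):
        return [] if node.countermeasure else [node.name]
    names = []
    for child in node.inputs:
        names.extend(uncontrolled(child))
    return names


# Hypothetical tree: an order ships late if the wrong item is picked, OR if
# stock has run out AND nobody was alerted to the shortage.
tree = Gate("Order shipped late", "OR", [
    RootCause("Wrong item picked", countermeasure="Barcode scan at picking"),
    Gate("Item not available to pick", "AND", [
        RootCause("Stock ran out", countermeasure="Minimum stock level set"),
        RootCause("No shortage alert raised"),
    ]),
])

print(occurs(tree, {"Stock ran out"}))                              # False
print(occurs(tree, {"Stock ran out", "No shortage alert raised"}))  # True
print(uncontrolled(tree))  # ['No shortage alert raised']
```

Running out of stock on its own doesn't trigger the top event in this sketch; it takes the missing alert as well. That is exactly the kind of dependency a single-level list of causes would not show.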
The only real problem with FTA at this level of detail is that it doesn’t have a mechanism that assesses each possible cause of failure in terms of its probability of occurring – that’s where the Failure Mode & Effects Analysis has the edge.
Failure Mode & Effects Analysis
I should say at this point that FMEAs are used for a number of different purposes, typically:
- DFMEAs (Design FMEAs) to assess the functionality of proposed new product or service designs
- PFMEAs (Process FMEAs) to assess the processability of proposed new product or service processes
Other forms of FMEA structure have been used to risk assess Tooling reliability, Health & Safety risks and many more applications. We’re going to stay on theme with ‘processes’ in this discussion though.
Before anyone thinks that all this is building up to me saying that FMEA stands head and shoulders above every other method, be aware – I’m not! I’ll discuss that as a final consideration below. All the same, it is an extremely powerful technique if used correctly and consistently, and does a good job of protecting organisations from unnecessary risks of failing to meet their customers’ expectations.
As I outlined in article 1, FMEA has two principal elements, one element dealing with establishing the root cause…
…and the other establishing a risk assessment of the likelihood of them happening…
Let’s look at the FMEA’s main characteristics:
- In the root cause part of the FMEA, it categorises individual FAILURE MODES and associates these with their respective EFFECTS.
- It also characterises the respective CAUSES for each of the effects. This is very much the basis of a properly executed Cause & Effect analysis.
- It then goes on to put a SEVERITY score against the Failure modes/Effects using a 1-10 scale where 1 is little or no effect and 10 is a critical effect.
Well, that does indeed look like Cause & Effect analysis, doesn’t it? So Cause & Effect analysis can be used as a good source for populating the FMEA.
If you’ve done your groundwork and managed to get your ‘people in the process’ comfortable with doing Cause & Effect analyses, then it’s only a natural next step to get them working with the teams who develop the FMEAs. But beware, this sort of in-depth analysis isn’t for the faint-hearted!
Once the FMEA has captured what can fail, and how serious the consequences are for failure, it then sets about establishing the risk of these failures occurring:
- An OCCURRENCE score is established by comparison with failures of existing, similar products or services. Again, a scale of 1-10 is usually used, where 1 is an extremely unlikely occurrence and 10 means occurrence is almost inevitable.
- An assessment of proposed or current CONTROL methods for preventing and detecting failure is documented.
- And then a 1-10 DETECTABILITY score is established to reflect the relative reliability of the controls planned or implemented. By convention, 1 means the controls are almost certain to catch the failure and 10 means they are very unlikely to.
These three risk scores (Severity, Occurrence and Detectability) are then multiplied together to give each failure mode/effect a single risk score, commonly called the Risk Priority Number (RPN), which can be ranked against the others. If the risks are high you deal with them; if they’re below an agreed risk level, you leave them alone. Simple as that.
And so the strength of Failure Mode & Effects Analysis is that it incorporates a very useful risk assessment. The message to take away here is, you don’t need to try to fix everything – prioritise the big risks from the top down, and you can decide when the law of diminishing returns kicks in.
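The arithmetic behind this is simple enough to sketch in a few lines of Python. Treat what follows as an illustration under stated assumptions rather than a definitive implementation: the `FmeaRow` class, the `prioritise` function, the three failure modes, their scores and the action level of 100 are all hypothetical. I've also assumed that a high detectability score means the controls are unreliable, so that a bigger number always means a bigger risk.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    failure_mode: str
    severity: int       # 1 = little or no effect, 10 = critical effect
    occurrence: int     # 1 = extremely unlikely, 10 = almost inevitable
    detectability: int  # assumed: 1 = very reliable controls, 10 = very unreliable

    @property
    def risk(self) -> int:
        # The three scores multiplied together
        return self.severity * self.occurrence * self.detectability


def prioritise(rows, action_level):
    """Return the rows at or above the agreed action level, highest risk first."""
    flagged = [row for row in rows if row.risk >= action_level]
    return sorted(flagged, key=lambda row: row.risk, reverse=True)


# Hypothetical rows and an arbitrary action level, for illustration only
rows = [
    FmeaRow("Label printed with wrong address", severity=7, occurrence=3, detectability=6),
    FmeaRow("Carton under-filled", severity=4, occurrence=2, detectability=2),
    FmeaRow("Seal fails in transit", severity=9, occurrence=4, detectability=5),
]

for row in prioritise(rows, action_level=100):
    print(row.risk, row.failure_mode)
# 180 Seal fails in transit
# 126 Label printed with wrong address
```

The under-filled carton scores only 16, so in this sketch it falls below the agreed level and is left alone. Where you set that level is a judgement for your own organisation; the number used here is arbitrary.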
It appears, then, that while the 5 – Why technique is more aligned to FTA, the Cause & Effect analysis is the technique most likely to be of value in building up the FMEA.
Wait though – it’s not quite as simple as that!
Well, I did say earlier in this article that we mustn’t get too carried away thinking all we need is to use FMEA and all our worries will be left behind. In fact the FMEA technique does have some weaknesses you need to be aware of:
- FMEA is a top-down analysis technique – it looks at what your end product should do, then considers how it could fail to deliver. However, during the phase of establishing root cause, because it just defines a single level of cause-to-effect relationship, it can lure people into not putting enough effort into thoroughly drilling down through intermediate events to the real root causes.
- It only analyses failure modes as independent entities, yet some failures occur as a result of combinations of events. FMEA can miss these relationships.
- Carried out with serious intent, there can be an element of overkill in an FMEA. There’s always a temptation to just carry on into ridiculous levels of unnecessary detail. You need to know when to stop digging.
Let’s dwell on the first point, then. Some organisations use FMEA in two stages: a high-level stage which identifies where the risks are most likely to occur, and a second stage which interrogates just these highest risks in more detail using Fault Tree Analysis to establish root cause. In this way, less time is wasted on non-events, and FTA provides effective 5–Why style interrogation to establish correct root causes, together with any dependencies between failure modes and their intermediate events.
As for the last point: know when to stop. FMEA development teams need to be ‘policed’ by common sense. If you do get your ‘people in the process’ doing effective 5–Why and Cause & Effect analyses, then their presence on FMEA and FTA teams is a blessing (for some of them at least!). They’ll be your ‘common sense’ police.
And so, refining our last diagram of relationship between the techniques…
Some people balk at the idea of using FMEAs if their organisation doesn’t already have them adopted as formal quality system tools. I think my discussions above should show that there’s really no need to get into any ‘hissy fits’ about not being able to use these more complex analytical tools. Learn by doing and don’t be frightened if they’re not quite right first time around. They’re a bit like learning to ride a bike: you might fall over a few times but after that you never forget how to do it.
And finally, don’t be afraid to use FMEA even as an independent tool to help you put some logical rationale into your analysis of current processes. Yes, this looks a bit like shutting the stable doors after the horse has bolted – but there could be more than one horse in the stable!
If you’ve not already seen Part 1 of this series of four articles, ‘A bit of background’, by all means feel free to look it up on our blog page, where you will also be able to see Part 2 (‘5 – Why Analysis’) and Part 3 (‘Cause & Effect Analysis’).
I hope you’ve enjoyed this series of articles. If they inspire you to have a go in your own processes – fantastic, let me know how you find the experience. Share the knowledge freely with your friends and colleagues (and don’t forget the Boss!).
Postscript – It may be worth considering that the recent release of ISO 9001:2015 incorporates an added emphasis on demonstrating effective risk assessment of your systems and processes. It also expects a mind-shift away from pure reliance on ‘detectability’ towards more robust ‘preventability’. In other words, there is a stronger emphasis on developing processes that deliver defect-free products right first time.