5.1 Explain the network troubleshooting methodology

  • Identify the Problem
    • Gather Information
    • Question Users
    • Identify Symptoms
    • Determine if Anything Has Changed
    • Duplicate the Problem, if Possible
    • Approach Multiple Problems Individually
  • Establish a Theory of Probable Cause
    • Question the Obvious
    • Consider Multiple Approaches
      • Top-to-Bottom/Bottom-to-Top OSI Model
      • Divide and Conquer
  • Test the Theory to Determine the Cause
    • Once the theory is confirmed, determine the next steps to resolve the problem
    • If the theory is not confirmed, re-establish a new theory or escalate
  • Establish a Plan of Action to Resolve the Problem and Identify Potential Effects
  • Implement the Solution or Escalate as Necessary
  • Verify Full System Functionality and, if Applicable, Implement Preventative Measures
  • Document Findings, Actions, Outcomes, and Lessons Learned

Identify the Problem

Every problem has a solution.  How do we find the solution?  We need to focus on a framework for approaching each problem, no matter the cause or who it affects.  Having a framework will help us solve each problem better.

Before we can solve the problem, we need to know what it is.

First, we must gather information about the problem.  What is the problem?  What systems does it affect?  How often does it happen?  Does it happen randomly or at specific time intervals?  What are the symptoms?

We should attempt to duplicate the problem, if we can do so safely and without causing damage.  If we can replicate the problem in a controlled environment, we can observe it and possibly determine its cause.

We should question the users who are affected.  Even non-technical users often have useful information about the problem, because they can describe what they observed and when they observed it.  We should be non-judgemental and approachable when speaking with users.

Next, we can identify the symptoms of the problem.  Symptoms can give us clues as to what is causing the problem.

We should also ask if anything has changed.  We can review our change management log and our patch management software to see if updates were installed prior to the problem taking place.  Did the user install a new program or change a configuration?
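As a rough illustration of reviewing a change log, the sketch below filters a list of changes down to those made shortly before the problem started.  The `ChangeRecord` type, the `changes_before` function, the seven-day window, and the sample entries are all hypothetical; a real change management system will have its own format and export tools.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChangeRecord:
    """One hypothetical change log entry."""
    when: datetime
    system: str
    description: str


def changes_before(log, problem_start, window=timedelta(days=7)):
    """Return changes made within `window` before the problem began, newest first."""
    earliest = problem_start - window
    recent = [c for c in log if earliest <= c.when <= problem_start]
    return sorted(recent, key=lambda c: c.when, reverse=True)


# Made-up sample data for illustration only.
log = [
    ChangeRecord(datetime(2024, 3, 1, 9, 0), "mail-server", "Installed OS patch"),
    ChangeRecord(datetime(2024, 3, 9, 22, 0), "core-switch", "Changed VLAN configuration"),
    ChangeRecord(datetime(2024, 3, 12, 8, 0), "firewall", "Updated firmware"),
]

suspects = changes_before(log, problem_start=datetime(2024, 3, 10, 8, 0))
for change in suspects:
    print(change.when, change.system, change.description)
```

Only the change made inside the window before the problem began is listed.  A change that appears here is a lead to investigate, not proof of the cause.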

Multiple problems should be approached individually.  We should not assume that multiple issues have the same cause, but we should not rule it out either.

If you go to the doctor and your stomach hurts and your leg hurts, the doctor is going to consider them separately.  The doctor will ask you if anything changed.  Have you eaten something different today?  Did you take any new medications?  How badly does your stomach hurt?  All the time?  Sometimes?  In the morning?  After you eat?  Your leg and stomach are probably not connected, but they might be.

Establish a Theory of Probable Cause

Second, we must come up with a theory for the cause of the problem.  We should look for obvious causes.  Sometimes they are overlooked, but an obvious cause can be quickly investigated, and if correct, it will save a lot of trouble.  Having a second set of eyes on the problem is also good. 

Many times, a problem is caused by a typo in the configuration.  The person who wrote the configuration often can’t “see” the typo, because they read what they meant to write rather than what is actually there.  A second set of eyes helps because another person will often spot what is wrong right away.

We can divide the problem into multiple smaller problems and solve each one separately.  It is possible that multiple issues contribute to the same problem.  For example, if your basement is flooding, it could be caused by a leaky bathtub or a leaky toilet, or both.  If you only find and plug one leak, your basement will continue to flood.

We can use the OSI model to help us solve the problem, either starting at the top or at the bottom.  The OSI model can help us rule out issues.  For example, if a user is having trouble connecting to the internet, but their computer has obtained a valid IP address and can connect to the router, then the physical, data link, and network layers are most likely working.  We can start investigating at the next layer up.  Working upward from the physical layer in this way is a bottom-to-top approach.  A top-to-bottom approach starts at the application layer and works downward.

Remember that the layers are:

  • Layer 1 – Physical
  • Layer 2 – Data Link
  • Layer 3 – Network
  • Layer 4 – Transport
  • Layer 5 – Session
  • Layer 6 – Presentation
  • Layer 7 – Application
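The bottom-to-top approach can be sketched as a loop over the layers that stops at the first one that fails its check.  This is only a sketch: `first_failing_layer` is a hypothetical helper, and the checks below are hard-coded observations standing in for real tests such as inspecting a link light or pinging the router.

```python
OSI_LAYERS = [
    "Physical",
    "Data Link",
    "Network",
    "Transport",
    "Session",
    "Presentation",
    "Application",
]


def first_failing_layer(checks):
    """Walk the layers from bottom to top.

    `checks` maps a layer name to a function that returns True when that
    layer appears to be working.  Returns (layer number, layer name) for
    the first failing layer, or None if every defined check passes.
    """
    for number, name in enumerate(OSI_LAYERS, start=1):
        check = checks.get(name)
        if check is None:
            continue  # no check defined for this layer, so skip it
        if not check():
            return number, name
    return None


# Hypothetical observations, hard-coded for illustration.
observations = {
    "Physical": lambda: True,    # cable connected, link light on
    "Data Link": lambda: True,   # switch port is up
    "Network": lambda: True,     # valid IP address, router reachable
    "Transport": lambda: False,  # connection to the remote port fails
}

print(first_failing_layer(observations))
```

With these observations, the lower three layers are ruled out and the investigation starts at Layer 4.  Reversing the order of `OSI_LAYERS` in the loop would give a top-to-bottom walk instead.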

If your stomach hurts, the doctor will come up with a theory for the cause.  Is it muscle pain?  Is it nerve pain?  Is it a chemical imbalance in your blood?  Something you ate?

Test the Theory to Determine the Cause

We test our theory.  The theory is what we think caused the problem.  We can test the theory by removing the source of the error.  For example, if a software update caused the error, we should roll back the update.

If the problem is solved, then the theory is correct.  Otherwise, the theory is wrong, and we must find a new theory.  If we find that the cause is beyond our control, we should escalate to another expert.
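The test, re-theorize, or escalate cycle can be written as a short loop.  In this sketch, `find_cause` and the sample theories are hypothetical, and each test is a stand-in for a real action such as rolling back an update on one machine and checking whether the problem goes away.

```python
def find_cause(theories, max_attempts=3):
    """Test each (description, test) pair in order.

    Returns ("confirmed", description) for the first theory whose test
    passes, or ("escalate", None) if none of the theories tried is
    confirmed within `max_attempts`.
    """
    for description, test in theories[:max_attempts]:
        if test():
            return ("confirmed", description)
    return ("escalate", None)


# Hypothetical theories, ordered from most obvious to least obvious.
theories = [
    ("Loose network cable", lambda: False),
    ("Recent software update", lambda: True),
]

print(find_cause(theories))
print(find_cause([("Loose network cable", lambda: False)]))
```

The first call confirms the second theory; the second call runs out of theories and returns the escalation result.  The attempt limit is an arbitrary choice here; in practice the decision to escalate depends on time, impact, and whether the cause is within our control.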

If you have a stomach ache, the doctor might give you medication.  If the medication makes your stomach ache go away, then the theory is proven.  Otherwise, the doctor needs to find a new medication or treatment option.

Establish a Plan of Action to Resolve the Problem and Identify Potential Effects

If the theory is correct, we should develop a plan to resolve the problem.  Remember that a problem may affect multiple users.  If we rolled out a software update across the organization, and it caused errors, and our theory was that the software update caused the error, we could test our theory by rolling back the update on a single user’s computer.

The plan of action would be to roll back the update across the entire organization.  Before we execute a plan of action, we should identify its potential effects.  The effects can include disruption to the organization’s systems, and financial risk.  What if the software update was necessary to patch a security vulnerability?  Rolling it back would open the organization to risk.

If the doctor gives you stomach medication, that medication may give you side effects that are worse than the illness you’re trying to treat.

Implement the Solution or Escalate as Necessary

Next, we must implement the solution.  If we don’t have the ability to implement the solution (don’t have permission, don’t have approval for the cost of the solution, don’t have approval to implement a solution that causes downtime, etc.), we must seek approval or assistance from a higher level.  This is known as an escalation.

Verify Full System Functionality and, if Applicable, Implement Preventative Measures

After we have implemented the entire solution, we should verify that everything is working and that no new problems have been caused.  We should consider implementing preventative measures to keep the problem from happening again.  That is, we don’t want to resolve just the symptoms with a “band aid” fix; we want to resolve the root cause of the problem.

If you’ve been taking your stomach medication, the doctor will verify that your stomach ache went away.  He might recommend that you go on a diet or stop eating spicy foods so that the stomach ache doesn’t come back.  You don’t want to stay on stomach medication for your entire life.

Document Findings, Actions, Outcomes, and Lessons Learned

Finally, we should fully document the problem.  Our documentation should include the symptoms of the problem, the actions we took to resolve it, the outcome, and any lessons learned, such as what we would do differently next time.  This documentation can be used by other technicians who encounter the same problem in the future.
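One possible way to keep this documentation consistent is to record every incident with the same fields.  The `IncidentRecord` structure and the sample values below are hypothetical; many organizations will use a ticketing system or knowledge base with its own fields instead.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class IncidentRecord:
    """A hypothetical record of one resolved problem."""
    summary: str
    symptoms: list
    actions: list
    outcome: str
    lessons_learned: list = field(default_factory=list)


# Made-up example values for illustration only.
record = IncidentRecord(
    summary="Users could not reach the internet after a software update",
    symptoms=["Web pages time out", "Valid IP address, router reachable"],
    actions=["Rolled back the update on one computer", "Rolled back the update everywhere"],
    outcome="Connectivity restored",
    lessons_learned=["Test updates on a small group before a full rollout"],
)

# Serialize to JSON so the record can be stored and searched later.
report = json.dumps(asdict(record), indent=2)
print(report)
```

Storing records in a structured form makes it easier for another technician to search for matching symptoms later.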