5.5 Given a scenario, troubleshoot general networking issues

  • Considerations
    • Device Configuration Review
    • Routing Tables
    • Interface Status
    • VLAN Assignment
    • Network Performance Baselines
  • Common Issues
    • Collisions
    • Broadcast Storm
    • Duplicate MAC Address
    • Duplicate IP Address
    • Multicast Flooding
    • Asymmetrical Routing
    • Switching Loops
    • Routing Loops
    • Rogue DHCP Server
    • DHCP Scope Exhaustion
    • IP Setting Issues
      • Incorrect Gateway
      • Incorrect Subnet Mask
      • Incorrect IP Address
      • Incorrect DNS
    • Missing Route
    • Low Optical Link Budget
    • Certificate Issues
    • Hardware Failure
    • Host-Based/Network-Based Firewall Settings
    • Blocked Services, Ports, or Addresses
    • Incorrect VLAN
    • DNS Issues
    • NTP Issues
    • BYOD Challenges
    • Licensed Feature Issues
    • Network Performance Issues


We should make sure that our network continues to operate smoothly.  Here are a few things that we can review:

  • Device Configuration

    • Are all our devices configured correctly?

    • Do we have a backup of every configuration?

    • If the device received an operating system upgrade, has the configuration been reviewed to ensure that the upgrade did not change it?  Has it been reviewed to ensure that it takes advantage of any new features made available by the upgrade?

    • Are changes to the configuration logged and do they follow the change management procedure in our organization?
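
    A simple way to check whether an upgrade (or anyone else) changed a configuration is to compare the running configuration with the latest backup.  The sketch below uses Python's standard difflib module; the configuration snippets are made-up examples rather than output from a real device, and collecting the configurations from the device is not shown.

```python
import difflib

def config_changes(backup: str, running: str) -> list[str]:
    """Return unified-diff lines showing how the running config differs from the backup."""
    diff = difflib.unified_diff(
        backup.splitlines(),
        running.splitlines(),
        fromfile="backup",
        tofile="running",
        lineterm="",  # lines from splitlines() already have no newline
    )
    return list(diff)

# Hypothetical configuration snippets used only for illustration.
backup_config = """hostname SW1
interface Gi1/0/1
 description Uplink to core
 switchport mode trunk"""

running_config = """hostname SW1
interface Gi1/0/1
 description Uplink to core
 switchport mode access"""

for line in config_changes(backup_config, running_config):
    print(line)
```

    An empty result means the running configuration still matches the backup.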

  • Routing Tables

    • Are all our routing tables accurate and complete?

    • Are new routes being added to the routing table as required?

    • Are the metrics (costs) for each route accurate, or do they need to be updated?

    • Are our routing protocols secure?

  • Interface Status

    • Does each router and switch interface have a detailed description about what is connected to it?

    • Is each interface operational, and at the correct speed and duplex setting?  If not, we must investigate to see if any devices are offline or disconnected.

    • Are unused interfaces shut down and assigned to a VLAN that we are not using?
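
    Once interface details have been collected (how they are collected depends on the platform and is not shown), the checks above can be automated.  This is a minimal sketch; the Interface record, its status strings, and the 1000 Mbps expected speed are hypothetical choices for illustration.

```python
from dataclasses import dataclass

@dataclass
class Interface:
    name: str
    status: str          # e.g. "up", "down", "administratively down"
    speed_mbps: int
    duplex: str          # "full" or "half"
    description: str = ""

def audit_interfaces(interfaces, expected_speed_mbps=1000):
    """Return human-readable findings for interfaces that need attention."""
    findings = []
    for intf in interfaces:
        if intf.status == "administratively down":
            continue  # intentionally shut down, e.g. an unused port
        if not intf.description:
            findings.append(f"{intf.name}: missing description")
        if intf.status != "up":
            findings.append(f"{intf.name}: interface is {intf.status}")
            continue  # speed and duplex are not meaningful on a down link
        if intf.duplex != "full":
            findings.append(f"{intf.name}: running {intf.duplex} duplex")
        if intf.speed_mbps != expected_speed_mbps:
            findings.append(
                f"{intf.name}: speed {intf.speed_mbps} Mbps, expected {expected_speed_mbps} Mbps"
            )
    return findings

ports = [
    Interface("Gi1/0/1", "up", 1000, "full", "Uplink to core"),
    Interface("Gi1/0/2", "up", 100, "half", "Printer"),
    Interface("Gi1/0/3", "down", 0, "full", "Desk 12"),
    Interface("Gi1/0/4", "administratively down", 0, "full"),
]
for finding in audit_interfaces(ports):
    print(finding)
```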

  • VLAN Assignment

    • Is each interface assigned to the correct VLAN?

    • Does each VLAN have a detailed description?

    • Can larger VLANs be broken into smaller VLANs?

    • Can we separate the network into more VLANs or eliminate some unused VLANs?

  • Network Performance Baselines

    • Is the network performing at, better than, or worse than the previously established baseline?
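
    Comparing a new measurement with a baseline can be as simple as checking whether it falls outside the normal spread of the baseline samples.  This sketch assumes a metric where lower is better (such as latency) and uses two standard deviations as the threshold; both the threshold and the sample values are illustrative assumptions.

```python
from statistics import mean, stdev

def compare_to_baseline(baseline_samples, current, tolerance_sd=2.0):
    """Classify a measurement where lower is better (such as latency in ms)."""
    avg = mean(baseline_samples)
    spread = stdev(baseline_samples)  # sample standard deviation; needs at least two samples
    if current > avg + tolerance_sd * spread:
        return "worse than baseline"
    if current < avg - tolerance_sd * spread:
        return "better than baseline"
    return "within baseline"

# Hypothetical latency samples (ms) recorded when the baseline was established.
baseline_latency_ms = [20, 22, 19, 21, 23, 20, 21]
print(compare_to_baseline(baseline_latency_ms, 35))  # worse than baseline
```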

Let’s look at some other areas that can cause network disruptions.

  • Collisions

    • A collision happens when two devices on the same shared medium (the same collision domain) transmit at the same time.  The frames collide and must be resent.

    • Collisions only happen on hubs and half-duplex links.

    • Removing hubs and ensuring that all links operate in full duplex will eliminate the potential for collisions.

  • Broadcast Storm

    • A broadcast storm happens when too many broadcast packets are sent at the same time.  Remember that a broadcast packet is one that a switch forwards to all the members of the broadcast domain (all of the devices in the VLAN where the broadcast packet originated).
    • A broadcast storm can be created by a switching loop.
    • We can reduce the impact of broadcasts by breaking up larger VLANs (VLANs with many devices) into smaller VLANs.  This reduces the size of each broadcast domain.
    • Cheaper or misconfigured network hardware such as low-end switches and hubs can create broadcast storms.
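
    Many switches can alert on (or limit) broadcast traffic that exceeds a threshold.  The idea can be sketched by reading an interface's broadcast packet counter twice and computing the rate; the 1,000 packets-per-second threshold is a hypothetical value, and counter wraparound is ignored for simplicity.

```python
def broadcast_rate_pps(count_start, count_end, interval_s):
    """Broadcast packets per second between two readings of an interface counter."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return (count_end - count_start) / interval_s

def looks_like_storm(count_start, count_end, interval_s, threshold_pps=1000):
    """True when the broadcast rate exceeds the (hypothetical) threshold."""
    return broadcast_rate_pps(count_start, count_end, interval_s) > threshold_pps

# Two hypothetical counter readings taken 60 seconds apart.
print(looks_like_storm(1_200_000, 1_800_000, 60))  # True (10,000 packets per second)
```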

  • Duplicate MAC Address

    • Two devices should never have the same MAC address.  If they do, one device is most likely rogue; in other words, one of the devices has a spoofed MAC address.  We must identify the rogue device and remove it.
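
    A switch's MAC address table normally lists each MAC address on only one port, so one way to spot a duplicate is to collect the table several times (or from several switches) and look for a MAC address that shows up in more than one place.  This is a sketch under that assumption; the tuple format and addresses are hypothetical.  A laptop that legitimately moved between ports will also be reported, so each result still needs investigation.

```python
from collections import defaultdict

def macs_seen_in_multiple_places(observations):
    """observations: iterable of (mac, switch, port) tuples gathered from several polls.

    Returns {mac: sorted list of (switch, port)} for every MAC seen in more than one place."""
    locations = defaultdict(set)
    for mac, switch, port in observations:
        locations[mac.lower()].add((switch, port))  # normalize case so entries compare equal
    return {mac: sorted(places) for mac, places in locations.items() if len(places) > 1}

observed = [
    ("00:11:22:33:44:55", "SW1", "Gi1/0/5"),
    ("00:11:22:33:44:55", "SW1", "Gi1/0/5"),
    ("AA:BB:CC:DD:EE:FF", "SW1", "Gi1/0/7"),
    ("aa:bb:cc:dd:ee:ff", "SW2", "Gi1/0/3"),
]
print(macs_seen_in_multiple_places(observed))
```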

  • Duplicate IP Address

    • Two devices should not have the same IP address.  If they do, then that is likely because somebody statically configured the same IP address on two devices, or statically configured an IP address that is already assigned by DHCP.

    • The solution is to change the IP address on one device.
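
    A duplicate IP address shows up as one IP address being claimed by more than one MAC address, for example in ARP tables gathered from the network.  This minimal sketch assumes the (IP, MAC) pairs have already been collected; the addresses are made up.

```python
from collections import defaultdict

def duplicate_ips(arp_entries):
    """arp_entries: iterable of (ip, mac) pairs.

    Returns {ip: sorted list of MACs} for every IP claimed by more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in arp_entries:
        macs_by_ip[ip].add(mac.lower())  # normalize case so the same MAC is not counted twice
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}

entries = [
    ("192.168.1.20", "00:11:22:33:44:55"),
    ("192.168.1.20", "66:77:88:99:AA:BB"),
    ("192.168.1.21", "00:11:22:33:44:66"),
]
print(duplicate_ips(entries))
```

    The MAC addresses in the result point to the two devices, one of which needs a new IP address.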

  • Multicast Flooding

    • A flood of multicast traffic happens when devices send traffic to a multicast address and the switch forwards it out of every port in the VLAN, just as it would a broadcast.

    • Large volumes of multicast traffic can happen if the router forwards multicast traffic from the internet.  This should be disabled.
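
    Switches commonly limit multicast flooding with IGMP snooping: the switch listens to the IGMP messages that devices send when they join a multicast group and forwards the group's traffic only to those ports.  The sketch below is a simplified model of that forwarding decision, not how any particular switch is implemented; real switches also forward to multicast router ports, and their handling of groups that nobody has joined varies by platform.  The port numbers and group address are hypothetical.

```python
def egress_ports(all_ports, ingress_port, group, memberships, igmp_snooping=True):
    """Return the ports a switch would send a multicast frame out of.

    memberships maps a multicast group address to the set of ports that joined it.
    Without IGMP snooping the frame is flooded like a broadcast."""
    if igmp_snooping:
        targets = memberships.get(group, set())
    else:
        targets = set(all_ports)
    # A switch never sends a frame back out of the port it arrived on.
    return sorted(p for p in targets if p != ingress_port)

ports = list(range(1, 9))
joined = {"239.1.1.1": {3, 5}}
print(egress_ports(ports, 1, "239.1.1.1", joined))                       # [3, 5]
print(egress_ports(ports, 1, "239.1.1.1", joined, igmp_snooping=False))  # [2, 3, 4, 5, 6, 7, 8]
```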

  • Asymmetrical Routing

    • Asymmetrical routing happens when traffic leaves the router from one interface and returns from another interface.

    • If we have an SD-WAN with multiple internet connections, the most efficient path might be different in each direction.  A router may be configured to send and receive traffic using different paths.

    • However, a router or firewall needs to see the traffic in both directions so that it can analyse it and enforce filtering rules.  If it only sees one direction, the firewall won’t know which side originated the connection and may drop the traffic.
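
    The problem can be illustrated with a toy model of a stateful firewall that only allows replies for connections it saw being opened.  This is a simplified sketch: real firewalls track protocol, ports, and connection state in far more detail, and the addresses are documentation examples.

```python
class StatefulFirewall:
    """Toy stateful firewall: replies are allowed only for sessions it saw being opened."""

    def __init__(self):
        self.sessions = set()

    def outbound(self, src, dst):
        # Record the connection opened from the inside so replies can be matched later.
        self.sessions.add((src, dst))
        return "allow"

    def inbound(self, src, dst):
        # A reply matches if the same endpoints were seen in the opposite direction.
        return "allow" if (dst, src) in self.sessions else "drop"

# Symmetric path: the same firewall sees both directions.
fw = StatefulFirewall()
fw.outbound("10.0.0.5:50000", "203.0.113.7:443")
print(fw.inbound("203.0.113.7:443", "10.0.0.5:50000"))        # allow

# Asymmetric path: the reply returns through a different firewall.
other_fw = StatefulFirewall()
print(other_fw.inbound("203.0.113.7:443", "10.0.0.5:50000"))  # drop
```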

  • Switching Loops

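    • A switching loop happens when both ends of a cable are connected to ports on the same switch, or when multiple switches are connected in a loop.  Ethernet frames have no field that limits how many times they can be forwarded, so broadcast frames circulate around the loop endlessly, creating a broadcast storm that can overwhelm the switches and bring down the network.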

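    • If we enable Spanning Tree Protocol (STP) on our switches, they will detect loops and block redundant ports so that frames cannot circulate.  Features such as BPDU Guard can also shut down an access port that unexpectedly receives spanning tree traffic.  Upon receiving an alert that a port has been blocked or shut down, we can physically trace the cables to determine whether a loop exists and remove the offending cable.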

    • When cables are neatly labelled and organized, and when access to the switches is provided only to authorized individuals, the risk of a switching loop is greatly reduced.
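
    Loops among several switches can also be found on paper before they cause trouble.  Treating each cable as a link between two switches, a cable whose two ends are already connected through other cables closes a loop.  The sketch below uses a union-find structure to report such cables; the switch names are hypothetical, and unlike STP it does not decide which link should stay active, it only reports the cables that close a loop in the order they are listed.

```python
def find_loop_links(links):
    """links: list of (switch_a, switch_b) cables. Returns the cables that close a loop."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps the trees shallow
            node = parent[node]
        return node

    loops = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # Both ends are already connected (or it is the same switch): this cable closes a loop.
            loops.append((a, b))
        else:
            parent[root_a] = root_b  # merge the two connected groups
    return loops

cabling = [("SW1", "SW2"), ("SW2", "SW3"), ("SW3", "SW1"), ("SW4", "SW4")]
print(find_loop_links(cabling))  # [('SW3', 'SW1'), ('SW4', 'SW4')]
```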

  • Routing Loops

    • A routing loop happens when two or more routers each think that the other router is the next hop toward the destination, so they keep passing the traffic between them.

    • In this example, Router A receives a packet addressed to 10.1.3.4, a host on the network attached to Router C.

    • There are two scenarios here.  First, let’s say that the routers have learned their routes from OSPF or another routing protocol, and the link between Router B and Router C has been broken, but Router A hasn’t learned about it yet.  Router A checks its routing table and determines that the next hop router is Router B, so it forwards the packet to Router B.

    • Why not forward directly to Router C?  The link between Router A and Router C may be slow, so the routing protocol may give it a higher cost (metric).

    • Router B thinks that the next hop router is Router A.  It might be misconfigured, or it might know that the link between itself and Router C is down.  It forwards the packet back to Router A, which forwards it to Router B again.  The packet travels back and forth until one of the routers drops it.  Remember that an IP packet has a Time to Live (TTL) field.  The sender sets its starting value, and each router that forwards the packet decrements it by one; when it reaches zero, the router drops the packet.  This prevents undeliverable packets from circulating forever.

    • After a few seconds, Router A should learn the new route from Router B.  In other words, Router A will learn that Router B no longer has a route to Router C, and therefore Router A must forward the traffic to Router C directly.

    • Let’s think about a second scenario.  Router A has a statically configured route that sends traffic for Router C’s network to Router B, and Router B has a statically configured route that sends the same traffic back to Router A.  Now there is a permanent loop between the two routers whenever either of them receives traffic destined for Router C’s network.  This loop can be removed by correcting or deleting the faulty static route.
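
    The effect of the TTL field in a routing loop can be simulated by following each router's next hop for the destination and decrementing the TTL at every router.  In this sketch the next-hop dictionary is a hypothetical stand-in for the routing tables of Routers A, B, and C, and a router with no entry is treated as the one that delivers the packet.  Starting TTL values of 64 and 128 are common defaults on Linux and Windows hosts.

```python
def forward_until_dropped(next_hop, start, initial_ttl):
    """Follow each router's next hop for one destination, decrementing the TTL at every router.

    next_hop maps a router name to the next router; a router with no entry is assumed
    to be the one attached to the destination network. Returns (routers visited, outcome)."""
    router, ttl, path = start, initial_ttl, [start]
    while router in next_hop:
        ttl -= 1  # every router that would forward the packet decrements the TTL
        if ttl == 0:
            return path, "dropped: TTL expired"
        router = next_hop[router]
        path.append(router)
    return path, "delivered"

# Loop: A still points at B, and B points back at A.
path, outcome = forward_until_dropped({"A": "B", "B": "A"}, "A", initial_ttl=64)
print(len(path), outcome)  # 64 dropped: TTL expired

# After the routes are fixed, A forwards straight to C.
print(forward_until_dropped({"A": "C"}, "A", initial_ttl=64))  # (['A', 'C'], 'delivered')
```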

  • Rogue DHCP Server

    • A rogue DHCP server is an unauthorized DHCP server that somebody connects to our network.  We can tell that one is present because devices receive IP addresses (or gateway and DNS settings) that are not correct.

    • We can prevent a rogue DHCP server from handing out addresses by enabling DHCP snooping on all of our switches and by enforcing port security.
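
    The core rule of DHCP snooping can be sketched as a single decision: messages that only a DHCP server should send (DHCPOFFER, DHCPACK, and DHCPNAK) are dropped unless they arrive on a port marked as trusted, such as the uplink toward the legitimate server.  Real DHCP snooping also builds a binding table and can rate-limit requests; this sketch shows only the trusted-port check, and the port names are hypothetical.

```python
# DHCP messages that only a DHCP server should ever send.
SERVER_MESSAGES = {"DHCPOFFER", "DHCPACK", "DHCPNAK"}

def snooping_decision(message_type, ingress_port, trusted_ports):
    """Drop server-only DHCP messages that arrive on untrusted ports; forward everything else."""
    if message_type in SERVER_MESSAGES and ingress_port not in trusted_ports:
        return "drop"
    return "forward"

trusted = {"Gi1/0/48"}  # hypothetical uplink toward the legitimate DHCP server
print(snooping_decision("DHCPOFFER", "Gi1/0/12", trusted))  # drop (possible rogue server)
print(snooping_decision("DHCPOFFER", "Gi1/0/48", trusted))  # forward
```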

  • DHCP Scope Exhaustion

    • DHCP Scope Exhaustion happens when we run out of IP addresses.  It simply means that there are more devices than available IP addresses.  New devices requesting DHCP addresses will be unable to connect.

    • We can fix DHCP Scope Exhaustion by ensuring that our range of DHCP addresses is wide enough. 

    • If we have 10,000 potential devices, then we should have a range of at least 10,000 IP addresses.  We should choose a subnet size (prefix length) that provides enough addresses for all the potential devices connecting to it.

    • DHCP Scope Exhaustion can be caused by having many devices connect for brief periods.  For example, the Wi-Fi at an airport sees many different devices, each for a short time.  If we see 100,000 unique devices per week, we don’t need 100,000 IP addresses.  The average traveller only connects for a few hours.  Thus, we can set a DHCP range of 10,000 IP addresses and reduce the lease time to one day, or even twelve hours.  Now, used DHCP addresses will expire quickly and be returned to the pool.

    • A hacker can connect to the network, request a DHCP address, spoof a new MAC address, reconnect, and request another DHCP address.  By scripting these actions, the hacker can use up all the available addresses, and then legitimate users will not be able to connect.  We can reduce this risk by verifying the identity of each device connecting to our network.  We can also enforce a username and password on our guest Wi-Fi.
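
    The sizing arithmetic above can be sketched in a few lines.  The first function finds the smallest IPv4 subnet with enough usable addresses.  The second gives a rough estimate of how many addresses are leased at the same time (average arrival rate multiplied by how long each address stays leased).  It assumes evenly spread arrivals, clients that renew at half the lease time while connected, and clients that never release their address when they leave; the numbers are illustrative, not measurements.

```python
import math

def smallest_prefix_for_hosts(hosts):
    """Longest IPv4 prefix (smallest subnet) with at least `hosts` usable addresses.

    Usable addresses = 2 ** (32 - prefix) - 2 (network and broadcast excluded)."""
    for prefix in range(30, 0, -1):
        if 2 ** (32 - prefix) - 2 >= hosts:
            return prefix
    raise ValueError("too many hosts for a single IPv4 subnet")

def hours_address_is_held(visit_hours, lease_hours):
    """Assumes renewal at T1 (half the lease) while connected and no release on leaving,
    so the address stays leased until the last lease runs out."""
    renew_interval = lease_hours / 2
    renewals = math.floor(visit_hours / renew_interval)
    return renewals * renew_interval + lease_hours

def estimated_leases_in_use(devices_per_week, visit_hours, lease_hours):
    """Rough average number of addresses leased at once, assuming evenly spread arrivals."""
    arrivals_per_hour = devices_per_week / (7 * 24)
    return math.ceil(arrivals_per_hour * hours_address_is_held(visit_hours, lease_hours))

print(smallest_prefix_for_hosts(10_000))        # 18 (a /18 has 16,382 usable addresses)
print(estimated_leases_in_use(100_000, 3, 24))  # 14286
print(estimated_leases_in_use(100_000, 3, 12))  # 7143
```

    With these illustrative numbers, the estimate drops from about 14,300 addresses with a one-day lease to about 7,200 with a twelve-hour lease, which shows why a shorter lease shrinks the pool that is needed.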

  • IP Setting Issues

    • If our device has the wrong gateway, subnet mask, IP address, or DNS server, we must determine why.

    • If the device is assigned these settings through a DHCP server, and they are not correct, then the DHCP server may be misconfigured.

    • If the device is assigned these settings statically, then we must configure it correctly. 

    • A device might have the wrong settings if it is connected to the wrong VLAN, because it will receive its settings from that VLAN’s DHCP scope.
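
    Some incorrect settings can be caught by checking that a host's address, mask, and gateway are consistent with each other.  This sketch uses Python's standard ipaddress module; it cannot tell whether the host is in the right subnet for its location, and it does not check DNS settings.  The addresses are made-up examples.

```python
import ipaddress

def check_ip_settings(ip, mask, gateway):
    """Return a list of problems with a host's IPv4 settings; an empty list means they look consistent."""
    try:
        iface = ipaddress.IPv4Interface(f"{ip}/{mask}")
        gw = ipaddress.IPv4Address(gateway)
    except ValueError as exc:  # bad address, or a mask such as 255.0.255.0
        return [f"invalid setting: {exc}"]
    net = iface.network
    problems = []
    # /31 and /32 have no separate network/broadcast addresses, so skip this check for them.
    if net.prefixlen < 31 and iface.ip in (net.network_address, net.broadcast_address):
        problems.append("IP address is the network or broadcast address")
    if gw not in net:
        problems.append(f"gateway {gw} is not in {net}")
    elif gw == iface.ip:
        problems.append("gateway is the same as the host address")
    return problems

print(check_ip_settings("192.168.1.50", "255.255.255.0", "192.168.2.1"))
```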

  • Missing Route

    • A missing route means that a router does not know where to send a piece of traffic.  It checks its routing table but finds no entry matching the destination (not even a default route), so it drops the traffic.

    • A missing route happens when the router is misconfigured or not able to learn the route via a routing protocol.  We might check the settings on the routing protocol or configure a static route.
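
    A router chooses a route by longest-prefix match, and a missing route is simply the case where nothing matches.  The sketch below models an IPv4 routing table as a dictionary from prefix to next hop (a hypothetical format, with documentation addresses as next hops) and returns None when no route, not even a default route (0.0.0.0/0), matches.

```python
import ipaddress

def lookup(routing_table, destination):
    """Longest-prefix match over an IPv4 table of {prefix: next_hop}; None means no route."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in routing_table.items():
        net = ipaddress.ip_network(prefix)
        # Keep the matching route with the longest (most specific) prefix.
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

table = {"10.1.0.0/16": "192.0.2.1", "10.1.3.0/24": "192.0.2.2"}
print(lookup(table, "10.1.3.4"))    # 192.0.2.2 (the /24 is more specific)
print(lookup(table, "172.16.0.1"))  # None: missing route, the packet would be dropped
```

    Adding a default route such as "0.0.0.0/0" to the table makes every destination match something.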

  • Low Optical Link Budget

    • This means that the loss on our fiber optic link is too high for the signal to reach the receiver reliably.

    • Remember that a fiber optic connection has a transmitter and a receiver.  The transmitter sends light at a certain power level, and the receiver needs a minimum power level (its sensitivity) to detect the signal.  The difference between the two is known as the dynamic range (or power budget) and is measured in dB.  It is the maximum loss that the link can tolerate between the transmitter and the receiver.

    • Our cable has loss.  For example, if the dynamic range of our fiber optic transmitter/receiver is 10 dB and our fiber optic cable has a loss of 5 dB, then 5 dB of our loss budget remains as margin, which is acceptable.

    • But if our cable loss is 15 dB and our dynamic range is 10 dB, then our loss budget is -5 dB, which is not acceptable.  The signal reaching the receiver is too weak to be detected reliably; in other words, the transport medium is not good enough to carry the signal.

    • When our budget is too low, we need to use a more powerful transmitter or a more sensitive receiver, or repair the fiber optic cable so that it has less loss.

    • A good fiber optic cable installation requires proper planning to ensure that all the components will perform within the recommended range.
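
    The budget calculation can be written out as follows.  The power budget is the transmitter's output power minus the receiver's sensitivity, and the margin is that budget minus the total loss.  The per-connector (0.75 dB) and per-splice (0.3 dB) defaults are assumed planning values; real figures should come from the cable and component datasheets, and the transmitter and receiver values are hypothetical.

```python
def fiber_loss_db(length_km, atten_db_per_km, connectors, splices,
                  connector_loss_db=0.75, splice_loss_db=0.3):
    """Estimated total link loss from fiber attenuation, connectors, and splices (assumed values)."""
    return length_km * atten_db_per_km + connectors * connector_loss_db + splices * splice_loss_db

def link_margin_db(tx_power_dbm, rx_sensitivity_dbm, loss_db):
    """Power budget minus loss: positive means the receiver should still detect the signal."""
    power_budget_db = tx_power_dbm - rx_sensitivity_dbm
    return power_budget_db - loss_db

# Hypothetical optics: -3 dBm transmitter, -13 dBm receiver sensitivity (a 10 dB budget).
print(link_margin_db(-3, -13, 5))   # 5
print(link_margin_db(-3, -13, 15))  # -5
```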

  • Certificate Issues

    • A certificate issue will prevent a device from connecting securely or from connecting at all.  Without valid certificates, a client and a server cannot negotiate a secure connection.

    • Here are some reasons why a certificate issue could occur, and how to correct them:

      • The certificate is not present on one of the devices – it must be reinstalled

      • The certificate has expired – a new certificate must be installed

      • The date or time on the device is not correct and now the device incorrectly thinks that the certificate has expired – the date and time should be corrected

      • The certificate has been revoked by the issuer – the administrator should verify the cause of the revocation and correct it
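
    The validity-period checks above (the certificate has expired, or the device's clock is wrong) can be sketched with Python's standard ssl module, whose cert_time_to_seconds function parses the date format used in the notBefore and notAfter fields returned by getpeercert().  The dates are made up, and revocation is not checked here.

```python
import ssl
import time

def certificate_status(not_before, not_after, now=None):
    """Classify a certificate's validity window at time `now` (seconds since the epoch).

    Dates use the certificate text format, e.g. 'Jan 15 00:00:00 2024 GMT'."""
    now = time.time() if now is None else now
    if now < ssl.cert_time_to_seconds(not_before):
        return "not yet valid (check the device clock)"
    if now > ssl.cert_time_to_seconds(not_after):
        return "expired (or the device clock is wrong)"
    return "valid"

print(certificate_status("Jan 15 00:00:00 2024 GMT", "Jan 15 00:00:00 2025 GMT",
                         now=ssl.cert_time_to_seconds("Jun 15 00:00:00 2024 GMT")))  # valid
```

    A device whose clock is years off will report a perfectly good certificate as "not yet valid" or "expired", which is why the date and time should be checked first.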

  • Hardware Failure

    • A hardware failure happens when a device, or a component such as a power supply, fan, or transceiver, stops working.  When it fails, we must repair or replace it.

    • We should ensure that our network has been configured so that critical hardware devices are redundant.  That is, there should not be a single point of failure.

    • We should adequately inspect and maintain network hardware so that we can reduce the risk that a component will fail while in use.

    • We should replace aging hardware to reduce the risk that it will fail while in use.
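
    Redundancy pays off quickly.  If any one of several identical components is enough to keep the service running, and their failures are independent, the availability of the group is one minus the probability that all of them are down at once.  This is a sketch under that independence assumption; components that share a power supply or a cable path do not fail independently, and the 99% figure is illustrative.

```python
def parallel_availability(component_availability, count):
    """Availability of `count` redundant components when any one of them is enough,
    assuming their failures are independent."""
    return 1 - (1 - component_availability) ** count

print(round(parallel_availability(0.99, 1), 4))  # 0.99
print(round(parallel_availability(0.99, 2), 4))  # 0.9999
```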

  • Host-Based/Network-Based Firewall Settings

    • A misconfigured firewall could block legitimate traffic or, worse, allow malicious traffic through.

    • We should verify that each firewall allows only the permitted traffic through, and no other traffic. 

    • The firewall on Windows computers should be configured centrally through Group Policy so that local users are unable to change its settings.

    • We should regularly audit the firewall rules to ensure that they are working correctly.
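
    One way to audit firewall rules is to run a list of expected flows through a model of the rule set and compare the results with what should happen.  This sketch assumes a simplified, hypothetical rule format (source network, protocol, destination port) with first-match-wins evaluation and an implicit deny at the end, which is how many firewalls behave; real rule sets have more fields.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    action: str           # "allow" or "deny"
    source: str           # source network in CIDR form, e.g. "10.0.0.0/8"
    protocol: str         # "tcp" or "udp"
    port: Optional[int]   # destination port; None matches any port

def evaluate(rules, src_ip, protocol, port):
    """Return the action of the first matching rule, or "deny" (implicit deny) if none match."""
    src = ipaddress.ip_address(src_ip)
    for rule in rules:
        if (src in ipaddress.ip_network(rule.source)
                and rule.protocol == protocol
                and rule.port in (None, port)):
            return rule.action
    return "deny"

rules = [
    Rule("deny", "10.0.99.0/24", "tcp", None),  # quarantine subnet: no TCP at all
    Rule("allow", "10.0.0.0/8", "tcp", 443),
    Rule("allow", "10.0.0.0/8", "udp", 53),
]
# Each expected flow: (source IP, protocol, port, expected action).
expected = [("10.0.5.20", "tcp", 443, "allow"), ("10.0.99.7", "tcp", 443, "deny"),
            ("10.0.5.20", "tcp", 23, "deny")]
for src, proto, port, want in expected:
    got = evaluate(rules, src, proto, port)
    print(src, proto, port, got, "OK" if got == want else f"MISMATCH (expected {want})")
```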

  • Incorrect VLAN

    • When a device is on the wrong VLAN, it will not be able to reach the resources that it requires, and other devices will not be able to reach it. 

    • A device that has a working Ethernet connection but an IP address in the wrong subnet might be on the wrong VLAN.

    • We might have misconfigured the VLAN on the switch port that the device is connected to, or we might have connected the device to the wrong switch port.
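
    If each VLAN has a planned subnet, a device's IP address reveals which VLAN it actually landed in.  The VLAN numbers and subnets below are a hypothetical addressing plan used only for illustration.

```python
import ipaddress

# Hypothetical VLAN-to-subnet plan used only for illustration.
VLAN_SUBNETS = {
    10: ipaddress.ip_network("10.10.10.0/24"),  # staff
    20: ipaddress.ip_network("10.10.20.0/24"),  # printers
    30: ipaddress.ip_network("10.10.30.0/24"),  # guests
}

def vlan_for_address(ip):
    """Return the VLAN whose planned subnet contains `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    for vlan, subnet in VLAN_SUBNETS.items():
        if addr in subnet:
            return vlan
    return None

def check_device(ip, intended_vlan):
    actual = vlan_for_address(ip)
    if actual == intended_vlan:
        return "ok"
    if actual is None:
        return "address is not in any known VLAN subnet"
    return (f"address belongs to VLAN {actual}, expected VLAN {intended_vlan}: "
            "check the port's VLAN setting or the cabling")

print(check_device("10.10.30.15", 20))
```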

  • DNS Issues

    • If we are unable to reach a specific website or hostname from a device, but we can ping its IP address, then DNS might be misconfigured or not working.

    • We must verify that the device has the correct DNS configuration.  If not, we must configure the correct DNS server address on the device.

    • We must verify that the DNS server is reachable from the device and that it is not blocked by a firewall.  If it is blocked, we should unblock it or use a different DNS server.

    • We must verify that the DNS server has a record for the hostname that we are trying to reach, and that it replies with the correct information.  If the DNS server does not have the correct information, we should attempt to correct it.  If we cannot correct the issue, we should query a DNS server that is authoritative for the domain.
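
    These checks can be combined into a small diagnostic that compares the DNS answer with an address we know is correct (for example, the IP address we could ping).  This sketch uses the system resolver through socket.gethostbyname, which returns a single IPv4 address; the resolver is passed in as a parameter so that it can be replaced for testing, and the hostname and addresses are hypothetical.

```python
import socket

def diagnose_name(hostname, expected_ip, resolver=socket.gethostbyname):
    """Compare what DNS returns for `hostname` with the address we know is correct."""
    try:
        answer = resolver(hostname)
    except socket.gaierror:
        # No answer at all: wrong DNS server setting, server unreachable, or no record.
        return "resolution failed: check the DNS server setting, reachability, and records"
    if answer != expected_ip:
        return f"DNS returned {answer}, expected {expected_ip}: stale or incorrect record"
    return "ok"

# Example with a stand-in resolver instead of a live DNS query.
print(diagnose_name("intranet.example.com", "192.0.2.99", resolver=lambda name: "192.0.2.10"))
```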

  • NTP Issues

    • NTP allows network devices to synchronize their time. 

    • When a device does not have the correct time, it is usually either not configured to use the NTP server or unable to reach it.

    • We can check whether the NTP server is reachable, or whether NTP traffic (UDP port 123) is blocked by a firewall or router.

    • We should also verify that the device is configured to obtain the time from a reachable NTP server.  An NTP issue can also be caused by a DNS misconfiguration, if the device is unable to resolve the hostname of the NTP server.
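
    When a device can reach an NTP server, it estimates how far off its clock is from four timestamps: when it sent the request (t1), when the server received it (t2), when the server replied (t3), and when the reply arrived (t4).  The sketch below shows the standard offset and round-trip delay calculation from the NTP specification (RFC 5905); the timestamp values are made up.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """t1 and t4 are read from the client clock, t2 and t3 from the server clock (seconds).

    A positive offset means the client clock is behind the server."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)  # round trip, excluding the server's processing time
    return offset, delay

# Client clock 5 s behind the server, 0.1 s each way, 0.01 s server processing.
offset, delay = ntp_offset_and_delay(100.0, 105.1, 105.11, 100.21)
print(round(offset, 3), round(delay, 3))  # 5.0 0.2
```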

  • BYOD Challenges

    • BYOD means Bring Your Own Device.

    • If you allow users to bring their own devices to work, you must ensure that those devices have adequate security measures to protect your network.  That might mean enforcing security policies and data compartmentalization on them.

  • Licensed Feature Issues

    • Many advanced routing and switching features are available with the purchase of an additional license.

    • For example, if you purchase a Cisco switch, like a Cisco 3750, it will function as a normal switch right out of the box.  If you want to use it as a layer 3 switch (to be able to route packets between VLANs without a separate router), then you need a license.  If you want it to manage your wireless access points, then you need a license.  If you want advanced security features, then you need a license. 

      Once you purchase the license, you can activate it on the device and start using the features right away.  Some features may be available for a limited time as a trial.  Once the trial expires, the features are deactivated. 

      You should not activate a feature as a trial in a production network.  If you do, then the network will be disrupted once the trial is over.

    • Cisco Meraki hardware is Cisco’s line of cloud-managed hardware.  Each Cisco Meraki switch, router, and wireless access point comes with a one-year license that allows you to connect the device to the cloud.  To continue using Cisco Meraki hardware, you must renew your license each year.  If your license expires, your hardware stops working.