Troubleshooting IS-IS

6.0.0

Routing loops

Normally in routing, a path between two addresses is chosen and traffic is routed along that path from one address to the other. In a routing loop, that path doubles back on itself, so packets circulate between routers instead of reaching their destination.

A routing loop occurs when a normally functioning network has an outage and one or more routers are offline. When packets encounter the outage, the routers attempt an alternate route around it. During this phase, a router may send a packet back a hop and then try a different hop forward. If that hop forward is also blocked by the outage, the packet may go back again, possibly to the original hop forward. If this continues, it consumes not only network bandwidth but also resources on the affected routers. Worse, the situation continues until the network administrator changes the router settings or the downed routers come back online.
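
The back-and-forth described above can be illustrated with a small simulation. This is only a sketch under assumed conditions: the router names A, B, and C, the `forward` function, and the next-hop tables are hypothetical and do not come from any FortiGate configuration. Each table maps a router to the next hop it uses for a destination, and a hop limit stands in for the packet's TTL.

```python
def forward(tables, source, destination, ttl=8):
    """Follow next-hop entries hop by hop; return (path, delivered)."""
    path = [source]
    current = source
    while ttl > 0 and current != destination:
        next_hop = tables.get(current, {}).get(destination)
        if next_hop is None:  # no route at all: the packet is dropped
            break
        current = next_hop
        path.append(current)
        ttl -= 1
    return path, current == destination


# Healthy network: A reaches C through B.
healthy = {"A": {"C": "B"}, "B": {"C": "C"}}

# After an outage, B falls back to A's route, but A still points at B.
looped = {"A": {"C": "B"}, "B": {"C": "A"}}

print(forward(healthy, "A", "C"))
print(forward(looped, "A", "C", ttl=6))
```

In the looped case the packet alternates between A and B until the hop limit runs out and is never delivered, which is the wasted bandwidth and router effort the paragraph describes.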

Routing loop effect on the network

In addition to this “traffic jam” of routed packets, every time the routing table for a router changes, that router sends an update out to all of the IS-IS routers connected to it. In a routing loop, a router can change its routes very quickly as it tries and fails along these new routes. The result can be a flood of updates that effectively grinds the network to a halt until the problem is fixed.

How to spot a routing loop

Any time network traffic slows down, you'll ask yourself whether it's a routing loop. Often slowdowns are normal: they're not a full stoppage, and normal traffic resumes in a short period of time.

If traffic halts completely, or a major slowdown doesn't return to normal quickly, you need to troubleshoot right away.

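Some methods to troubleshoot your outage include checking your logs, using SNMP network monitoring, using link health monitoring, and looking at the packet flow. Each is described below.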

If you're not running SNMP or link health monitoring, or if you have routers that aren't Fortinet products in your network, you can use networking tools, such as ping and traceroute, to define the outage on your network and begin to fix it.
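
One sign of a loop in traceroute output is the same router answering at more than one hop. The sketch below assumes you have already collected the responding address for each hop into a list; the function name `find_repeated_hops` and the sample addresses (from the 192.0.2.0/24 documentation range) are hypothetical.

```python
def find_repeated_hops(hops):
    """Return addresses that answer at more than one hop, in order of first repeat."""
    seen = set()
    repeated = []
    for hop in hops:
        if hop is None:  # unanswered probe, shown as "*" by traceroute
            continue
        if hop in seen and hop not in repeated:
            repeated.append(hop)
        seen.add(hop)
    return repeated


# Hypothetical hop list: the trace bounces between two routers.
trace = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.2", "192.0.2.3", None]
print(find_repeated_hops(trace))
```

An empty result doesn't rule out a loop, since some routers don't answer probes, but repeated addresses are a strong hint about where the loop sits.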

Checking your logs

If your routers log events to a central location, it can be easy to check the logs for your network for any outages.

On the FortiGate, go to Log & Report > Log & Archive Access. You'll want to look at both event logs and traffic logs. Events to look for will generally fall under CPU and memory usage, interfaces going offline (due to link health monitoring), and other similar system events.
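
If you export logs as text, a short script can pull out the events mentioned above. This is a sketch only: it assumes each log line is a series of `key=value` fields, and the sample lines, field names, keyword list, and function names are hypothetical rather than copied from real FortiGate output.

```python
import shlex

# Hypothetical keywords for the event types named in the text.
KEYWORDS = ("cpu", "memory", "link monitor", "interface")


def parse_line(line):
    """Split a key=value log line into a dict, honoring quoted values."""
    return dict(tok.split("=", 1) for tok in shlex.split(line) if "=" in tok)


def suspicious_events(lines):
    """Return (time, message) for event-log lines that mention a keyword."""
    hits = []
    for line in lines:
        fields = parse_line(line)
        if fields.get("type") != "event":
            continue
        message = fields.get("msg", "")
        if any(word in message.lower() for word in KEYWORDS):
            hits.append((fields.get("time"), message))
    return hits


sample = [
    'date=2019-07-18 time=10:15:02 type=event subtype=system msg="CPU usage reaches 95 percent"',
    'date=2019-07-18 time=10:15:05 type=traffic subtype=forward msg="session closed"',
    'date=2019-07-18 time=10:15:09 type=event subtype=system msg="Link monitor: interface wan1 is down"',
]
for when, message in suspicious_events(sample):
    print(when, message)
```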

Once you've found and fixed your network problem, you can go back to the logs and create a report to better see how things developed during the problem. This type of forensic analysis can better help you prepare for next time.

Using SNMP network monitoring

If your network had no problems one minute and slows to a halt the next, chances are something changed to cause that problem. Most of the time an offline router is the cause and once you find that router and bring it back online, things will return to normal.

If you can enable a hardware monitoring system such as SNMP or sFlow on your routers, you can be notified of the outage and where it's located, as soon as it happens.

Ideally you can configure SNMP on all your FortiGate routers and be alerted to all outages as they occur.

To use SNMP to detect potential routing loops - GUI:
  1. Go to System > SNMP.
  2. Enable SNMP Agent.
  3. Optionally, enter the Description, Location, and Contact Info for this device for easier location of the problem report.
  4. In either the SNMP v1/v2c section or the SNMP v3 section, as appropriate, select Create New.
  5. Enter the Community Name that you want to use.
  6. In Hosts, add the IP address where you will be monitoring the FortiGate. You can add up to 8 different addresses.
  7. Ensure that ports 161 and 162 (SNMP queries and traps) are allowed through your security policies.
  8. In the SNMP Events section, select the events you want to be notified about. For routing loops, this should include CPU usage too high, Available memory is low, and possibly Available log space is low. If there are problems, the log will fill up quickly, and the FortiGate device’s resources will be overused.
  9. Select OK.
  10. Configure SNMP host (manager) software on your administration computer. This will monitor the SNMP information sent out by the FortiGate. Typically, you can configure this software to alert you about outages or CPU spikes that may indicate a routing loop.
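
The alerting in step 10 depends on your SNMP manager software, but the underlying idea can be sketched without any SNMP library. The example assumes the manager has already collected CPU usage percentages at regular intervals; the function name, threshold, and sample values are hypothetical. It flags a spike only when it persists, so that a single busy sample doesn't raise an alarm.

```python
def sustained_spike(samples, threshold=90, min_consecutive=3):
    """Return True if usage stays at or above threshold for min_consecutive samples in a row."""
    run = 0
    for value in samples:
        run = run + 1 if value >= threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical CPU percentages polled from one router.
print(sustained_spike([40, 95, 50, 96, 97, 98]))
print(sustained_spike([40, 95, 50, 96, 50]))
```
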

Using link health monitoring

Another tool available to you on a FortiGate is the link health monitor. You can detect possible routing loops with link health monitors. You can configure the FortiGate to ping a gateway at regular intervals to ensure it's online and working. When the gateway isn't accessible, that interface is marked as down.

For more information about link health monitoring, see Link health monitor.
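
The behavior described above, pinging a gateway and marking the interface down when it stops answering, can be sketched as a small state machine. The function name and the `fail_after` and `recover_after` values are assumptions for illustration and are not FortiGate defaults; the input is simply a list of ping outcomes.

```python
def link_states(probe_results, fail_after=3, recover_after=2):
    """Return the link state ("up" or "down") after each probe result."""
    state = "up"
    failures = successes = 0
    states = []
    for ok in probe_results:
        if ok:
            successes += 1
            failures = 0
            if state == "down" and successes >= recover_after:
                state = "up"
        else:
            failures += 1
            successes = 0
            if state == "up" and failures >= fail_after:
                state = "down"
        states.append(state)
    return states


# True means the gateway answered the ping, False means it did not.
print(link_states([True, False, False, False, True, True]))
```

Requiring several consecutive failures before marking the link down, and several successes before marking it up again, keeps one lost ping from toggling the interface.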

Looking at the packet flow

If you want to see what is happening on your network, look at the packets traveling on the network. In this situation, you're looking for routes that are marked as unreachable; in RIP-style distance-vector routing, that means a metric higher than 15. Ideally, if you debug the flow of the packets and record the routes that are unreachable, you can create an accurate picture of the network outage.
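
Recording the unreachable routes can be as simple as filtering a table of metrics. The sketch below assumes the convention described above, where a metric higher than 15 marks a route as unreachable, and it assumes you have already gathered routes into a mapping of prefix to metric. The prefixes, metrics, and function name are hypothetical.

```python
UNREACHABLE_ABOVE = 15  # metrics higher than this are treated as unreachable


def unreachable_routes(routes):
    """Return the prefixes whose metric marks them as unreachable, sorted."""
    return sorted(
        prefix for prefix, metric in routes.items() if metric > UNREACHABLE_ABOVE
    )


# Hypothetical route metrics recorded during the outage.
observed = {
    "10.1.0.0/24": 2,
    "10.2.0.0/24": 16,
    "10.3.0.0/24": 16,
    "10.4.0.0/24": 15,
}
print(unreachable_routes(observed))
```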

Action to take on discovering a routing loop

Once you've mapped the problem on your network and determined it's in fact a routing loop, there are a number of steps you can take to correct it.

  1. Get any offline routers back online. This may be a simple reboot or you may have to replace hardware. Often, this first step will restore your network to its normal operation, once the routing tables finish being updated.
  2. Change your routing configuration on the edges of the outage. Even if step 1 brought your network back online, you should consider making changes to improve your network before the next outage occurs. These changes can include configuring features like holddowns and triggers for updates, split horizon, and poison reverse updates.
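
One of the changes named in step 2, a holddown, can be sketched as follows. After a route is reported unreachable, the router ignores further updates for that destination for a fixed period, so stale information from a loop can't reinstate it right away. The class name, the 180-second period, and the use of `None` for "unreachable" are assumptions for illustration, and the sketch is deliberately simpler than a real router's holddown logic.

```python
class HoldDownTable:
    """Routing table that ignores updates for a destination during its holddown."""

    def __init__(self, hold_seconds=180):
        self.hold_seconds = hold_seconds
        self.routes = {}      # destination -> metric, or None if unreachable
        self.held_until = {}  # destination -> time when the holddown ends

    def update(self, destination, metric, now):
        """Apply an update; return False if it was ignored because of a holddown."""
        if now < self.held_until.get(destination, 0):
            return False
        self.routes[destination] = metric
        if metric is None:  # route lost: start the holddown
            self.held_until[destination] = now + self.hold_seconds
        return True


table = HoldDownTable()
table.update("netC", 2, now=0)               # normal route
table.update("netC", None, now=10)           # route lost, holddown starts
print(table.update("netC", 3, now=20))       # arrives during the holddown
print(table.update("netC", 3, now=200))      # arrives after the holddown
```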

Split horizon and poison reverse updates

Split horizon is best explained with an example. You have three routers linked serially: routerA, routerB, and routerC. RouterA is linked only to routerB, routerC is linked only to routerB, and routerB is linked to both routerA and routerC. To get to routerC, routerA must go through routerB. If the link to routerC goes down, it's possible that routerB will try to use routerA’s route to get to routerC. That route is A-B-C, so it leads straight back through routerB and won't work. If routerB uses it anyway, packets for routerC bounce between routerA and routerB in an endless loop. Split horizon prevents this: a router doesn't advertise a route back out of the interface it learned that route from, so routerA never offers routerB a path to routerC that runs through routerB itself.

Poison reverse is a stronger form of split horizon. Instead of staying silent about those routes, the router advertises them back toward the neighbor it learned them from, but marked as unreachable, so the neighbor can't mistake them for a usable alternate path. In distance-vector protocols such as RIP, unreachable means a metric of 16.
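
The poisoning step can be sketched as a function that builds the update one router sends to one neighbor. This is an illustration under stated assumptions, not FortiOS behavior: the routing table layout, the destination names `netA` and `netC`, and the function name are hypothetical, and 16 is used as the unreachable distance mentioned above.

```python
UNREACHABLE = 16  # distance used to mark a poisoned route


def build_advertisement(routing_table, neighbor):
    """Return the metrics advertised to neighbor, poisoning routes learned from it.

    routing_table maps destination -> (metric, next_hop); next_hop is None
    for directly connected networks.
    """
    advert = {}
    for destination, (metric, next_hop) in routing_table.items():
        advert[destination] = UNREACHABLE if next_hop == neighbor else metric
    return advert


# routerA's view: netA is directly connected, netC is reached through routerB.
router_a = {"netA": (1, None), "netC": (2, "routerB")}
print(build_advertisement(router_a, "routerB"))
```

Because routerA reaches netC through routerB, the update it sends to routerB lists netC as unreachable, so routerB can't fall back on a path that loops through itself.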
