Troubleshooting RIP
This section describes how to troubleshoot RIP, with a focus on detecting and correcting routing loops.
Routing loops
Normally in routing, a path between two addresses is chosen and traffic is routed along that path from one address to the other. In a routing loop, that path doubles back on itself. When there are loops, the network has trouble getting traffic to its destination, and it may also be unable to report back to the source that the destination is unreachable.
A routing loop can occur when one or more routers in a normally functioning network go offline. When this happens, the remaining routers try to find an alternate route around the outage. During this phase, a route might be tried that goes back one hop and then forward along a different hop. If that forward hop is also blocked by the outage, the route may go back again and possibly take the original forward hop, so traffic keeps circling. If this continues, it can consume not only network bandwidth but also many resources on the affected routers. Worse, the situation can continue until the network administrator changes the router settings or the offline routers come back online.
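To illustrate why a loop consumes bandwidth, the following Python sketch follows one packet through two hypothetical forwarding tables in which routerA and routerB each point to the other for a destination that was reachable only through an offline router. The router names, the 10.0.3.0/24 prefix, and the TTL value are assumptions for illustration. The packet crosses the same link on every hop until its TTL runs out, and every other packet for that destination does the same.

```python
# Hypothetical forwarding tables after the router serving 10.0.3.0/24 goes
# offline: each router maps a destination prefix to its next hop.
FORWARDING = {
    "routerA": {"10.0.3.0/24": "routerB"},
    "routerB": {"10.0.3.0/24": "routerA"},  # stale route pointing back at routerA
}


def forward(start: str, destination: str, ttl: int = 64) -> list[str]:
    """Follow next hops from `start` until the TTL runs out or no route exists.

    Returns every router the packet visits, in order.
    """
    path = [start]
    current = start
    while ttl > 0:
        next_hop = FORWARDING.get(current, {}).get(destination)
        if next_hop is None:
            break  # no route: the packet would be dropped here
        current = next_hop
        path.append(current)
        ttl -= 1  # each hop uses up one unit of TTL
    return path


path = forward("routerA", "10.0.3.0/24", ttl=8)
print(" -> ".join(path))  # routerA and routerB alternate, 9 routers in total
```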
Effect of routing loops on the network
In addition to this “traffic jam” of routed packets, every time a router's routing table changes, that router sends an update to all of the RIP routers connected to it. In a routing loop, a router can change its routes very quickly as it tries and fails along new routes. This can quickly produce a flood of updates, which can effectively bring the network to a halt until the problem is fixed.
How to spot a routing loop
Whenever network traffic slows down, you might ask yourself whether a routing loop is the cause. Most slowdowns are normal: traffic doesn't stop completely, and it returns to normal within a short time.
If traffic halts completely, or slows down significantly and doesn't recover quickly, you need to start troubleshooting right away.
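The rule of thumb above can be written as a small triage function. This Python sketch is a hypothetical illustration: it assumes you already collect throughput samples and know your normal baseline, and its thresholds (50% of baseline for a major slowdown, 5% for a halt, and more than three consecutive slow samples for "doesn't recover quickly") are placeholder values to tune for your own network, not FortiGate settings.

```python
def triage(samples: list[float], baseline: float,
           slow_ratio: float = 0.5, halt_ratio: float = 0.05,
           max_slow_samples: int = 3) -> str:
    """Classify recent throughput samples against a normal baseline.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    slow_run = 0  # consecutive samples below slow_ratio * baseline
    for sample in samples:
        ratio = sample / baseline
        if ratio <= halt_ratio:
            return "investigate: traffic has halted"
        if ratio < slow_ratio:
            slow_run += 1
            if slow_run > max_slow_samples:
                return "investigate: major slowdown is not recovering"
        else:
            slow_run = 0  # traffic recovered, so the slowdown was transient
    return "normal or transient slowdown"


print(triage([100, 40, 45, 90, 100], baseline=100))
print(triage([100, 30, 30, 30, 30], baseline=100))
```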
If you aren't running SNMP or link health monitoring, or if you have non-Fortinet routers in your network, you can use networking tools such as ping and traceroute to locate the outage on your network and begin to fix it. Ping, traceroute, and other basic troubleshooting tools work largely the same way for static and dynamic routing and are covered in Advanced static routing.
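A routing loop often shows up in traceroute output as the same router addresses repeating in order. The following Python sketch assumes you have already extracted the hop addresses from a traceroute run into a list, with "*" for hops that didn't respond; the function name and the documentation-range addresses are illustrative.

```python
from typing import Optional


def find_loop(hops: list[str]) -> Optional[list[str]]:
    """Return the repeating run of hop addresses, or None if no hop repeats.

    `hops` is the ordered list of responding addresses from one traceroute.
    """
    first_seen: dict[str, int] = {}
    for index, address in enumerate(hops):
        if address == "*":
            continue  # timed-out hop: no address to compare
        if address in first_seen:
            # The hops between the first and second sighting form the loop.
            return hops[first_seen[address]:index]
        first_seen[address] = index
    return None


trace = ["192.0.2.1", "198.51.100.1", "198.51.100.2",
         "198.51.100.1", "198.51.100.2", "198.51.100.1"]
print(find_loop(trace))
```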
Check your logs
If your routers log events to a central location, it can be easy to check your network logs for any outages.
In the FortiGate GUI, go to Log & Report. You should look at both event logs and traffic logs. Events to look for generally fall under CPU and memory usage, interfaces going offline (due to link health monitoring), and other similar system events.
Once you have found and fixed the problem, you can go back to the logs and create a report to see how the problem developed. This type of forensic analysis can help you better prepare for next time.
Use SNMP network monitoring
If your network had no problems one minute and slowed to a halt the next, chances are something changed to cause the problem. Most of the time, an offline router is the cause, and once you find that router and bring it back online, things return to normal.
If you can enable a network monitoring system such as SNMP or sFlow on your routers, you can be notified of an outage and its location as soon as it happens.
Ideally, you can configure SNMP on all FortiGate routers and be alerted to all outages as they occur.
To use SNMP to detect potential routing loops - GUI:
- Go to System > SNMP.
- Enable SNMP Agent and select Apply. Optionally, enter the Description, Location, and Contact Info for this device so that you can identify the source of a problem report more easily.
- Under SNMP v1/v2c or SNMP v3, as appropriate, select Create New.
- Enter the following information.
SNMP v3
User Name | Enter the SNMP user ID. |
Security Level | Select authentication or privacy as desired. Select the authentication or privacy algorithms to use and enter the required passwords. |
Notification Host | Enter the IP addresses of up to 16 hosts to notify. |
Enable Query | Select this option. The port should be 161. Ensure that your security policies allow ports 161 and 162 (SNMP queries and traps) to pass. |
SNMP v1/v2c
Hosts | Enter the IP addresses of up to 8 hosts to notify. |
Queries | Enable v1 and/or v2c as needed. The port should be 161. Ensure that your security policies allow port 161 to pass. |
Traps | Enable v1 and/or v2c as needed. The port should be 162. Ensure that your security policies allow port 162 to pass. |
- Select the events for which you want notification. For routing loops, this should include CPU usage too high, Available memory is low, and possibly Available log space is low. If there are problems, the log fills up quickly and the FortiGate device's resources are overused.
- Configure SNMP host (manager) software on your administration computer to monitor the SNMP information that the FortiGate sends. Typically, you can configure this software to alert you to outages or CPU spikes that may indicate a routing loop.
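The SNMP manager software mentioned in the steps above usually provides alerting on CPU spikes for you. As an illustration of the logic only, the following Python sketch flags sustained CPU spikes in a series of samples that it assumes were already collected from the FortiGate. It doesn't poll SNMP itself, and the 90% threshold and 60-second minimum duration are hypothetical values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    timestamp: int      # seconds since monitoring started
    cpu_percent: float  # CPU usage reported for the device


def sustained_spikes(samples: list[Sample], threshold: float = 90.0,
                     min_duration: int = 60) -> list[tuple[int, int]]:
    """Return (start, end) timestamps of runs where CPU usage stayed at or
    above `threshold` for at least `min_duration` seconds."""
    spikes: list[tuple[int, int]] = []
    start: Optional[int] = None
    last = 0
    for sample in samples:
        if sample.cpu_percent >= threshold:
            if start is None:
                start = sample.timestamp  # a new high-CPU run begins
            last = sample.timestamp
        else:
            if start is not None and last - start >= min_duration:
                spikes.append((start, last))
            start = None  # the run ended
    # Close a run that is still in progress at the last sample.
    if start is not None and last - start >= min_duration:
        spikes.append((start, last))
    return spikes


readings = [Sample(t, cpu) for t, cpu in
            [(0, 20), (30, 95), (60, 96), (90, 97), (120, 30), (150, 95), (180, 20)]]
print(sustained_spikes(readings))
```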
Use link health monitoring
The link health monitor is another FortiGate tool that can help you detect possible routing loops. You can configure the FortiGate to ping a gateway at regular intervals to ensure that it's online and working. When the gateway isn't reachable, the FortiGate marks that interface as down.
For more information about link health monitoring, see Link health monitor.
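The behavior described above can be modeled as a small state machine. In this Python sketch, the interface is marked down after a number of consecutive failed pings and marked up again after a number of consecutive replies. The class and its fail_threshold and recover_threshold parameters are hypothetical and only illustrate the idea; see Link health monitor for the settings the FortiGate actually uses.

```python
class LinkMonitor:
    """Track ping results for one gateway and report the interface state.

    fail_threshold and recover_threshold are hypothetical parameters: the
    number of consecutive failed or successful probes needed to change state.
    """

    def __init__(self, fail_threshold: int = 5, recover_threshold: int = 5):
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.up = True
        self._failures = 0   # consecutive failed probes
        self._successes = 0  # consecutive successful probes

    def record(self, reply_received: bool) -> bool:
        """Record one probe result and return True if the state changed."""
        if reply_received:
            self._failures = 0
            self._successes += 1
            if not self.up and self._successes >= self.recover_threshold:
                self.up = True  # gateway reachable again: bring interface up
                return True
        else:
            self._successes = 0
            self._failures += 1
            if self.up and self._failures >= self.fail_threshold:
                self.up = False  # gateway unreachable: mark interface down
                return True
        return False


monitor = LinkMonitor(fail_threshold=3, recover_threshold=2)
changes = [monitor.record(ok) for ok in [True, False, False, False, True, True]]
print(changes, monitor.up)
```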
Use email alerts for failed gateways
You can detect possible routing loops with email alerts.
To configure notification of failed gateways - GUI:
- Go to Log & Report > Report > Local and enable Email Generated Reports.
- Enter your email details.
- Select Apply.
You might also want to log CPU and memory usage, because a network outage can cause CPU activity to spike.
If you have VDOMs configured, you must enter the basic SMTP server information in the Global section and the rest of the configuration within the VDOM that includes this interface.
After this configuration, when this interface on the FortiGate can't connect to the next router, the FortiGate brings down the interface and alerts you with an email about the outage.
Look at the packet flow
If you want to see what is happening on your network, look at the packets traveling on the network. This is the same idea as police pulling over a car and asking the driver where they have been and what the conditions were like.
The methods described in the troubleshooting sections Debugging IPv6 on RIPng and debugging the packet flow also apply here. In this case, look for routes with a metric higher than 15 (that is, 16), because this indicates that they're unreachable.
Ideally, if you debug the flow of the packets and record the routes that are unreachable, you can create an accurate picture of the network outage.
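Once you've recorded the routes from the debug output, a short script can turn them into a timeline of the outage. This Python sketch assumes you've already parsed the output into (time, prefix, metric) tuples; that format is hypothetical, because the exact debug output varies. It reports the first time each prefix was seen with a metric of 16, the value RIP uses for unreachable.

```python
RIP_UNREACHABLE = 16  # RIP's maximum hop count is 15, so 16 means unreachable


def outage_timeline(observations: list[tuple[int, str, int]]) -> dict[str, int]:
    """Map each prefix to the first time it was reported as unreachable."""
    first_down: dict[str, int] = {}
    for time, prefix, metric in sorted(observations):  # oldest first
        if metric >= RIP_UNREACHABLE and prefix not in first_down:
            first_down[prefix] = time
    return first_down


observed = [
    (10, "10.0.3.0/24", 2),
    (30, "10.0.3.0/24", 16),
    (25, "10.0.4.0/24", 16),
    (20, "10.0.3.0/24", 16),
]
print(outage_timeline(observed))
```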
Action to take on discovering a routing loop
Once you've mapped the problem on your network and determined that it's a routing loop, there are a number of steps you can take to correct it:
- Get any offline routers back online. This may be a simple reboot or you may have to replace hardware. Often, this first step will restore your network to its normal operation once the routing tables are updated.
- Change your routing configuration at the edges of the outage. Even if the first step brought your network back online, consider making changes to improve your network before the next outage occurs. These changes can include configuring features like holddowns and triggers for updates, split horizon, and poison reverse updates.
Holddowns and triggers for updates
One potential problem with RIP is the frequent routing table updates that are sent every time the routing table changes. If your network has many RIP routers, these updates can start to slow your network down. Also, if a route depends on faulty hardware, it might go up and down frequently (flap), generating an overload of routing table updates.
One of the most common solutions to this problem is to use holddown timers and triggers for updates. These slow down the updates that are sent out and help prevent a potential flood.
Holddown timers
The holddown timer starts when a route is marked down. Until the timer expires, the router doesn't accept any new information about that route. This is very useful if you have a flapping route, because it prevents your router from sending out updates and contributing to the flood. The potential downside is that if the route comes back up before the timer expires, the route remains unavailable until the timer expires. This is only a problem if it's a major route that carries most of your traffic. Otherwise, it's a minor problem, because traffic can be rerouted around the outage.
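The following Python sketch models the holddown behavior described above for a single route. It's a simplified illustration, not the FortiGate implementation: the class is hypothetical, and the 60-second holddown in the example is an arbitrary value. Note that the update at 10 seconds, reporting that the route is back up, is ignored, which is the downside described above.

```python
from typing import Optional


class HolddownRoute:
    """Track one route's reachability under a holddown timer.

    Times are in seconds; the holddown length is supplied by the caller.
    """

    def __init__(self, holddown: float):
        self.holddown = holddown
        self.reachable = True
        self.hold_until: Optional[float] = None  # end of the active holddown

    def in_holddown(self, now: float) -> bool:
        return self.hold_until is not None and now < self.hold_until

    def receive_update(self, now: float, reachable: bool) -> bool:
        """Apply an update about this route; return True if it was accepted."""
        if self.in_holddown(now):
            return False  # ignore all new information until the timer expires
        self.hold_until = None
        if self.reachable and not reachable:
            # The route just went down: start the holddown timer.
            self.hold_until = now + self.holddown
        self.reachable = reachable
        return True


route = HolddownRoute(holddown=60)
events = [(0, False), (10, True), (20, False), (61, True)]  # a flapping route
print([route.receive_update(t, up) for t, up in events], route.reachable)
```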
Triggers
Triggered RIP is an alternative update mechanism that limits updates to specific circumstances. The most basic difference is that the routing table is updated only when a specific update request is sent, instead of every time the routing table changes. Updates are also triggered when a unit is powered on, which can include adding new interfaces or devices to the routing structure, or devices becoming available again after being unreachable.
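To make the difference concrete, this Python sketch queues routing table changes and sends them only when one of the triggers described above occurs. The trigger names, the class, and the choice to send only each route's latest metric are assumptions for illustration, not the Triggered RIP protocol itself. In the example, four changes to a flapping route produce a single update instead of four.

```python
class TriggeredUpdater:
    """Hold routing changes until an update is triggered.

    A simplified model of the behavior described above. If a route changes
    several times before a trigger, only its latest metric is sent.
    """

    TRIGGERS = {"request", "power_on", "neighbor_available"}

    def __init__(self) -> None:
        self.pending: dict[str, int] = {}     # prefix -> latest unsent metric
        self.sent: list[dict[str, int]] = []  # updates sent, oldest first

    def table_changed(self, prefix: str, metric: int) -> None:
        self.pending[prefix] = metric  # record the change without sending it

    def trigger(self, reason: str) -> None:
        if reason not in self.TRIGGERS:
            raise ValueError(f"unknown trigger: {reason}")
        if self.pending:
            self.sent.append(dict(self.pending))
            self.pending.clear()


updater = TriggeredUpdater()
for metric in (16, 2, 16, 2):  # a flapping route changes four times
    updater.table_changed("10.0.3.0/24", metric)
updater.trigger("request")
print(updater.sent)
```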
Split horizon and poison reverse updates
Split horizon is best explained with an example. Consider three routers linked in series: routerA, routerB, and routerC. RouterA is linked only to routerB, routerC is linked only to routerB, and routerB is linked to both routerA and routerC. To get to routerC, routerA must go through routerB. If the link to routerC goes down, routerB might try to use routerA’s route to get to routerC. Because that route is A-B-C, traffic loops endlessly between routerA and routerB.
Split horizon prevents this problem: a router doesn't advertise a route back out of the interface through which it learned that route. Because routerA learned its route to routerC from routerB, routerA never advertises that route back to routerB, so routerB can't mistake it for an alternate path. Poison reverse is a stronger form of this rule. Instead of omitting the route, the router advertises it back to the neighbor it learned it from and “poisons” it by marking it as unreachable. In RIP, this means that the route is marked with a distance of 16.
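The following Python sketch replays the three-router example as a simplified distance-vector exchange between routerA and routerB after routerB loses its link to routerC. It's an illustration under stated assumptions, not RIP's full update logic: there is one destination and one metric per router, and routerA's update reaches routerB before routerB's news of the failure reaches routerA. The function names are hypothetical.

```python
INFINITY = 16  # RIP metric meaning "unreachable"
MODES = {"none", "split_horizon", "poison_reverse"}


def advertise(route, to_neighbor, mode):
    """Return the metric a router advertises to `to_neighbor`, or None to omit it.

    `route` is a (metric, next_hop) tuple for the destination behind routerC.
    """
    metric, next_hop = route
    if next_hop == to_neighbor:  # the route was learned from this neighbor
        if mode == "split_horizon":
            return None  # don't advertise it back at all
        if mode == "poison_reverse":
            return INFINITY  # advertise it back as unreachable
    return metric


def receive(route, from_neighbor, advertised):
    """Apply the usual distance-vector rules and return the updated route."""
    if advertised is None:
        return route
    new_metric = min(advertised + 1, INFINITY)
    metric, next_hop = route
    # Always believe the current next hop; otherwise accept only a better path.
    if next_hop == from_neighbor or new_metric < metric:
        return (new_metric, from_neighbor)
    return route


def rounds_to_converge(mode, max_rounds=20):
    """Count update exchanges until both routers mark routerC unreachable."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    routes = {
        "routerA": (2, "routerB"),         # A reaches C through B
        "routerB": (INFINITY, "routerC"),  # B has just lost its link to C
    }
    for rounds in range(1, max_rounds + 1):
        # routerA's update reaches routerB first: the timing that lets a loop form.
        sent = advertise(routes["routerA"], "routerB", mode)
        routes["routerB"] = receive(routes["routerB"], "routerA", sent)
        sent = advertise(routes["routerB"], "routerA", mode)
        routes["routerA"] = receive(routes["routerA"], "routerB", sent)
        if routes["routerA"][0] == INFINITY and routes["routerB"][0] == INFINITY:
            return rounds
    return max_rounds


for mode in ("none", "split_horizon", "poison_reverse"):
    print(mode, rounds_to_converge(mode))
```

Without either rule, the route bounces between routerA and routerB and its metric rises by one on each exchange until it reaches 16, which takes eight rounds in this model. With split horizon or poison reverse, both routers agree that routerC is unreachable after the first exchange.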
Debugging IPv6 on RIPng
The debug commands are very useful for seeing what is happening on the network at the packet level. Debugging the packet flow for IPv6 differs in a few ways.
The following CLI commands specify both IPv6 and RIP, so only RIPng packets will be reported. The output from these commands will show you the RIPng traffic on your FortiGate unit, including RECV, SEND, and UPDATE actions.
The addresses are in IPv6 format.
diagnose debug enable
diagnose ipv6 router rip level info
diagnose ipv6 router rip all enable
These three commands:
- Turn on debugging in general
- Set the debug level to information, which is a verbose reporting level
- Turn on debugging for all RIP router settings
Part of the information displayed from the debugging is the metric (hop count). If the metric is 16, that destination is unreachable since the maximum hop count is 15.
In general, you should see an update announcement, followed by the routing table being sent out, and a reply received in response.
For more information, see Testing the IPv6 RIPng information.