Handbook 6.0.0

Troubleshooting OSPF

As with other dynamic routing protocols, OSPF has some issues that may need troubleshooting from time to time. For basic troubleshooting, see the Troubleshooting Handbook.

Clearing OSPF routes from the routing table

If you think the wrong route has been added to your routing table, first remove that route from the table, then check whether it's added back. You can clear all or some OSPF neighbor connections (sessions) using the execute router clear ospf CLI command. The execute router clear command is much more limited for OSPF than it is for BGP. For more information, see BGP.

For example, if you have routes in the OSPF routing table and you want to clear the specific route to IP address 10.10.10.1, you'll have to clear all the OSPF entries. Enter the following CLI command:

execute router clear ospf process

Checking the state of OSPF neighbors

In OSPF, each router sends out hello packets to find other routers on its network segment and to create adjacencies with some of those routers. This is important because routing updates (link state advertisements) are only passed between adjacent routers. If two routers you believe to be adjacent are not, that can be the source of routing failures.

To identify this problem, you need to check the state of the OSPF neighbors of the FortiGate. Use the get router info ospf neighbor all CLI command to see all the neighbors for the FortiGate. You'll see output in the form of the following:

FGT1 # get router info ospf neighbor

OSPF process 0:
Neighbor ID     Pri   State           Dead Time   Address         Interface
10.0.0.2          1   Full/ -         00:00:39    10.1.1.2        tunnel_wan1
10.0.0.2          1   Full/ -         00:00:34    10.1.1.4        tunnel_wan2

The important information here is the State column. Any neighbors that are not adjacent to the FortiGate are reported in this column as something other than Full. If the state is Down, that router is offline.
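When there are many neighbors, the State check described above can be scripted. The following Python snippet is a minimal sketch, assuming you have saved the command output as text in the column layout shown above. The function names and the third sample row (an Init neighbor on port3) are hypothetical and only there for illustration.

```python
import re

# Sample text in the same layout as the output above.
# The last row is a hypothetical neighbor that is stuck in the Init state.
SAMPLE = """\
OSPF process 0:
Neighbor ID     Pri   State           Dead Time   Address         Interface
10.0.0.2          1   Full/ -         00:00:39    10.1.1.2        tunnel_wan1
10.0.0.2          1   Full/ -         00:00:34    10.1.1.4        tunnel_wan2
10.0.0.3          1   Init/DROther    00:00:31    10.1.2.3        port3
"""

# Columns: neighbor ID, priority, state/role, dead time, address, interface.
ROW = re.compile(
    r"^(\d+\.\d+\.\d+\.\d+)\s+(\d+)\s+(\S+?)/\s*(\S+)"
    r"\s+(\d+:\d+:\d+)\s+(\S+)\s+(\S+)\s*$"
)


def parse_neighbors(text):
    """Return one dict per neighbor row; header and blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rid, pri, state, role, dead, addr, iface = match.groups()
            rows.append({
                "neighbor_id": rid,
                "priority": int(pri),
                "state": state,
                "role": role,
                "dead_time": dead,
                "address": addr,
                "interface": iface,
            })
    return rows


def not_full(rows):
    """Return the neighbors whose state is anything other than Full."""
    return [row for row in rows if row["state"].lower() != "full"]


if __name__ == "__main__":
    for row in not_full(parse_neighbors(SAMPLE)):
        print(row["neighbor_id"], row["state"], row["interface"])
```

Any neighbor that this prints is worth investigating with the checks in the following sections.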

Passive interface problems

A passive OSPF interface doesn't send out any updates. This means it can't be a DR, BDR, or an area border router among other things. It depends on other neighbor routers to update its link state table.

Passive interfaces can cause problems when they're not receiving the routing updates you expect from their neighbors. This results in the passive OSPF interface on the FortiGate having an incomplete or out-of-date link state database, and it won't be able to properly route its traffic. It's also possible, although rare, that the passive interface is causing a hole in the network where no routers are passing updates to each other.

If you suspect a passive interface is causing problems, there are simple methods to determine whether it's the cause. The easiest method is to make it an active interface; if the issues disappear, the passive interface was the cause. Another method is to examine the OSPF routing table and related information to see if it's incomplete compared to other neighbor routers. If this is the case, you can clear the routing table, reset the device, and allow it to repopulate the table.

If you can't make the interface active for some reason, you have to change your network to fix the hole by adding more routers, or changing the relationship between the passive router’s neighbors to provide better coverage.

Timer problems

A timer mismatch is when two routers have different values set for the same timer. For example, if one router declares a router dead after 45 seconds and another waits for 4 minutes, that difference in time results in the two routers being out of sync for that period of time. One will still see the offline router as being online. For the hello and dead intervals specifically, OSPF neighbors whose values differ won't form an adjacency.

The easiest method to check the timers is to check the configuration on each router. Another method is to sniff some packets and read the timer values in the packets themselves from different routers. Each hello packet contains the hello interval and dead interval, so you can compare them easily enough.
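If you have captured raw hello packets, the comparison can be automated. The following Python snippet is a minimal sketch that reads the two timer fields from an OSPFv2 hello packet, assuming the bytes start at the OSPF header (IP header already removed) and follow the field layout in RFC 2328. The function names are hypothetical, and build_hello only fabricates test packets with a zero checksum; it isn't a real packet generator.

```python
import struct

OSPF_HEADER_LEN = 24  # fixed OSPFv2 header size (RFC 2328, section A.3.1)


def hello_timers(ospf_packet):
    """Return (hello_interval, dead_interval) in seconds from an OSPFv2 hello."""
    version, pkt_type = struct.unpack_from("!BB", ospf_packet, 0)
    if version != 2 or pkt_type != 1:
        raise ValueError("not an OSPFv2 hello packet")
    # Hello body: network mask (4 bytes), hello interval (2), options (1),
    # router priority (1), router dead interval (4), then DR/BDR/neighbors.
    _mask, hello, _opts, _pri, dead = struct.unpack_from(
        "!IHBBI", ospf_packet, OSPF_HEADER_LEN
    )
    return hello, dead


def timers_match(packet_a, packet_b):
    """True when both hellos advertise the same hello and dead intervals."""
    return hello_timers(packet_a) == hello_timers(packet_b)


def build_hello(hello, dead):
    """Hypothetical helper: build a hello with placeholder IDs and checksum."""
    header = struct.pack(
        "!BBH4s4sHH8s", 2, 1, 44, bytes(4), bytes(4), 0, 0, bytes(8)
    )
    body = struct.pack(
        "!IHBBI4s4s", 0xFFFFFF00, hello, 0x02, 1, dead, bytes(4), bytes(4)
    )
    return header + body


if __name__ == "__main__":
    router_a = build_hello(10, 40)
    router_b = build_hello(10, 45)
    print(hello_timers(router_a), hello_timers(router_b))
    print("match" if timers_match(router_a, router_b) else "mismatch")
```

In this example the two routers agree on the hello interval but not on the dead interval, so the sketch reports a mismatch.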

BFD

Bidirectional Forwarding Detection (BFD) is a protocol that you can use to quickly locate hardware failures in the network. Routers running BFD communicate with each other and if a timer runs out on a connection then that router is declared down. BFD then communicates this information to the routing protocol and the routing information is updated. For more information about BFD, see BFD.

Authentication issues

OSPF has a number of authentication methods you can choose from. You may encounter problems with routers not authenticating as you expect. This will likely appear simply as one or more routers that have a blind spot in their routing and they won't acknowledge a router. This can be a problem if that router connects areas to the backbone, as it'll appear to be offline and unusable.

To confirm this is the issue, the easiest method is to turn off authentication on the neighboring routers. With no authentication between any routers, everything should flow normally.

Another method to confirm that authentication is the problem is to sniff packets and look at their contents. The authentication type is in the header of every OSPF packet, and with text authentication the password is visible as well, which makes it easy to confirm in real time that they are what you expect. With md5 authentication the packet carries a key ID and a digest rather than the password itself. It's possible that one or more routers aren't configured as you expect and may be using the wrong authentication. This method is especially useful if there is a group of routers with these problems, since it may be only one router causing the problem that's seen in multiple routers.
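The authentication fields sit at a fixed offset in the OSPFv2 header, so they are easy to pull out of a capture. The following Python snippet is a minimal sketch, assuming the bytes start at the OSPF header and follow the RFC 2328 layout. The function name and the sample packets are hypothetical; the type names none, text, and md5 are used here only as labels for authentication types 0, 1, and 2.

```python
import struct

# OSPFv2 AuType values (RFC 2328), labeled with the names used in this section.
AUTH_TYPES = {0: "none", 1: "text", 2: "md5"}


def describe_auth(ospf_packet):
    """Summarize the authentication fields of an OSPFv2 packet header."""
    if len(ospf_packet) < 24:
        raise ValueError("shorter than the 24-byte OSPFv2 header")
    # AuType is at byte offset 14, followed by 8 bytes of authentication data.
    autype, auth = struct.unpack_from("!H8s", ospf_packet, 14)
    info = {"type": AUTH_TYPES.get(autype, "unknown({})".format(autype))}
    if autype == 1:
        # Text authentication: the password is sent in the clear, zero padded.
        info["password"] = auth.rstrip(b"\x00").decode("ascii", "replace")
    elif autype == 2:
        # Cryptographic authentication: 2 zero bytes, key ID, digest length,
        # then a 4-byte sequence number. The password is not in the packet.
        info["key_id"] = auth[2]
    return info


if __name__ == "__main__":
    # Hypothetical header-only packets with placeholder IDs and checksum.
    text_pkt = struct.pack(
        "!BBH4s4sHH8s", 2, 1, 24, bytes(4), bytes(4), 0, 1, b"secret"
    )
    md5_pkt = struct.pack(
        "!BBH4s4sHH8s", 2, 1, 24, bytes(4), bytes(4), 0, 2,
        bytes([0, 0, 5, 16, 0, 0, 0, 1]),
    )
    print(describe_auth(text_pkt))
    print(describe_auth(md5_pkt))
```

Running this against packets from each router in a problem group shows quickly whether one of them is sending a different authentication type, password, or key ID than the others.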

Once you have confirmed the problem is related to authentication, you can decide how to handle it. You can turn off authentication and take your time to determine how to get your preferred authentication type back online. You can try another type of authentication, such as text instead of md5, which may have more success and still provide some level of protection. The important part is that once you confirm the problem, you can decide how to fix it properly.

DR and BDR election issues

You can force particular routers to become the DR and BDR by setting their priorities higher than those of any other OSPF routers on the network segment. This is a good idea when those routers have more resources to handle the traffic and extra work of the DR and BDR roles, since not all routers may be able to handle all of that traffic.

However, if you set all the other routers so they don't have a chance at being elected (give them a priority of 0), you can run into problems if the DR and BDR go offline. The good part is that you'll generally have some warning, as the DR goes offline and the BDR is promoted to the DR position. However, if the network segment with both the DR and BDR goes down, your network won't have a way to send hello packets, send updates, or perform the other tasks that the DR performs.

The solution to this is to always allow routers to have a chance to be promoted, even if you set their priority to 1. In that case, they'll be the last choice but if there are no other candidates, you want that router to become the DR. Most networks will have already alerted you to the equipment problems, so this will be a temporary measure to keep the network traffic moving until you can find and fix the problem and get the real DR back online.
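The effect of priority 0 can be illustrated with a small simulation. The following Python snippet is a simplified sketch, not the full election procedure from RFC 2328: it ignores the fact that a running DR isn't preempted and simply ranks the eligible routers by priority, then by router ID. The function name, router IDs, and priorities are hypothetical.

```python
from ipaddress import IPv4Address


def elect(routers):
    """Pick (DR, BDR) from a dict of router ID -> priority.

    Routers with priority 0 are never eligible. Among the rest, the highest
    priority wins and ties are broken by the highest router ID.
    Either result may be None when there aren't enough eligible routers.
    """
    eligible = sorted(
        (rid for rid, pri in routers.items() if pri > 0),
        key=lambda rid: (routers[rid], IPv4Address(rid)),
        reverse=True,
    )
    dr = eligible[0] if eligible else None
    bdr = eligible[1] if len(eligible) > 1 else None
    return dr, bdr


if __name__ == "__main__":
    # Two preferred routers, everything else locked out with priority 0.
    segment = {"10.0.0.1": 200, "10.0.0.2": 100, "10.0.0.3": 0, "10.0.0.4": 0}
    print(elect(segment))

    # Both preferred routers go offline: nobody is left to take over.
    print(elect({"10.0.0.3": 0, "10.0.0.4": 0}))

    # Same failure, but the remaining routers were given priority 1.
    print(elect({"10.0.0.3": 1, "10.0.0.4": 1}))
```

With priority 0 on the remaining routers the second election returns no DR at all, while priority 1 lets one of them step in as a last resort.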
