HA network gateway with keepalived and conntrackd

Why bother?

I’ve always got long-running connections from my network out to the internet (mainly IRC) and historically, updating my router with the latest security fixes and/or kernels has been a real pain because it requires a reboot of the router. This meant everything got disconnected. Also, occasionally, I’d be playing with the router and something would break. Again, everything would get disconnected. I wanted a way to be able to update or work on my router without losing connectivity.

Introduction and sick ASCII network diagram

This is going to be a basic ACTIVE/BACKUP high availability (HA) setup for a network gateway. Both network gateways run Ubuntu Server 16.04 LTS and have three NICs. The first NIC (eth0) is connected to the modem. The second NIC (eth1) is connected to the switch that feeds the LAN. The third NIC (eth2) is connected directly to the third NIC on the other router. The direct connection is for conntrackd. The conntrackd documentation notes that its state synchronization can be bandwidth-intensive and recommends a dedicated link. However, a dedicated link is not required, and you can just as easily run conntrackd over the LAN interface. My routers have three NICs, so I went ahead and connected them directly.

         [___Modem 10.0.0.1/24____]
         |                        |
  --eth0--     WAN Virtual IP     --eth0--
  10.0.0.11       10.0.0.10       10.0.0.12
  |                                      |
[master]--10.10.1.1--eth2--10.10.1.2--[backup]
  |                                      |
 eth1          LAN Virtual IP           eth1
 192.168.0.1     192.168.0.3     192.168.0.2
  |                                      |
--------------------------------------------
|            LAN 192.168.0.0/24            |
--------------------------------------------
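For the virtual IPs to work as on-link gateways, each one has to sit inside the subnet that both routers' real addresses on that segment use, without colliding with either of them. The Python sketch below encodes the addressing plan from the diagram and checks exactly that. The /24 mask on the 10.10.1.x sync link is an assumption, since the diagram doesn't give one, and PLAN and check_plan are illustrative names, not part of any tool.

```python
import ipaddress

# Addressing plan transcribed from the diagram above.
# ASSUMPTION: the direct eth2 link uses a /24; the diagram gives no mask.
PLAN = {
    "wan": {"subnet": "10.0.0.0/24", "master": "10.0.0.11",
            "backup": "10.0.0.12", "vip": "10.0.0.10", "gateway": "10.0.0.1"},
    "lan": {"subnet": "192.168.0.0/24", "master": "192.168.0.1",
            "backup": "192.168.0.2", "vip": "192.168.0.3"},
    "sync": {"subnet": "10.10.1.0/24", "master": "10.10.1.1",
             "backup": "10.10.1.2"},
}


def check_plan(plan):
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    for name, seg in plan.items():
        net = ipaddress.ip_network(seg["subnet"])
        addrs = {k: ipaddress.ip_address(v) for k, v in seg.items() if k != "subnet"}
        # Every address on a segment, including the VIP, must be on-link.
        for role, addr in addrs.items():
            if addr not in net:
                problems.append(f"{name}: {role} {addr} is outside {net}")
        # The VIP must not reuse either router's real address.
        if len(set(addrs.values())) != len(addrs):
            problems.append(f"{name}: duplicate address")
    return problems


print(check_plan(PLAN) or "plan OK")  # prints "plan OK"
```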

There are two components to this setup: conntrackd and keepalived. keepalived is responsible for creating the virtual IP addresses on the network and moving them between routers in the event of a failover. conntrackd keeps track of the state of network connections and synchronizes it between routers, so that in the event of a failover, no connections are lost. conntrackd is not strictly necessary: keepalived can still move the IPs between routers when one fails, but without conntrackd that comes at the cost of losing all of your existing connections. conntrackd wasn’t particularly difficult to set up or configure, so I highly recommend using it.

Setting up the firewalls

Since my routers run Ubuntu, we’re going to use iptables-persistent to manage the firewall. It’s worth noting that iptables evaluates the rules in a chain in order and stops at the first rule that accepts or drops the packet, so rules that will match the most traffic should come before rules that match less. For example, since these are gateways handling the traffic for all of the systems behind them, the first rule in the FORWARD chain should accept traffic for any connection that the firewall is already tracking.
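To make the ordering argument concrete, here is a toy Python model of first-match evaluation: rules are checked top to bottom, the first one that matches decides the packet's fate, and the chain policy applies if nothing matches. It is a simplification (no jumps, RETURN, or non-terminating targets such as LOG), and the 99:1 traffic mix is a made-up example. It counts how many rule checks each of the two possible orderings of the FORWARD rules costs.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, List, Tuple

LAN = ipaddress.ip_network("192.168.0.0/24")


@dataclass(frozen=True)
class Packet:
    in_if: str
    out_if: str
    src: str
    state: str  # conntrack state: "NEW", "ESTABLISHED", "RELATED", ...


# A rule is (name, match predicate, verdict). Only terminal verdicts are modelled.
Rule = Tuple[str, Callable[[Packet], bool], str]


def traverse(chain: List[Rule], policy: str, pkt: Packet) -> Tuple[str, int]:
    """Evaluate rules top to bottom; return (verdict, number of rules checked)."""
    for checked, (_name, match, verdict) in enumerate(chain, start=1):
        if match(pkt):
            return verdict, checked
    return policy, len(chain)  # no rule matched: the chain policy applies


ESTABLISHED: Rule = ("related/established",
                     lambda p: p.state in ("ESTABLISHED", "RELATED"), "ACCEPT")
LAN_NEW: Rule = ("new from LAN",
                 lambda p: (ipaddress.ip_address(p.src) in LAN and p.in_if == "eth1"
                            and p.out_if == "eth0" and p.state == "NEW"), "ACCEPT")

GOOD_ORDER = [ESTABLISHED, LAN_NEW]  # same order as the FORWARD chain above
BAD_ORDER = [LAN_NEW, ESTABLISHED]

# Hypothetical traffic mix: 99 packets of existing flows for every new connection.
TRAFFIC = [Packet("eth0", "eth1", "203.0.113.5", "ESTABLISHED")] * 99 + [
    Packet("eth1", "eth0", "192.168.0.50", "NEW")]


def total_checks(chain: List[Rule]) -> int:
    return sum(traverse(chain, "DROP", p)[1] for p in TRAFFIC)


print(total_checks(GOOD_ORDER), total_checks(BAD_ORDER))  # prints "101 199"
```

Both orderings give every packet the same verdict; only the amount of work differs, and with mostly established traffic the gap grows with every rule placed in front of the RELATED,ESTABLISHED rule.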

Also, you’ll note that the policy for the filter chains is set to DROP. This means that the firewall on the routers will drop any packet that no rule explicitly accepts, including packets for connections it doesn’t know about. The reason for this is the following scenario: You’ve got an active TCP stream going (e.g., a large download) and your master router dies. The other end of the connection sends some more data, not knowing the master is down. keepalived starts the failover process and moves the virtual IP to the backup. BUT conntrackd hasn’t been able to commit the external cache into the backup’s kernel connection table yet. The backup doesn’t know about this stream and sees what looks like an errant TCP packet. Without a DROP policy, the TCP/IP stack on the backup router sends an RST to tell the remote host that it doesn’t know WTF is going on, and the connection dies. With a default DROP policy, that errant packet is silently ignored, TCP retransmission kicks in, the backup router has time to commit the synchronized state, and the connection continues uninterrupted.
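The failover race described above can be sketched as a small timeline model in Python. It assumes a 1-second initial retransmission timeout that doubles on each retry and a hypothetical commit 2 seconds after the segment first arrives; real timings depend on the TCP stacks involved and on how quickly keepalived runs the notify script. The point is only the difference between answering an unknown segment with an RST and staying silent.

```python
from typing import List, Optional, Tuple


def retransmit_schedule(rto: float = 1.0, retries: int = 5) -> List[float]:
    """Send times of one segment: the original send at t=0, then retransmissions
    with the timeout doubling each time (exponential backoff).
    ASSUMPTION: 1 s initial timeout; real stacks compute it from measured RTT."""
    times, t = [0.0], 0.0
    for _ in range(retries):
        t += rto
        times.append(t)
        rto *= 2
    return times


def segment_fate(arrivals: List[float], commit_time: float,
                 policy: str) -> Tuple[str, Optional[float]]:
    """What happens to a mid-stream segment that reaches the backup after failover.

    commit_time: when the synced state lands in the backup's kernel table.
    policy: "RST"  = the unknown segment reaches the local TCP stack, which resets it;
            "DROP" = a default DROP policy discards it silently.
    """
    for t in arrivals:
        if t >= commit_time:
            return "delivered", t  # the backup now knows the connection
        if policy == "RST":
            return "reset", t  # remote host gets an RST; the connection dies
        # DROP: no reply is sent, so the remote host simply retransmits later.
    return "timed out", None


schedule = retransmit_schedule()  # [0.0, 1.0, 3.0, 7.0, 15.0, 31.0]
print(segment_fate(schedule, commit_time=2.0, policy="RST"))   # ('reset', 0.0)
print(segment_fate(schedule, commit_time=2.0, policy="DROP"))  # ('delivered', 3.0)
```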

master# apt install iptables-persistent
backup# apt install iptables-persistent

Then save the following rules to /etc/iptables/rules.v4 on both routers:

*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
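# SNAT LAN traffic to the WAN virtual IP. MASQUERADE would use the primary address
# of eth0 (10.0.0.11 or 10.0.0.12), which does not move to the other router on failover.
-A POSTROUTING -s 192.168.0.0/24 -o eth0 -j SNAT --to-source 10.0.0.10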
COMMIT
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT DROP [0:0]
:LOGGING - [0:0]
# Accept any traffic on the loopback interface.
-A INPUT -i lo -j ACCEPT
# Accept any traffic destined for this server that's part of an already-tracked connection.
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
# Accept any multicast VRRP traffic destined for 224.0.0.0/8 (this is how keepalived communicates).
-A INPUT -d 224.0.0.0/8 -p vrrp -j ACCEPT
# Accept any multicast traffic destined for 225.0.0.50 (this is how conntrackd communicates).
-A INPUT -d 225.0.0.50 -j ACCEPT
# Accept any traffic on the interface with the direct connection between routers.
-A INPUT -i eth2 -j ACCEPT
# Jump to the LOGGING chain.
-A INPUT -j LOGGING
# Accept any traffic for systems behind the NAT that's part of an already-tracked connection.
-A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
# Allow outbound connections from systems behind the NAT.
-A FORWARD -s 192.168.0.0/24 -i eth1 -o eth0 -m conntrack --ctstate NEW -j ACCEPT
# Allow any outbound traffic on the loopback interface.
-A OUTPUT -o lo -j ACCEPT
# Allow outbound multicast VRRP traffic destined for 224.0.0.0/8 (this is how keepalived communicates).
-A OUTPUT -d 224.0.0.0/8 -p vrrp -j ACCEPT
# Allow outbound multicast traffic destined for 225.0.0.50 (this is how conntrackd communicates).
-A OUTPUT -d 225.0.0.50 -j ACCEPT
# Allow outbound ICMP on the WAN interface.
-A OUTPUT -o eth0 -p icmp -j ACCEPT
# Allow outbound DNS on the WAN interface.
-A OUTPUT -o eth0 -p tcp -m tcp --dport 53 -j ACCEPT
-A OUTPUT -o eth0 -p udp -m udp --dport 53 -j ACCEPT
# Allow outbound NTP on the WAN interface.
-A OUTPUT -o eth0 -p udp -m udp --dport 123 -j ACCEPT
# Allow outbound HTTP/HTTPS on the WAN interface.
-A OUTPUT -o eth0 -p tcp -m multiport --dports 80,443 -j ACCEPT
# Allow any outbound traffic on the interface with the direct connection between routers.
-A OUTPUT -o eth2 -j ACCEPT
# Log (to syslog) any traffic that's going to be dropped with a limit of 2 entries per minute.
-A LOGGING -m limit --limit 2/min -j LOG --log-prefix "DROP: " --log-level 7
COMMIT

Load the iptables rules on each router so that everything is ready for the installation.

master# iptables-restore < /etc/iptables/rules.v4
backup# iptables-restore < /etc/iptables/rules.v4

Installing and configuring conntrackd

master# apt install conntrackd
master# cp /etc/conntrackd/conntrackd.conf{,.original}

We’re going to use the basic FTFW[1] sync configuration provided with conntrackd. My understanding is that FTFW provides a reliable form of synchronization of connection states between conntrackd on each system.

master# gunzip /usr/share/doc/conntrackd/examples/sync/ftfw/conntrackd.conf.gz
master# cp /usr/share/doc/conntrackd/examples/sync/ftfw/conntrackd.conf /etc/conntrackd/

The primary-backup.sh script provided by conntrackd is what will be triggered by notifications from keepalived. It will force a synchronization of the connection states between routers when failing over or failing back.

master# cp /usr/share/doc/conntrackd/examples/sync/primary-backup.sh /etc/conntrackd/

backup# apt install conntrackd
backup# cp /etc/conntrackd/conntrackd.conf{,.original}
backup# gunzip /usr/share/doc/conntrackd/examples/sync/ftfw/conntrackd.conf.gz
backup# cp /usr/share/doc/conntrackd/examples/sync/ftfw/conntrackd.conf /etc/conntrackd/
backup# cp /usr/share/doc/conntrackd/examples/sync/primary-backup.sh /etc/conntrackd/

The basic conntrackd configuration is pretty much ready to go; there are just a few things that you need to update. First, set the interface conntrackd uses to talk to its peer: update both IPv4_interface and Interface in the Multicast section. Second, add the router’s own interface addresses to the Address Ignore list. It doesn’t really make sense to synchronize connections made directly to one specific router with the other router. We’ll do the same configuration on the backup, substituting the appropriate values. Note that the snippets below are not complete files; they show only the changes to the default FTFW conntrackd.conf that we copied into place.

On the master:

Multicast {
  ...
  IPv4_interface 10.10.1.1
  ...
  Interface eth2
  ...
}

General {
  ...
  Filter From Userspace {
    ...
    Address Ignore {
      IPv4_address 127.0.0.1
      IPv4_address 10.0.0.11
      IPv4_address 192.168.0.1
      IPv4_address 10.10.1.1
    }
  }
}

On the backup:

Multicast {
  ...
  IPv4_interface 10.10.1.2
  ...
  Interface eth2
  ...
}

General {
  ...
  Filter From Userspace {
    ...
    Address Ignore {
      IPv4_address 127.0.0.1
      IPv4_address 10.0.0.12
      IPv4_address 192.168.0.2
      IPv4_address 10.10.1.2
    }
  }
}

Restart conntrackd on both routers to apply the changes:

master# systemctl restart conntrackd
backup# systemctl restart conntrackd
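The Address Ignore lists configured above keep each router’s own connections out of the synchronization. The Python sketch below models that decision for the master’s list, assuming a connection is skipped when either its source or its destination (in the original direction) is on the list. That matching rule is an assumption, not taken from the conntrackd source; the real filter may also look at the NATed reply direction. should_sync and the sample connections are hypothetical.

```python
import ipaddress

# The master's Address Ignore list from conntrackd.conf above.
IGNORE = {ipaddress.ip_address(a)
          for a in ("127.0.0.1", "10.0.0.11", "192.168.0.1", "10.10.1.1")}


def should_sync(src: str, dst: str, ignore=IGNORE) -> bool:
    """ASSUMED semantics: skip a connection if either endpoint is on the list."""
    endpoints = {ipaddress.ip_address(src), ipaddress.ip_address(dst)}
    return not (endpoints & ignore)


# Hypothetical connections, written as (original source, original destination).
CONNECTIONS = [
    ("192.168.0.50", "198.51.100.7"),  # LAN host to the internet (forwarded)
    ("10.0.0.11", "198.51.100.53"),    # the master's own DNS lookup
    ("10.10.1.2", "10.10.1.1"),        # traffic over the direct router link
]

for src, dst in CONNECTIONS:
    print(f"{src} -> {dst}: {'sync' if should_sync(src, dst) else 'ignore'}")
```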

Verify that the connections are being tracked:

master# conntrackd -s
cache internal:
current active connections:              148
connections created:                     172    failed:            0
connections updated:                      40    failed:            0
connections destroyed:                    24    failed:            0

cache external:
current active connections:                1
connections created:                       1    failed:            0
connections updated:                       0    failed:            0
connections destroyed:                     0    failed:            0

traffic processed:
                   0 Bytes                         0 Pckts

multicast traffic (active device=enp6s0):
                5844 Bytes sent                  360 Bytes recv
                  90 Pckts sent                   18 Pckts recv
                   0 Error send                    0 Error recv

message tracking:
                   0 Malformed msgs                    0 Lost msgs

The internal cache is the set of connections this router is tracking itself. The external cache is the set of connections it has received from the other router. If you want to view the connections in either cache, use conntrackd -i for the internal cache or conntrackd -e for the external cache. If you’ve configured conntrackd to track UDP, the external cache will typically show one active connection even when no failover has happened: the multicast traffic conntrackd itself uses for synchronization.
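If you want to keep an eye on both caches from a script (for a monitoring check, say), the conntrackd -s output above is easy to parse. This Python sketch pulls the "current active connections" count out of each cache section. It assumes the output layout shown above, which may differ between conntrack-tools versions; SAMPLE and active_connections are illustrative names.

```python
import re

# The cache sections of the `conntrackd -s` output shown above.
SAMPLE = """\
cache internal:
current active connections:              148
connections created:                     172    failed:            0
connections updated:                      40    failed:            0
connections destroyed:                    24    failed:            0

cache external:
current active connections:                1
connections created:                       1    failed:            0
connections updated:                       0    failed:            0
connections destroyed:                     0    failed:            0
"""

SECTION = re.compile(r"^cache (\w+):")
ACTIVE = re.compile(r"^current active connections:\s+(\d+)")


def active_connections(stats: str) -> dict:
    """Map each cache section name ("internal", "external") to its active count."""
    counts, section = {}, None
    for line in stats.splitlines():
        header = SECTION.match(line)
        if header:
            section = header.group(1)  # remember which cache we are in
            continue
        m = ACTIVE.match(line)
        if section and m:
            counts[section] = int(m.group(1))
    return counts


print(active_connections(SAMPLE))  # {'internal': 148, 'external': 1}
```

On a router you could feed it live output with subprocess.check_output(["conntrackd", "-s"], universal_newlines=True) instead of SAMPLE.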

Installing and configuring keepalived

We’ll be using keepalived to provide the virtual gateway IP.

Side note: This is where things got difficult in my setup. I’ve always run my modems in bridge mode so that my router was assigned the public IP address from the ISP, but I could not find any documentation on using keepalived in a situation where the WAN interface (and therefore the WAN virtual IP) was dynamically assigned.[2] In my virtualized test setup, both the master and backup got their WAN addresses via DHCP, but I controlled the range (and had more than one IP address), so I could set the virtual IP to anything in that range and it wasn’t an issue. I worked around this by disabling the modem’s bridge mode, which put the routers on a private network with the modem as the gateway. Then I configured the WAN virtual IP address on the modem’s subnet and set the modem’s DMZ host to be the WAN virtual IP address. It’s not as clean as I’d like, but without some method of setting that WAN virtual IP via DHCP, it was the best solution I could come up with.

master# apt install keepalived
backup# apt install keepalived

Both routers need to share the same basic configuration, with two minor differences on the backup:

  • The state needs to be BACKUP instead of MASTER.
  • The priority needs to be lower than the priority of the MASTER.

On the master, in /etc/keepalived/keepalived.conf:

vrrp_sync_group router-cluster {
    group {
        router-cluster-wan
        router-cluster-lan
    }
    notify_master "/etc/conntrackd/primary-backup.sh primary"
    notify_backup "/etc/conntrackd/primary-backup.sh backup"
    notify_fault "/etc/conntrackd/primary-backup.sh fault"
}

vrrp_instance router-cluster-wan {
    state MASTER
    interface eth0
    virtual_router_id 10
    priority 100
    virtual_ipaddress {
        10.0.0.10/24 brd 10.0.0.255 dev eth0
    }
}

vrrp_instance router-cluster-lan {
    state MASTER
    interface eth1
    virtual_router_id 11
    priority 100
    virtual_ipaddress {
        192.168.0.3/24 brd 192.168.0.255 dev eth1
    }
}

On the backup:

vrrp_sync_group router-cluster {
    group {
        router-cluster-wan
        router-cluster-lan
    }
    notify_master "/etc/conntrackd/primary-backup.sh primary"
    notify_backup "/etc/conntrackd/primary-backup.sh backup"
    notify_fault "/etc/conntrackd/primary-backup.sh fault"
}

vrrp_instance router-cluster-wan {
    state BACKUP
    interface eth0
    virtual_router_id 10
    priority 50
    virtual_ipaddress {
        10.0.0.10/24 brd 10.0.0.255 dev eth0
    }
}

vrrp_instance router-cluster-lan {
    state BACKUP
    interface eth1
    virtual_router_id 11
    priority 50
    virtual_ipaddress {
        192.168.0.3/24 brd 192.168.0.255 dev eth1
    }
}
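Here is a simplified Python model of what the two keepalived configs and the sync group do. In plain VRRP, each instance independently elects the healthy router with the highest priority (ties go to the higher primary IP address). The vrrp_sync_group makes the instances fail over together, so the WAN and LAN virtual IPs never end up on different routers. This is a steady-state sketch that ignores advertisement timers and preemption details; the Router class and its up set are illustrative, not keepalived internals.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

INSTANCES = ["router-cluster-wan", "router-cluster-lan"]


@dataclass
class Router:
    name: str
    priority: int    # the same for both instances in the configs above
    primary_ip: str  # VRRP breaks priority ties in favour of the higher address
    up: Set[str] = field(default_factory=lambda: set(INSTANCES))  # healthy instances


def _rank(router: Router):
    return (router.priority, ipaddress.ip_address(router.primary_ip))


def elect(instance: str, routers: List[Router]) -> Optional[str]:
    """Plain VRRP: each instance independently picks the best healthy router."""
    candidates = [r for r in routers if instance in r.up]
    return max(candidates, key=_rank).name if candidates else None


def elect_sync_group(instances: List[str], routers: List[Router]) -> Dict[str, Optional[str]]:
    """Sync group: a router that is down on any member instance gives up all of
    them, so every instance (and its virtual IP) lands on the same router."""
    candidates = [r for r in routers if all(i in r.up for i in instances)]
    winner = max(candidates, key=_rank).name if candidates else None
    return {i: winner for i in instances}


master = Router("master", 100, "192.168.0.1")
backup = Router("backup", 50, "192.168.0.2")
print(elect_sync_group(INSTANCES, [master, backup]))  # both instances on master

master.up.discard("router-cluster-wan")  # e.g. the master's eth0 cable is pulled
print({i: elect(i, [master, backup]) for i in INSTANCES})  # split: WAN on backup, LAN on master
print(elect_sync_group(INSTANCES, [master, backup]))  # both instances on backup
```

The middle case is the one the sync group exists for: without it, a dead WAN link on the master would leave LAN hosts sending traffic to a router that can no longer reach the modem.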

Finally, the routers are ready to go, but there’s still one major change that needs to be made on the network: change the default gateway on all of the LAN hosts to the virtual IP address configured in keepalived. In this case, 192.168.0.3 is the virtual IP address that will move between routers as needed. I (mostly) use DHCP on my network, so I updated the DHCP configuration to change the router option from 192.168.0.1 to 192.168.0.3. Be sure to update any hosts that are configured with static addresses to use the new virtual IP address as their gateway; otherwise, they will not get the benefit of HA routing.
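For hosts with static addresses, one quick way to confirm the change on a Linux machine is to read the default route from /proc/net/route. The Python sketch below parses that file’s format (IPv4 fields printed as hex in host byte order) and compares the default gateway with the 192.168.0.3 virtual IP. SAMPLE is a hypothetical routing table from a little-endian host, and the check only covers IPv4 on Linux.

```python
import socket
import struct
from typing import Optional

VIP = "192.168.0.3"

# Hypothetical /proc/net/route contents for a LAN host (little-endian machine).
SAMPLE = (
    "Iface\tDestination\tGateway \tFlags\tRefCnt\tUse\tMetric\tMask\t\tMTU\tWindow\tIRTT\n"
    "eth0\t00000000\t0300A8C0\t0003\t0\t0\t100\t00000000\t0\t0\t0\n"
    "eth0\t0000A8C0\t00000000\t0001\t0\t0\t100\t00FFFFFF\t0\t0\t0\n"
)


def default_gateway(route_table: str) -> Optional[str]:
    """Return the IPv4 default gateway listed in /proc/net/route text, if any."""
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue
        destination, gateway, mask = fields[1], fields[2], fields[7]
        if destination == "00000000" and mask == "00000000":
            # The kernel prints these as 32-bit hex in host byte order,
            # so pack with native byte order ("=") to recover the address bytes.
            return socket.inet_ntoa(struct.pack("=I", int(gateway, 16)))
    return None


gw = default_gateway(SAMPLE)
print(gw, "OK" if gw == VIP else "-> still pointing at the old gateway")
```

On a real host, pass it the file contents instead: default_gateway(open("/proc/net/route").read()).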

Footnotes

1 I can’t find documentation on this anywhere, but I think that FTFW stands for “Fault Tolerant FireWall”. It’s the protocol that conntrackd uses to reliably transfer state.

2 I saw that the OpenWRT wiki hosted a high availability recipe that notes, “DHCP dynamic WAN IP is possible with keepalived, but requires extra scripting and is not going to be described here.”
