Making iptables changes atomically and not dropping packets.

I'm working on rolling out iptables rules to all of our servers at work. It's not a totally simple task, as many things can go wrong.

The first problem is the one where you can shoot yourself in the foot. Install a new set of rules for testing on a remote server, and suddenly your ssh session stops responding. I covered how to work around that in a previous post.

Another problem is making your firewall changes atomically, with all rules pushed in a single step. On Linux, if you have a script with many lines of 'iptables' invocations, running it will make one rule change per iptables command. And what if you write your rules like this?

# Flush rules so we can install our new ones.
iptables -F

# First rule, drop input by default
iptables -P INPUT DROP

# Other rules here...
iptables -A INPUT ... -j ACCEPT
iptables -A INPUT ... -j ACCEPT

If your server is highly trafficked, the delay between setting the 'DROP' default and adding the accept rules can mean dropped traffic. That sucks. This is an example of a race condition. There's also a second race condition earlier in the script: right after the flush, depending on the default policy for INPUT, we may drop or accept all traffic for a very short period. Bad.

One other problem I thought could occur was a state tracking problem with conntrack. If previously we weren't using conntrack, what would happen to existing connections when I set default deny and only allowed connections that were established? Something like this:

iptables -P INPUT DROP
iptables -A INPUT -i eth0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp -m tcp --dport 22 --syn -j ACCEPT

I did some testing with this, and I may be wrong here, but it does not drop my existing sessions as I had predicted. This is a good thing. It turns out that when this runs, the 'conntrack' table is populated with existing connections from the network stack. This further helps us not drop traffic when pushing new firewall rules. You can view the current conntrack table in the file /proc/net/ip_conntrack (on newer kernels, /proc/net/nf_conntrack).
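
Since the location of that table depends on the kernel, one way to peek at it is to try both paths. This is only a minimal sketch: the helper name first_readable is made up for illustration, and reading the table generally requires root and a loaded conntrack module.

```shell
#!/bin/sh
# Print the first readable file among the arguments; return 1 if none is.
# (first_readable is a hypothetical helper, not part of any tool.)
first_readable() {
  for f in "$@"; do
    if [ -r "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  return 1
}

# Newer kernels expose nf_conntrack; older ones expose ip_conntrack.
if table=$(first_readable /proc/net/nf_conntrack /proc/net/ip_conntrack); then
  # Show only the established entries, if there are any.
  grep ESTABLISHED "$table" || echo "no established entries in $table"
else
  echo "no readable conntrack table (module not loaded, or not root?)"
fi
```

Running this before and after loading the conntrack rule should show whether existing sessions were picked up.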

What options do we have for atomically applying a bunch of rules so we don't drop traffic? The iptables tool set comes with 'iptables-save' which lets you save your existing iptables rules to a file. I was unable to find any documentation on the exact format of this file, but it seems easy enough to read. The output includes rules and counters for each table and chain. Counters are optional.

All the documentation I've read indicates that using 'iptables-restore' will apply all of the rules atomically. This lets us set a pile of rules all at once without any race conditions.
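
To make that concrete, here is a rough sketch of the earlier flush/DROP/ACCEPT script written as a single iptables-restore file. The two INPUT rules are the conntrack and ssh rules from above; the temp file and the ACCEPT policies for FORWARD and OUTPUT are assumptions for the example. The script only writes and prints the file, it does not touch the firewall.

```shell
#!/bin/sh
# Sketch: the flush/DROP/ACCEPT script expressed as one iptables-restore file.
rulefile=$(mktemp)   # hypothetical location for the example

cat > "$rulefile" <<'EOF'
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i eth0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 --syn -j ACCEPT
COMMIT
EOF

cat "$rulefile"
# To apply for real (as root), feed the whole file to iptables-restore:
#   iptables-restore < "$rulefile"
```

The chain policy lines replace the separate 'iptables -P' calls, and nothing in the table takes effect until the COMMIT line, so there is no window between the DROP default and the accept rules.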

So I generate an iptables-restore file and use iptables-restore to install it. No traffic dropped. I'm generating it with a shell script, so there was one gotcha - I basically take iptables commands and output them to a file. I do this with a shell function I wrote, called 'addrule'. However, I have some rules like this:

addrule -A INPUT -p tcp -m limit --limit 5/min -j LOG --log-prefix "Denied TCP: " --log-level debug

I quoted the argument in the addrule invocation, but the shell strips those quotes before addrule ever sees them, so we also need to produce a quoted version in our iptables-restore rule file. Otherwise --log-prefix will get set to 'Denied' and we'll also fail because 'TCP:' is not an option iptables expects. It appears to be safe to quote all arguments in iptables-restore files except for lines declaring chain policies and counters (like ':INPUT ACCEPT [12345:987235]'), defining tables (like '*filter'), or the 'COMMIT' command. Instead of quoting everything, I just quote the arguments that contain spaces.
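
A small sketch of the problem, using a hypothetical naive_addrule that simply prints its arguments joined by spaces:

```shell
#!/bin/sh
# naive_addrule is a made-up first attempt: it prints its arguments
# joined by single spaces, with no quoting at all.
naive_addrule() {
  printf '%s\n' "$*"
}

naive_addrule -A INPUT -j LOG --log-prefix "Denied TCP: " --log-level debug
# The quotes are gone in the output, so iptables-restore would read
# 'Denied' and 'TCP:' as two separate words.
```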

The fix makes my 'addrule' function look like this:


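addrule() {
  while [ $# -gt 0 ] ; do
    # If the current arg has a space in it, output "arg"; otherwise output it as-is.
    # printf is used instead of 'echo -n' so args like '-n' are printed literally.
    case "$1" in
      *' '*) printf '"%s"' "$1" ;;
      *)     printf '%s' "$1" ;;
    esac
    # Separate arguments with a single space.
    [ $# -gt 1 ] && printf ' '
    shift
  done >> "$rulefile"
  echo >> "$rulefile"
}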

# So this:
#   addrule -A INPUT -j LOG --log-prefix "Hello World"
# will output this to the $rulefile
#   -A INPUT -j LOG --log-prefix "Hello World"

So now the quoted arguments stay quoted. All of that madness is in the name of being able to simply replace 'iptables' with 'addrule' and be good to go. No extra formatting changes necessary.

One last thing I did was to make sure iptables-restore didn't reject my file, and if it did, to tell me:

# -t only parses and tests the rule file; nothing is committed.
if iptables-restore -t "$rulefile" ; then
  echo "iptables restore test successful, applying rules..."
  iptables-restore -v "$rulefile"
  rm "$rulefile"
else
  echo "iptables test failed. Rule file:" >&2
  echo "---" >&2
  cat "$rulefile" >&2
  rm "$rulefile"
  exit 1
fi

Throw this script into Puppet and we've got automated firewall rule management that won't accidentally drop traffic on rule changes.

Resetting your firewall (iptables) during testing

Ever configured a firewall remotely? Ever blocked yourself and had to get physical hands to fix it?

Kind of sucks.

So you're going to start playing with some new firewall rules, but you learned from the past and now you have a cron(8) or at(8) job that will reset the firewall rules to permissive every so often, just in case you lock yourself out.

I used to do that. Until I realized today that I'm frankly too lazy to wait the N minutes I'll have to wait for my at(8) job to kick in.

Now I sniff packets and have a script trigger from that.

On the remote server, I'll use ngrep to watch for a specific payload in an ICMP echo packet. This works because bpf(4) gets packets before the firewall has a chance to filter them, meaning that even if you deny all packets, anything built on bpf(4) (libpcap, tcpdump, ngrep, etc.) will still see them. Here's the script I use on the remote server:

# Look for any icmp echo packets containing the string 'reset-iptables'
ngrep -l -Wnone -d any 'reset-iptables' 'icmp and icmp[icmptype] = icmp-echo' \
| grep --line-buffered '^I ' \
| while read line ; do
  # Each matching packet triggers a full reset to permissive rules.
  iptables -F
  iptables -P INPUT ACCEPT
  iptables -P OUTPUT ACCEPT
  iptables -P FORWARD ACCEPT
done

The ngrep line will output this whenever it sees a matching packet:

remotehost% ngrep -l -Wnone -d any 'reset-iptables' 'icmp and icmp[icmptype] = icmp-echo'
interface: any
filter: (ip) and ( icmp and icmp[icmptype] = icmp-echo )
match: reset-iptables

ngrep marks each matching ICMP packet with a line starting with 'I'. We'll grep for just that 'I' line, then trigger a full firewall reset.
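
Because the trigger fires a destructive reset, it can help to rehearse it without touching the firewall. A rough sketch, assuming a hypothetical reset_firewall function that takes the command to run as its first argument, plus a made-up dry_run stub that only prints what would be executed:

```shell
#!/bin/sh
# dry_run (hypothetical) prints the iptables command instead of running it.
dry_run() {
  printf 'iptables %s\n' "$*"
}

# reset_firewall CMD: flush the filter table and set permissive
# policies on the three built-in chains, using CMD to do the work.
reset_firewall() {
  cmd=$1
  "$cmd" -F
  for chain in INPUT OUTPUT FORWARD; do
    "$cmd" -P "$chain" ACCEPT
  done
}

# Rehearse the reset. Pass 'iptables' instead of 'dry_run' (as root)
# to perform it for real.
reset_firewall dry_run
```

Inside the 'while read' loop, the four iptables lines could then be replaced by a single 'reset_firewall iptables' call.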

I couldn't figure out how to use ping(8) and set a specific payload, so I'll use scapy.

workstation% echo 'sr1(IP(dst="")/ICMP(type="echo-request")/"reset-iptables")' | sudo scapy

Now, if I accidentally lock myself out through firewall rule changes, I can trivially reset them using that 'echo | scapy' one-liner.
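
For what it's worth, ping(8) has a -p option that fills the packet with up to 16 pad bytes given in hex, and 'reset-iptables' is 14 bytes long, so it may work as a scapy-free trigger. Treat this as an untested sketch: whether ngrep matches the padded payload has not been verified here, the to_hex helper is made up for illustration, and 192.0.2.10 is a placeholder address. The script only prints the ping command, it does not send anything.

```shell
#!/bin/sh
# to_hex (hypothetical helper): print a string as lowercase hex, no separators.
to_hex() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

payload='reset-iptables'
pattern=$(to_hex "$payload")

# ping -p takes at most 16 bytes (32 hex digits) of pattern.
if [ "${#pattern}" -le 32 ]; then
  # 192.0.2.10 is a placeholder; substitute the remote server's address.
  echo "ping -c 1 -p $pattern 192.0.2.10"
else
  echo "payload too long for ping -p" >&2
fi
```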

Obviously, I don't keep the reset script running after the firewall rules are tested and known-good, but it's a great instant-gratification way to solve the lockout problem you may face when testing new firewall rules.