Running OpenDNSSEC with 50000 zones

Getting OpenDNSSEC to run as a signer for a very large number of zones is not a trivial task today. Running the software with a thousand zones is not a big deal. In this blog post I will outline how you can reach a much larger number of zones, 50000, using version 1.2.0 of OpenDNSSEC and SoftHSM. Each zone contains about 20 resource records.

Configuration

First off, you should consider the number of keys that are going to be used. If all of the 50000 zones were to use their own keys for signing, this would amount to a very large number of keys, and SoftHSM cannot currently handle that many. A simple workaround is to use the SharedKeys statement in the policy (kasp.xml) for the zones we are going to sign. This makes all zones placed under this policy share the generated keys, so we end up with a very small number of them.

Many registrars might not want to do this, since they consider each customer to be a separate entity that should own its own private key. But since the keys are all placed in the same storage, a compromise of one key means that all keys are compromised, so having different keys for every zone does not add much in that respect. This decision is up to the registrar together with its customers. Remember, though, that since the zones are managed by the registrar in this case, the registrar has all the power it needs to change any zone content anyway. In this example, and for the convenience of not splitting the zones across different policies and HSMs, I will just go ahead and run with SharedKeys.
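To put rough numbers on the key count discussed above, here is a back-of-the-envelope sketch. It assumes one KSK and one ZSK per key set, which is my simplification; keys that exist temporarily during rollovers would add to both figures.

```shell
#!/bin/sh
# key_count SETS KEYS_PER_SET: number of active keys.
# Assumption: each key set holds one KSK and one ZSK (KEYS_PER_SET = 2).
key_count() {
    echo $(( $1 * $2 ))
}

# Every zone with its own key set:
key_count 50000 2   # prints 100000
# One key set shared by the whole policy:
key_count 1 2       # prints 2
```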

Some other hints for configuring a policy for a large number of zones: use long signature lifetimes and long resign intervals. A nice signature lifetime could be 20 days, with a refresh of 10 days. Also, keep the number of key rollovers to a minimum. If you want to do KSK rollovers, you most likely need to synchronize them with your parent. For most parent zones, this means a lot of messages going to your parent registry at each step of the rollover. You probably only want to roll the KSK in case of a key compromise, so set the lifetime of the KSK to a really high value, for example 10 years. ZSK rollovers are easier to do; I would propose a ZSK rollover every 3 months.

The enforcer part of OpenDNSSEC uses a database backend to keep track of all the zones and their keys. The default configuration is to use SQLite3 for this purpose, but in order to avoid blocking and to increase performance we have to use MySQL. The choice of database is made at compile time. The enforcer is also the component in the current version of OpenDNSSEC that has the hardest time coping with this number of zones, as it does not have any queueing mechanism. More on this later.

The auditor is a useful component of OpenDNSSEC. Its purpose is to audit the signed output of the system, checking that it contains no errors and that the signer actually fulfills the policy. This is most useful for debugging and for high-security zones. It is less useful under these conditions, although you may want to use it for an initial test of the system. For now, just disable the auditor, since it puts a big load on the system.

Hardware

The hardware is also important. To maximize signing performance you want many cores, since zones can be signed in parallel. I have set WorkerThreads in conf.xml for the signer to 8 while running on only 4 cores; this does not max out the CPU but does make it work hard. Memory is also an important factor. The signer caches all signatures in memory, so the signer process needs about 2GB of memory to cache all signatures for 50000 zones. Also make sure that there is enough memory for file caching. I would recommend having at least 8GB of memory in total for running OpenDNSSEC, but you could probably do with 4GB. If you are also running BIND or NSD to serve the zones, you should account for that memory usage as well. The disk space for storing all unsigned and signed zones and all of the temporary files is just below 3GB.
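If you want to size a machine for a different number of zones, the signer memory figure above can be turned into a rough per-zone estimate. This is only a sketch: it assumes memory grows linearly with the number of zones and that your zones are about the same size as mine (around 20 records), neither of which I have measured beyond this setup.

```shell
#!/bin/sh
# per_zone_kb TOTAL_GB ZONES: signer memory per zone in KB (integer division).
per_zone_kb() {
    echo $(( $1 * 1024 * 1024 / $2 ))
}

# estimate_mb ZONES KB_PER_ZONE: estimated signer memory in MB,
# assuming linear scaling (an assumption, not a measurement).
estimate_mb() {
    echo $(( $1 * $2 / 1024 ))
}

kb=$(per_zone_kb 2 50000)
echo "about ${kb} KB per zone"
echo "100000 zones: about $(estimate_mb 100000 "$kb") MB"
```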

Importing

Now that we have a system and a configuration, we should start importing the zones into the system and generate signer configurations. Both the enforcer and the signer use the zonelist.xml file to keep track of all zones. For the first batch import, however, we will not synchronize the database and the zonelist.xml file after each zone. Import all zones like this (for each zone):

ods-ksmutil zone add --zone $zone --policy $policy --no-xml
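The "for each zone" part can be scripted with a small loop. The sketch below assumes a plain text file with one zone name per line; the function name, the file format and the KSMUTIL override (handy for a dry run with `KSMUTIL=echo`) are my own additions, not part of OpenDNSSEC.

```shell
#!/bin/sh
# import_zones POLICY FILE: add every zone listed in FILE (one name per line).
# Set KSMUTIL=echo to print the commands instead of running them.
import_zones() {
    policy=$1
    zone_file=$2
    while IFS= read -r zone; do
        # Skip blank lines and comment lines.
        case $zone in
            ''|'#'*) continue ;;
        esac
        # </dev/null keeps the command from consuming the zone list on stdin.
        ${KSMUTIL:-ods-ksmutil} zone add --zone "$zone" --policy "$policy" --no-xml </dev/null \
            || echo "failed to add $zone" >&2
    done < "$zone_file"
}
```

For example, `import_zones default zones.txt` would add every zone in a (hypothetical) zones.txt under the policy named default.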

When you’re done you have all zones configured in the database, but you must now also export them to zonelist.xml:

ods-ksmutil zonelist export > /etc/opendnssec/zonelist.xml

Version 1.2.0 together with MySQL adds three informational lines at the top of zonelist.xml; you must remove those lines from the file. This is fixed in version 1.2.1.

Running the enforcer

We have still not started the system. You can start it with ods-control start, but we would like to avoid this, as it starts both the enforcer and the signer at the same time. To have more control, we will just run the enforcer once and see what happens:

ods-enforcerd -1

The -1 flag makes the enforcer run through all zones and then exit. When the enforcer is done we should have a signer configuration for all zones in /var/opendnssec/signconf.
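A quick sanity check at this point is to compare the number of files in the signconf directory with the number of zones you imported. The sketch below assumes the enforcer writes one file per zone directly into that directory; the function names are hypothetical.

```shell
#!/bin/sh
# count_files DIR: number of regular files directly inside DIR.
count_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# check_signconfs EXPECTED DIR: compare the file count in DIR with EXPECTED.
# Assumption: one signer configuration file per zone.
check_signconfs() {
    expected=$1
    dir=$2
    found=$(count_files "$dir")
    if [ "$found" -eq "$expected" ]; then
        echo "ok: $found signer configurations"
    else
        echo "mismatch: expected $expected, found $found"
        return 1
    fi
}
```

With the setup in this post, that would be `check_signconfs 50000 /var/opendnssec/signconf`.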

When you run the enforcer in this fashion it will generate a lot of syslog messages, and your syslog daemon might think that this is some sort of denial of service attack. My version of rsyslog complains like this:
imuxsock begins to drop messages from pid 27991 due to rate-limiting
You could probably fix this, or just let it be.

Running the signer

When the enforcer is done you may want to start signing all zones. Just start the signer:

ods-signer start

This might take a while. A one line status report for each signed zone is logged to syslog.

Once the signer is up and running, you can just let it do its work. It will re-sign all the zones as mandated by the policy.

However, you most likely want a DNS server running that serves all the zones, so you need to tell that server to reload each newly signed zone. This is done through a notify command configured in conf.xml. Make sure to reload only the zone that has been signed, and not all the zones.
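As an illustration of reloading a single zone, here is a minimal wrapper, assuming BIND managed through rndc. The function name and the RNDC override are my own; how the zone name reaches such a script depends on how you configure the command in conf.xml.

```shell
#!/bin/sh
# reload_zone ZONE: ask the name server to reload one zone only.
# Assumes BIND with rndc; set RNDC=echo for a dry run.
reload_zone() {
    zone=$1
    if [ -z "$zone" ]; then
        echo "usage: reload_zone ZONE" >&2
        return 2
    fi
    # "rndc reload ZONE" reloads just that zone; a bare "rndc reload"
    # would reload the configuration and every zone.
    ${RNDC:-rndc} reload "$zone"
}
```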

Sending keys

Then there is the publishing of the DS to the parent. If you want to, you can use a submission command, which is also configured in conf.xml; this is a simple enough interface. The program you specify there receives the current set of DNSKEY records to be published to the parent. Using this interface you can do your own conversion to DS, or whatever else you need to provide to your parent. Since we are using SharedKeys in this example, you already know which keys to use for any new zones you add to the system. This process depends entirely on your setup and on how you communicate with your parent. There is an example e-mail client in the OpenDNSSEC source code, and there is also an EPP client if that is what you need. With 50000 zones, a lot of keys will be sent to your parent, so you might want to give this process some extra thought.
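A first step in such a program could be to pick out the key signing keys before converting them to DS. The sketch below assumes the DNSKEY records arrive on standard input in the usual presentation format, and it only filters; the actual DS conversion would be left to a tool such as BIND's dnssec-dsfromkey. The function name is hypothetical.

```shell
#!/bin/sh
# ksk_only: read DNSKEY records on stdin and print only those with
# flags 257 (SEP bit set, i.e. key signing keys).
ksk_only() {
    awk '{
        for (i = 1; i < NF; i++) {
            # The flags field follows directly after the record type.
            if ($i == "DNSKEY" && $(i + 1) == 257) { print; break }
        }
    }'
}
```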

Adding and removing zones

You probably don't have a static set of zones to sign. To be fair, OpenDNSSEC 1.2 does not have an easy process for adding and removing zones when you run it with this number of zones, although it is easy enough if you have a few hundred. What happens today when you import a new zone is that the zone is added to the database, and the next time the enforcer runs (either manually or as a scheduled daemon) it goes through all of the zones to determine which signer configurations have to be updated. This takes a while.

Again, to import a new zone, do this:

ods-ksmutil zone add --zone $zone_name --no-xml

If you only want to import one new zone, skip the --no-xml flag. Without it, ods-ksmutil will synchronize the zonelist.xml file by itself.

Once the enforcer runs again, there will be a new signer configuration file that the signer can pick up and run with.

You delete a zone like this:

ods-ksmutil zone delete --zone $zone_name

The same problems apply as when adding a zone, so you might want to use the --no-xml flag when doing larger batch jobs.

My proposal for adding and removing zones today is to do it in larger batches, maybe a couple of times per day. This minimizes the amount of work that the enforcer has to do.
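One way to batch the changes is to collect them in a file during the day and apply them in one go. The sketch below assumes a hypothetical change file with lines of the form `add ZONE [POLICY]` or `del ZONE`; the file format, the function name, the KSMUTIL override and the fallback to a policy called default are all my own assumptions.

```shell
#!/bin/sh
# apply_batch FILE: apply queued zone additions and removals from FILE.
# Line format (hypothetical): "add ZONE [POLICY]" or "del ZONE".
# Set KSMUTIL=echo to print the commands instead of running them.
apply_batch() {
    while read -r action zone policy; do
        case $action in
            add)
                ${KSMUTIL:-ods-ksmutil} zone add --zone "$zone" \
                    --policy "${policy:-default}" --no-xml </dev/null ;;
            del)
                ${KSMUTIL:-ods-ksmutil} zone delete --zone "$zone" --no-xml </dev/null ;;
            ''|'#'*)
                ;;  # blank line or comment
            *)
                echo "unknown action: $action" >&2 ;;
        esac
    done < "$1"
}
```

After a batch, export the zone list once with ods-ksmutil zonelist export, as in the initial import, and let the enforcer run.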

Backups and Failover

If you run this kind of setup in production, it is one of the most central components in your DNS environment, so you probably want to have at least a backup. Most important is a backup of the keys (in this case the SoftHSM database and configuration) and of the KASP database. You can automatically replicate the KASP database with the MySQL replication features, or simply use mysqldump to export all database content to a backup file. What is still missing then is the signatures for all zones. These are stored in the tmp directory, so if necessary you can copy all files in this directory to a secondary machine.
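A minimal sketch of the mysqldump variant could look like this. The database name, the destination directory and the function name are placeholders of mine, and the sketch assumes that the MySQL credentials come from your client configuration rather than from the command line.

```shell
#!/bin/sh
# backup_kasp DATABASE DESTDIR: dump DATABASE to a timestamped file in DESTDIR
# and print the name of the file that was written.
# Set MYSQLDUMP=echo for a dry run.
backup_kasp() {
    db=$1
    dest=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    out="$dest/kasp-$stamp.sql"
    ${MYSQLDUMP:-mysqldump} "$db" > "$out" && echo "$out"
}
```

Remember that the dump only covers the KASP database; the SoftHSM database and configuration have to be copied separately.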

Future work

As you might have guessed, we are still some way from handling this number of zones without carefully considering how we set up the system. There is clearly a need to improve the zonelist.xml handling, and we are also going to address the communication between the enforcer and the signer.

A newly added zone does not take effect immediately. You will probably want to run the enforcer for just the new zones, so they get a signer configuration as soon as possible.

Also, when you initially imported all the zones you might have discovered that the import gets increasingly slower the more zones you have in the system. We are currently investigating this behavior, as this has to be improved as well.

If you have tried to set up OpenDNSSEC to handle 50000 zones you might discover that there is a high load on the system once the signer gets around to signing all zones. When it is time to re-sign all of them, you will get exactly the same load, as the timing is the same. So what we need is Resign Jitter, so that the re-signing of the zones is spread out over time. I have experimented with this, and it seems to work fine. This is also needed in order for new zones to get signed in time; we can't have a system that is blocking the queue for new zones. The latter is already in place, so the new feature is to make the jitter available as an optional configuration.
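To illustrate the idea of spreading the work out, here is a small sketch that derives a fixed offset for each zone from a checksum of its name. This is not how the jitter is implemented in OpenDNSSEC, just a way to show the effect; the function name and the choice of cksum are mine.

```shell
#!/bin/sh
# jitter_offset ZONE WINDOW: offset in seconds within [0, WINDOW),
# derived from a checksum of the zone name so it is stable between runs.
jitter_offset() {
    printf '%s' "$1" | cksum | awk -v w="$2" '{ print $1 % w }'
}

# Spread zones over a 10 day window (864000 seconds).
for z in example.com example.org example.net; do
    echo "$z: re-sign $(jitter_offset "$z" 864000) seconds into the window"
done
```

With 50000 zones spread evenly over 10 days, the signer would handle roughly 200 zones per hour instead of all of them at once.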

As with resign jitter, there should probably be some way to spread out the KSK rollovers so that they do not all happen at the same time. We might consider adding some sort of Rollover Jitter.

If you have any questions or want to discuss the development of features that are needed to make OpenDNSSEC handle even larger numbers of zones, you are more than welcome to participate on the OpenDNSSEC users list.

Patrik Wallström, .SE
