Migrating Mess With DNS to use PowerDNS
About 3 years ago, I announced Mess With DNS in
this blog post, a playground
where you can learn how DNS works by messing around and creating records.
I wasn’t very careful with the DNS implementation though (to quote the release blog
post: “following the DNS RFCs? not exactly”), and people started reporting
problems that eventually I decided that I wanted to fix.
the problems
Some of the problems people have reported were:
- domain names with underscores weren’t allowed, even though they should be
- If there was a CNAME record for a domain name, it allowed you to create other records for that domain name, even if it shouldn’t
- you could create 2 different CNAME records for the same domain name, which shouldn’t be allowed
- no support for the SVCB or HTTPS record types, which seemed a little complex to implement
- no support for upgrading from UDP to TCP for big responses
And there are certainly more issues that nobody got around to reporting, for
example that if you added an NS record for a subdomain to delegate it, Mess
With DNS wouldn’t handle the delegation properly.
the solution: PowerDNS
I wasn’t sure how to fix these problems for a long time – technically I
could have started addressing them individually, but it felt like there were
a million edge cases and I’d never get there.
But then one day I was chatting with someone else who was working on a DNS
server and they said they were using PowerDNS: an open
source DNS server with an HTTP API!
This seemed like an obvious solution to my problems – I could just swap out my
own crappy DNS implementation for PowerDNS.
There were a couple of challenges I ran into when setting up PowerDNS that I’ll
talk about here. I really don’t do a lot of web development and I think I’ve never
built a website that depends on a relatively complex API before, so it was a
bit of a learning experience.
challenge 1: getting every query made to the DNS server
One of the main things Mess With DNS does is give you a live view of every DNS
query it receives for your subdomain, using a websocket. To make this work, it
needs to intercept every DNS query before they it gets sent to the PowerDNS DNS
server:
There were 2 options I could think of for how to intercept the DNS queries:
- dnstap:
dnsdist(a DNS load balancer from the PowerDNS project) has
support for logging all DNS queries it receives using
dnstap, so I could put dnsdist in front of PowerDNS
and then log queries that way - Have my Go server listen on port 53 and proxy the queries myself
I originally implemented option #1, but for some reason there was a 1 second
delay before every query got logged. I couldn’t figure out why, so I
implemented my own very simple proxy instead.
challenge 2: should the frontend have direct access to the PowerDNS API?
The frontend used to have a lot of DNS logic in it – it converted emoji domain
names to ASCII using punycode, had a lookup table to convert numeric DNS query
types (like 1) to their human-readable names (like A), did a little bit of
validation, and more.
Originally I considered keeping this pattern and just giving the frontend (more
or less) direct access to the PowerDNS API to create and delete, but writing
even more complex code in Javascript didn’t feel that appealing to me – I
don’t really know how to write tests in Javascript and it seemed like it
wouldn’t end well.
So I decided to take all of the DNS logic out of the frontend and write a new
DNS API for managing records, shaped something like this:
GET /recordsDELETE /records/<ID>DELETE /records/(delete all records for a user)POST /records/(create record)POST /records/<ID>(update record)
This meant that I could actually write tests for my code, since the backend is
in Go and I do know how to write tests in Go.
what I learned: it’s okay for an API to duplicate information
I had this idea that APIs shouldn’t return duplicate information – for example
if I get a DNS record, it should only include a given piece of information
once.
But I ran into a problem with that idea when displaying MX records: an MX
record has 2 fields, “preference”, and “mail server”. And I needed to display
that information in 2 different ways on the frontend:
- In a form, where “Preference” and “Mail Server” are 2 different form fields (like
10andmail.example.com) - In a summary view, where I wanted to just show the record (
10 mail.example.com)
This is kind of a small problem, but it came up in a few different places.
I talked to my friend Marco Rogers about this, and based on some advice from
him I realized that I could return the same information in the API in 2
different ways! Then the frontend just has to display it. So I started just
returning duplicate information in the API, something like this:
{
values: {'Preference': 10, 'Server': 'mail.example.com'},
content: '10 mail.example.com',
...
}
I ended up using this pattern in a couple of other places where I needed to
display the same information in 2 different ways and it was SO much easier.
I think what I learned from this is that if I’m making an API that isn’t
intended for external use (there are no users of this API other than the
frontend!), I can tailor it very specifically to the frontend’s needs and
that’s okay.
challenge 3: what’s a record’s ID?
In Mess With DNS (and I think in most DNS user interfaces!), you create, add, and delete records.
But that’s not how the PowerDNS API works. In PowerDNS, you create a zone,
which is made of record sets. Records don’t have any ID in the API at all.
I ended up solving this by generate a fake ID for each records which is made of:
- its name
- its type
- and its content (base64-encoded)
For example one record’s ID is brooch225.messwithdns.com.|NS|bnMxLm1lc3N3aXRoZG5zLmNvbS4=
Then I can search through the zone and find the appropriate record to update
it.
This means that if you update a record then its ID will change which isn’t
usually what I want in an ID, but that seems fine.
challenge 4: making clear error messages
I think the error messages that the PowerDNS API returns aren’t really intended to be shown to end users, for example:
Name 'new32site.island358.messwithdns.com.' contains unsupported characters(this error encodes the space as32, which is a bit disorienting if you don’t know that the space character is 32 in ASCII)RRset test.pear5.messwithdns.com. IN CNAME: Conflicts with pre-existing RRset(this talks about RRsets, which aren’t a concept that the Mess With DNS UI has at all)Record orange.beryl5.messwithdns.com./A '1.2.3.4$': Parsing record content (try 'pdnsutil check-zone'): unable to parse IP address, strange character: $(mentions “pdnsutil”, a utility which Mess With DNS’s users don’t have
access to in this context)
I ended up handling this in two ways:
- Do some initial basic validation of values that users enter (like IP addresses), so I can just return errors like
Invalid IPv4 address: "1.2.3.4$ - If that goes well, send the request to PowerDNS and if we get an error back, then do some hacky translation of those messages to make them clearer.
Sometimes users will still get errors from PowerDNS directly, but I added some
logging of all the errors that users see, so hopefully I can review them and
add extra translations if there are other common errors that come up.
I think what I learned from this is that if I’m building a user-facing
application on top of an API, I need to be pretty thoughtful about how I
resurface those errors to users.
challenge 5: setting up SQLite
Previously Mess With DNS was using a Postgres database. This was problematic
because I only gave the Postgres machine 256MB of RAM, which meant that the
database got OOM killed almost every single day. I never really worked out
exactly why it got OOM killed every day, but that’s how it was. I spent some
time trying to tune Postgres’ memory usage by setting the max connections /
work-mem / maintenance-work-mem and it helped a bit but didn’t solve the
problem.
So for this refactor I decided to use SQLite instead, because the website
doesn’t really get that much traffic. There are some choices involved with
using SQLite, and I decided to:
- Run
db.SetMaxOpenConns(1)to make sure that we only open 1 connection to
the database at a time, to preventSQLITE_BUSYerrors from two threads
trying to access the database at the same time (just setting WAL mode didn’t
work) - Use separate databases for each of the 3 tables (users, records, and
requests) to reduce contention. This maybe isn’t really necessary, but there
was no reason I needed the tables to be in the same database so I figured I’d set
up separate databases to be safe. - Use the cgo-free modernc.org/sqlite, which translates SQLite’s source code to Go.
I might switch to a more “normal” sqlite implementation instead at some point and use cgo though.
I think the main reason I prefer to avoid cgo is that cgo has landed me with difficult-to-debug errors in the past. - use WAL mode
I still haven’t set up backups, though I don’t think my Postgres database had
backups either. I think I’m unlikely to use
litestream for backups – Mess With DNS is very far
from a critical application, and I think daily backups that I could recover
from in case of a disaster are more than good enough.
challenge 6: upgrading Vue & managing forms
This has nothing to do with PowerDNS but I decided to upgrade Vue.js from
version 2 to 3 as part of this refresh. The main problem with that is that the
form validation library I was using (FormKit) completely changed its API
between Vue 2 and Vue 3, so I decided to just stop using it instead of learning
the new API.
I ended up switching to some form validation tools that are built into the
browser like required and oninvalid (here’s the code).
I think it could use some of improvement, I still don’t understand forms very well.
challenge 7: managing state in the frontend
This also has nothing to do with PowerDNS, but when modifying the frontend I
realized that my state management in the frontend was a mess – in every place
where I made an API request to the backend, I had to try to remember to add a
“refresh records” call after that in every place that I’d modified the state
and I wasn’t always consistent about it.
With some more advice from Marco, I ended up implementing a single global
state management store
which stores all the state for the application, and which lets me
create/update/delete records.
Then my components can just call store.createRecord(record), and the store
will automatically resynchronize all of the state as needed.
challenge 8: sequencing the project
This project ended up having several steps because I reworked the whole
integration between the frontend and the backend. I ended up splitting it into
a few different phases:
- Upgrade Vue from v2 to v3
- Make the state management store
- Implement a different backend API, move a lot of DNS logic out of the frontend, and add tests for the backend
- Integrate PowerDNS
I made sure that the website was (more or less) 100% working and then deployed
it in between phases, so that the amount of changes I was managing at a time
stayed somewhat under control.
the new website is up now!
I released the upgraded website a few days ago and it seems to work!
The PowerDNS API has been great to work on top of, and I’m relieved that
there’s a whole class of problems that I now don’t have to think about at all,
other than potentially trying to make the error messages from PowerDNS a little
clearer. Using PowerDNS has fixed a lot of the DNS issues that folks have
reported in the last few years and it feels great.
If you run into problems with the new Mess With DNS I’d love to hear about them here.
