Vincent Bernat: Compressing embedded files in Go
Go’s embed feature lets you bundle static assets into an executable, but it
stores them uncompressed. This wastes space: a web interface with documentation
can bloat your binary by dozens of megabytes. A proposal to optionally
enable compression was declined because it is difficult to handle all use
cases. One solution? Put all the assets into a ZIP archive! 🗜️
Code
The Go standard library includes a package, archive/zip, to read and write ZIP
archives. Its zip.NewReader() function turns a ZIP archive into a *zip.Reader,
which implements io/fs.FS and can replace embed.FS in most contexts.[1]
    package embed

    import (
    	"archive/zip"
    	"bytes"
    	_ "embed" // required by the go:embed directive below
    	"fmt"
    	"io/fs"
    	"sync"
    )

    //go:embed data/embed.zip
    var embeddedZip []byte

    // dataOnce parses the embedded archive on first use and caches the reader.
    var dataOnce = sync.OnceValue(func() *zip.Reader {
    	r, err := zip.NewReader(bytes.NewReader(embeddedZip), int64(len(embeddedZip)))
    	if err != nil {
    		panic(fmt.Sprintf("cannot read embedded archive: %s", err))
    	}
    	return r
    })

    // Data returns the embedded assets as a read-only filesystem.
    func Data() fs.FS {
    	return dataOnce()
    }
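As a usage sketch, the value returned by Data() can be handed to any code that accepts an fs.FS. The example below does not use the real embedded archive: it builds a small ZIP archive in memory, with made-up file names and contents, and reads it back through the fs.ReadFile and fs.ReadDir helpers. These helpers work on top of Open(), so they remain usable even though *zip.Reader has no ReadFile() or ReadDir() method. The function names buildArchive, openArchive, readString, and listNames are hypothetical.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io/fs"
)

// buildArchive creates an in-memory ZIP archive from a map of file names
// to contents. It stands in for the embedded embed.zip (hypothetical data).
func buildArchive(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for name, content := range files {
		f, err := w.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := f.Write([]byte(content)); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// openArchive exposes the archive as an fs.FS, as Data() does.
func openArchive(archive []byte) (fs.FS, error) {
	r, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	return r, nil
}

// readString returns the content of a file using the fs.ReadFile helper,
// which falls back to Open() when the FS has no ReadFile() method.
func readString(fsys fs.FS, name string) (string, error) {
	content, err := fs.ReadFile(fsys, name)
	return string(content), err
}

// listNames returns the entry names of a directory using the fs.ReadDir
// helper, which falls back to Open() when the FS has no ReadDir() method.
func listNames(fsys fs.FS, dir string) ([]string, error) {
	entries, err := fs.ReadDir(fsys, dir)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names, nil
}

func main() {
	archive, err := buildArchive(map[string]string{
		"docs/intro.md": "# Introduction\n",
		"docs/usage.md": "# Usage\n",
	})
	if err != nil {
		panic(err)
	}
	fsys, err := openArchive(archive)
	if err != nil {
		panic(err)
	}
	content, err := readString(fsys, "docs/intro.md")
	if err != nil {
		panic(err)
	}
	fmt.Print(content)
	names, err := listNames(fsys, "docs")
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

Code that needs the optional ReadFile() or ReadDir() methods directly, rather than the helpers, would require a wrapper type.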
We can build the embed.zip archive with a rule in a Makefile. We specify the
files to embed as dependencies to ensure changes are detected.
    common/embed/data/embed.zip: console/data/frontend console/data/docs
    common/embed/data/embed.zip: orchestrator/clickhouse/data/protocols.csv
    common/embed/data/embed.zip: orchestrator/clickhouse/data/icmp.csv
    common/embed/data/embed.zip: orchestrator/clickhouse/data/asns.csv
    common/embed/data/embed.zip:
    	mkdir -p common/embed/data && zip --quiet --recurse-paths --filesync $@ $^
The automatic variable $@ is the rule target, while $^ expands to all
the dependencies, modified or not.
Space gain
Akvorado, a flow collector written in Go, embeds several static assets:
- CSV files to translate port numbers, protocols or AS numbers,
- HTML, CSS, JS, and image files for the web interface, and
- the documentation.

Breakdown of the space used by each component before (left) and after (right) the introduction of embed.zip.
Embedding these assets into a ZIP archive reduced the size of the Akvorado
executable by more than 4 MiB:
    $ unzip -p common/embed/data/embed.zip | wc -c | numfmt --to=iec
    7.3M
    $ ll common/embed/data/embed.zip
    -rw-r--r-- 1 bernat users 2.9M Dec 7 17:17 common/embed/data/embed.zip
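The same comparison can be sketched in Go by summing the sizes recorded in the archive's file headers. The example below runs on a synthetic in-memory archive (the file name and CSV line are placeholders), so it illustrates the method and is not a measurement of Akvorado's assets. The function names archiveSizes and sampleArchive are hypothetical.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"strings"
)

// archiveSizes returns the total uncompressed and compressed sizes of the
// files in a ZIP archive, as recorded in its central directory.
func archiveSizes(archive []byte) (uncompressed, compressed uint64, err error) {
	r, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return 0, 0, err
	}
	for _, f := range r.File {
		uncompressed += f.UncompressedSize64
		compressed += f.CompressedSize64
	}
	return uncompressed, compressed, nil
}

// sampleArchive builds an in-memory archive holding one deflated file made
// of the same line repeated many times (hypothetical stand-in for a CSV).
func sampleArchive(line string, repeat int) ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	f, err := w.Create("data/sample.csv")
	if err != nil {
		return nil, err
	}
	if _, err := f.Write([]byte(strings.Repeat(line, repeat))); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	archive, err := sampleArchive("80,tcp,http\n", 50_000)
	if err != nil {
		panic(err)
	}
	uncompressed, compressed, err := archiveSizes(archive)
	if err != nil {
		panic(err)
	}
	fmt.Printf("uncompressed: %d bytes\n", uncompressed)
	fmt.Printf("compressed:   %d bytes\n", compressed)
	fmt.Printf("archive file: %d bytes\n", len(archive))
}
```

The archive file is slightly larger than the sum of the compressed sizes because it also stores headers and the central directory.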
Performance loss
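Reading from a compressed archive is not as fast as reading a flat file. A
simple benchmark shows it is more than 4× slower (about 527 µs versus 123 µs
per operation). It also allocates some memory on each read, while reading the
uncompressed data allocates none.[2]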
    goos: linux
    goarch: amd64
    pkg: akvorado/common/embed
    cpu: AMD Ryzen 5 5600X 6-Core Processor
    BenchmarkData/compressed-12      2262   526553 ns/op   610 B/op   10 allocs/op
    BenchmarkData/uncompressed-12    9482   123175 ns/op     0 B/op    0 allocs/op
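The benchmark code itself is not shown here. A comparable measurement could look like the sketch below, which reads a synthetic asset from an in-memory archive and from a plain byte slice. It uses testing.Benchmark so that it runs as a regular program instead of through go test. The names newArchive, readCompressed, and readUncompressed and the payload are assumptions, and the numbers it prints will differ from those above.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"io/fs"
	"strings"
	"testing"
)

// newArchive builds an in-memory ZIP archive holding one deflated file.
func newArchive(name string, content []byte) (*zip.Reader, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	f, err := w.Create(name)
	if err != nil {
		return nil, err
	}
	if _, err := f.Write(content); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
}

// readCompressed reads a whole file through the fs.FS interface, which
// inflates it on every call. It returns the number of bytes read.
func readCompressed(fsys fs.FS, name string) (int64, error) {
	f, err := fsys.Open(name)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	return io.Copy(io.Discard, f)
}

// readUncompressed reads the same bytes from a plain in-memory slice.
func readUncompressed(content []byte) (int64, error) {
	return io.Copy(io.Discard, bytes.NewReader(content))
}

func main() {
	// Hypothetical payload: a repetitive CSV of about 1.2 MB.
	content := []byte(strings.Repeat("80,tcp,http\n", 100_000))
	fsys, err := newArchive("ports.csv", content)
	if err != nil {
		panic(err)
	}

	compressed := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			if _, err := readCompressed(fsys, "ports.csv"); err != nil {
				b.Fatal(err)
			}
		}
	})
	uncompressed := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			if _, err := readUncompressed(content); err != nil {
				b.Fatal(err)
			}
		}
	})

	fmt.Println("compressed:  ", compressed, compressed.MemString())
	fmt.Println("uncompressed:", uncompressed, uncompressed.MemString())
}
```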
Each access to an asset requires a decompression step, as seen in this flame
graph:
CPU flame graph comparing the time spent on CPU when reading data from embed.zip (left) versus reading data directly (right). Because the Go testing framework executes the benchmark for uncompressed data 4 times more often, it uses the same horizontal space as the benchmark for compressed data.
While a ZIP archive has an index to quickly find the requested file, seeking
inside a compressed file is currently not possible.[3] Therefore, the files
from a compressed archive do not implement the io.ReaderAt or io.Seeker
interfaces, unlike directly embedded files. This prevents some features, like
serving partial files or detecting MIME types when serving files over HTTP.
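One possible workaround, in the spirit of keeping assets in memory, is to inflate a file once and wrap its bytes in a bytes.Reader, which implements both io.ReadSeeker and io.ReaderAt. The sketch below assumes a hypothetical helper named openSeekable and a made-up sample archive; it is an illustration, not what Akvorado does.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"io/fs"
)

// openSeekable reads a whole file from fsys into memory and returns a
// reader that supports Seek and ReadAt. It trades memory for features.
func openSeekable(fsys fs.FS, name string) (*bytes.Reader, error) {
	content, err := fs.ReadFile(fsys, name)
	if err != nil {
		return nil, err
	}
	return bytes.NewReader(content), nil
}

// sampleFS builds an in-memory ZIP archive with a single file and
// exposes it as an fs.FS (hypothetical stand-in for the embedded archive).
func sampleFS(name, content string) (fs.FS, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	f, err := w.Create(name)
	if err != nil {
		return nil, err
	}
	if _, err := f.Write([]byte(content)); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	r, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return nil, err
	}
	return r, nil
}

func main() {
	fsys, err := sampleFS("index.html", "<!doctype html><title>demo</title>")
	if err != nil {
		panic(err)
	}

	// Check whether the file handed out by the archive can seek.
	f, err := fsys.Open("index.html")
	if err != nil {
		panic(err)
	}
	_, seekable := f.(io.Seeker)
	f.Close()
	fmt.Println("file from archive implements io.Seeker:", seekable)

	// The in-memory copy can seek: skip the doctype and read the rest.
	r, err := openSeekable(fsys, "index.html")
	if err != nil {
		panic(err)
	}
	if _, err := r.Seek(int64(len("<!doctype html>")), io.SeekStart); err != nil {
		panic(err)
	}
	rest, err := io.ReadAll(r)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", rest)
}
```

A reader obtained this way can be passed to http.ServeContent, which expects an io.ReadSeeker. The cost is that each such asset is held uncompressed in memory, either per request or in a cache.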
For Akvorado, this is an acceptable compromise to save a few mebibytes from an
executable of almost 100 MiB. Next week, I will continue this futile adventure
by explaining how I prevented Go from disabling dead code elimination! 🦥
[1] You can safely read multiple files concurrently. However, *zip.Reader does
not implement the ReadDir() and ReadFile() methods.
[2] You could keep frequently accessed assets in memory. This
reduces CPU usage and trades cached memory for resident memory.
[3] SOZip is a profile that enables fast random access in a compressed
file. However, Go’s archive/zip package does not support it.
