Allgemein

Vincent Bernat: Compressing embedded files in Go

Vincent Bernat: Compressing embedded files in Go

Go’s embed feature lets you bundle static assets into an executable, but it
stores them uncompressed. This wastes space: a web interface with documentation
can bloat your binary by dozens of megabytes. A proposition to optionally
enable compression
was declined because it is difficult to handle all use
cases. One solution? Put all the assets into a ZIP archive! 🗜️

Code

The Go standard library includes a module to read and write ZIP archives. It
contains a function that turns a ZIP archive into an io/fs.FS
structure that can replace embed.FS in most contexts.1

package embed

import (
  "archive/zip"
  "bytes"
  _ "embed"
  "fmt"
  "io/fs"
  "sync"
)

//go:embed data/embed.zip
var embeddedZip []byte

var dataOnce = sync.OnceValue(func() *zip.Reader {
  r, err := zip.NewReader(bytes.NewReader(embeddedZip), int64(len(embeddedZip)))
  if err != nil {
    panic(fmt.Sprintf("cannot read embedded archive: %s", err))
  }
  return r
})

func Data() fs.FS {
  return dataOnce()
}

We can build the embed.zip archive with a rule in a Makefile. We specify the
files to embed as dependencies to ensure changes are detected.

common/embed/data/embed.zip: console/data/frontend console/data/docs
common/embed/data/embed.zip: orchestrator/clickhouse/data/protocols.csv 
common/embed/data/embed.zip: orchestrator/clickhouse/data/icmp.csv
common/embed/data/embed.zip: orchestrator/clickhouse/data/asns.csv
common/embed/data/embed.zip:
    mkdir -p common/embed/data && zip --quiet --recurse-paths --filesync $@ $^

The automatic variable $@ is the rule target, while $^ expands to all
the dependencies, modified or not.

Space gain

Akvorado, a flow collector written in Go, embeds several static assets:

  • CSV files to translate port numbers, protocols or AS numbers, and
  • HTML, CSS, JS, and image files for the web interface, and
  • the documentation.
Breakdown of space used by each package before and after introducing embed.zip. It is displayed as a treemap and we can see many embedded files replaced by a bigger one.

Breakdown of the space used by each component before (left) and after (right) the introduction of embed.zip.

Embedding these assets into a ZIP archive reduced the size of the Akvorado
executable by more than 4 MiB:

$ unzip -p common/embed/data/embed.zip | wc -c | numfmt --to=iec
7.3M
$ ll common/embed/data/embed.zip
-rw-r--r-- 1 bernat users 2.9M Dec  7 17:17 common/embed/data/embed.zip

Performance loss

Reading from a compressed archive is not as fast as reading a flat file. A
simple benchmark shows it is more than 4× slower. It also allocates some
memory.2

goos: linux
goarch: amd64
pkg: akvorado/common/embed
cpu: AMD Ryzen 5 5600X 6-Core Processor
BenchmarkData/compressed-12     2262   526553 ns/op   610 B/op   10 allocs/op
BenchmarkData/uncompressed-12   9482   123175 ns/op     0 B/op    0 allocs/op

Each access to an asset requires a decompression step, as seen in this flame
graph:

🖼 Flame graph when reading data from embed.zip compared to reading data directly

CPU flame graph comparing the time spent on CPU when reading data from embed.zip (left) versus reading data directly (right). Because the Go testing framework executes the benchmark for uncompressed data 4 times more often, it uses the same horizontal space as the benchmark for compressed data. The graph is interactive.

While a ZIP archive has an index to quickly find the requested file, seeking
inside a compressed file is currently not possible.3 Therefore, the files
from a compressed archive do not implement the io.ReaderAt or io.Seeker
interfaces, unlike directly embedded files. This prevents some features, like
serving partial files or detecting MIME types when serving files over HTTP.


For Akvorado, this is an acceptable compromise to save a few mebibytes from an
executable of almost 100 MiB. Next week, I will continue this futile adventure
by explaining how I prevented Go from disabling dead code elimination! 🦥


  1. You can safely read multiple files concurrently. However, it does
    not implement ReadDir() and ReadFile() methods. ↩︎

  2. You could keep frequently accessed assets in memory. This
    reduces CPU usage and trades cached memory for resident memory. ↩︎

  3. SOZip is a profile that enables fast random access in a compressed
    file. However, Go’s archive/zip module does not support it. ↩︎