JSONB DeTOASTing (read amplification)

PostgreSQL limits tuple sizes to roughly a quarter of the block size, which caps them at about 2KB with the default 8KB blocks. In document data modeling, where a document represents a business transaction, sizes often exceed this limit. Storing an entire transaction document as a single JSONB value then leads to compression and splitting via TOAST (The Oversized-Attribute Storage Technique). This is acceptable for static documents that are rarely accessed, but suboptimal when queries read into the documents. Let’s take an example to detect this issue.
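
To check this threshold on a given instance, look at the block size; with the default 8KB blocks, the out-of-line threshold (TOAST_TUPLE_THRESHOLD, roughly a quarter of a block) is about 2KB:

-- block size of the instance; TOAST kicks in at roughly a quarter of it
show block_size;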

I create a user profile table similar to the one in the previous post, but add a bio field with a large text value:


create table users (
  id bigserial primary key,
  data jsonb not null
);

INSERT INTO users (data)
SELECT
  jsonb_build_object(
    'name', 'u' || n::text,
    'bio', (SELECT string_agg(chr(trunc(random() * (126 - 33) + 33)::int), '')
            FROM generate_series(1, 5000)),
    'email', jsonb_build_array(
      'u' || n::text || '@compuserve.com'
    ),
    'login', jsonb_build_object(
      'last', to_char(current_timestamp, 'YYYY-MM-DD'),
      'count', 1
    )
  )
FROM generate_series(1, 100000) n
;
vacuum analyze users
;
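
As a quick sanity check (not part of the original test, but using standard catalog functions), I can verify the as-stored size of each document and the storage strategy of the column before measuring I/O:

-- as-stored (possibly compressed) size of each jsonb document
select avg(pg_column_size(data))::int as avg_bytes,
       max(pg_column_size(data))      as max_bytes
from users;

-- storage strategy of the column: 'x' means extended (compression, then out-of-line TOAST)
select attname, attstorage
from pg_attribute
where attrelid = 'users'::regclass and attname = 'data';

The random bio text compresses poorly, so each document stays well above the 2KB threshold and is moved out of line.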

I check the size of the table and of its TOAST overflow table:

with users_cte as (
  select * from pg_class
  where oid = 'users'::regclass
)
select oid, relname, relpages, reltuples, reltoastrelid
 from users_cte
union all
select oid, relname, relpages, reltuples, reltoastrelid
 from pg_class
where oid = (select reltoastrelid from users_cte)
;

  oid  |    relname     | relpages | reltuples | reltoastrelid 
-------+----------------+----------+-----------+---------------
 32314 | users          |      736 |    100000 |         32318
 32318 | pg_toast_32314 |    71430 |    300000 |             0

The table contains 100000 rows across 736 pages, with three chunks per row stored externally in the TOAST table. This results in a total of 736 + 71430 = 72166 pages. Each chunk is approximately (71430 * 8192) / 300000 = 1951 bytes, ensuring that tuples remain under 2KB.
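
This arithmetic can be cross-checked by reading the TOAST table directly. The relation name comes from the pg_class output above and is specific to this run, and reading pg_toast tables usually requires superuser privileges:

-- number and average size of the out-of-line chunks (relation name taken from the output above)
select count(*)                           as chunks,
       avg(octet_length(chunk_data))::int as avg_chunk_bytes
from pg_toast.pg_toast_32314;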

I use EXPLAIN ANALYZE to query all documents and show the number of pages accessed. To actually retrieve the JSON documents, I add the SERIALIZE option, because EXPLAIN does not fetch and serialize the output by default:

explain (analyze, verbose, buffers, wal, serialize text, costs off)
select id, data
from users
;

                                QUERY PLAN                                
--------------------------------------------------------------------------
 Seq Scan on public.users (actual time=0.040..10.973 rows=100000 loops=1)
   Output: id, data
   Buffers: shared read=736
 Planning Time: 0.038 ms
 Serialization: time=2482.359 ms  output=509831kB  format=text
   Buffers: shared hit=384887 read=72253
 Execution Time: 2504.164 ms

The table scan read 736 pages from the base table (shared read=736), while reading the JSONB content accessed 72253 pages from the TOAST table (shared read=72253). Each TOAST page was accessed an average of 384887 / 72253 ≈ 5 times; fortunately they stayed in the shared buffer cache (shared hit=384887), but accessing shared buffers still costs CPU and lightweight locks.

We observe a fivefold read amplification when querying the JSONB column, as it requires de-toasting.
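
The same amplification is visible outside of EXPLAIN in the cumulative statistics views, which count heap and TOAST block accesses separately:

-- cumulative block I/O for the table: heap vs. TOAST
select heap_blks_read, heap_blks_hit,
       toast_blks_read, toast_blks_hit
from pg_statio_user_tables
where relname = 'users';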

In a document database, you can retrieve the entire document or access specific fields for filtering and sorting. For example, I add a projection of the "username" field:

explain (analyze, verbose, buffers, wal, serialize text, costs off)
select id, data
, (data->>'username')
from users
;

                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Seq Scan on public.users (actual time=0.085..532.367 rows=100000 loops=1)
   Output: id, data, (data ->> 'username'::text)
   Buffers: shared hit=384887 read=72989
 Planning Time: 0.039 ms
 Serialization: time=2276.025 ms  output=510222kB  format=text
   Buffers: shared hit=457140
 Execution Time: 2819.235 ms

PostgreSQL does not optimize repeated accesses to a JSONB value, which can de-toast it multiple times. As the Output line of EXPLAIN VERBOSE shows, the scan outputs the "username" projection in addition to the whole document, and it accesses the TOAST pages many times (shared hit=384887 read=72989). Serializing the whole document for the client then causes an additional shared hit=457140.

I continue by projecting one more field, "login.last":

explain (analyze, verbose, buffers, wal, serialize text, costs off)
select id, data
, (data->'login'->>'last')
, (data->>'username')
from users
;

                                         QUERY PLAN                                          
---------------------------------------------------------------------------------------------
 Seq Scan on public.users (actual time=0.079..855.246 rows=100000 loops=1)
   Output: id, data, ((data -> 'login'::text) ->> 'last'::text), (data ->> 'username'::text)
   Buffers: shared hit=842027 read=72989
 Planning Time: 0.040 ms
 Serialization: time=2261.679 ms  output=511589kB  format=text
   Buffers: shared hit=457140
 Execution Time: 3128.017 ms

Even when both projections occur in the same scan, the JSONB document is de-toasted twice: once for each field, resulting in shared hit=842027 read=72989.
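
As a rough sketch (not measured in this post), extracting the nested object once per row through a lateral subquery may limit how often the large document is de-toasted. OFFSET 0 is the usual fence to keep the planner from inlining the expression, and the effect should be verified with EXPLAIN (BUFFERS):

-- sketch: de-toast "data" once to get the small "login" object, then project its fields
select u.id,
       l.login ->> 'last'         as last_login,
       (l.login ->> 'count')::int as login_count
from users u
cross join lateral (select u.data -> 'login' as login offset 0) l;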

To avoid retrieving the entire document, I project only the fields I need. I run the query again with an additional field, "login.count", and exclude "data" from the SELECT list:

explain (analyze, verbose, buffers, wal, serialize text, costs off)
select id
, (data->'login'->>'count')::int
, (data->'login'->>'last')
, (data->>'username')
from users
;

                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.users (actual time=0.087..1149.260 rows=100000 loops=1)
   Output: id, (((data -> 'login'::text) ->> 'count'::text))::integer, ((data -> 'login'::text) ->> 'last'::text), (data ->> 'username'::text)
   Buffers: shared hit=1299167 read=72989
 Planning Time: 0.042 ms
 Serialization: time=21.837 ms  o

I saved the reads needed to serialize the whole document, but the scan still shows shared hit=1299167 read=72989. It read only the necessary data from disk: 72989 pages from the base and TOAST tables. However, to extract just three fields, each page was accessed in shared buffers an average of 1299167 / 72989 ≈ 18 times, which costs CPU and can cause lightweight lock contention under concurrent access.
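
If the pg_stat_statements extension is installed, the same buffer-access cost can be tracked per statement over time, without re-running EXPLAIN:

-- per-statement buffer accesses (requires the pg_stat_statements extension)
select calls, shared_blks_hit, shared_blks_read, query
from pg_stat_statements
where query ilike '%from users%'
order by shared_blks_hit desc
limit 5;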

In PostgreSQL, JSONB is a datatype for a single column, designed to store and retrieve the entire document. In contrast, a document database like MongoDB reads and writes individual fields, keeping the document in memory for quick access, like an object cache. Its WiredTiger storage engine decompresses the on-disk representation when loading it into memory and compresses it during cache eviction or checkpoints.

Using PostgreSQL as a document database is not recommended. Fields that require individual access should be normalized into SQL columns rather than embedded in a JSONB document. To identify suboptimal designs, examine the shared buffer hits using EXPLAIN ANALYZE with the SERIALIZE option. If you choose a document model, then use a document database.
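
As an illustration of that advice (a sketch only, with assumed column names), frequently accessed fields can be promoted to regular columns, for example as stored generated columns that are computed once at write time and read without de-toasting the document:

-- sketch: promote hot fields to plain columns so reads and indexes no longer de-toast the jsonb
alter table users
  add column username   text generated always as (data ->> 'name') stored,
  add column last_login text generated always as (data -> 'login' ->> 'last') stored;

create index on users (username);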
