Dwi Wahyudi
Senior Software Engineer (Ruby, Golang, Java)
In this article, we’re going to write a custom SQL function to collect data from an array in JSON stored in PostgreSQL database.
Overview
PostgreSQL has supported JSON data type since a long time ago. It’s clearly documented and explained in their online docs: https://www.postgresql.org/docs/15/functions-json.html, the difference between these is not within the scope of this article. What we’re going to do now is to try to collect data from array inside a JSON data.
JSON and JSONB are mostly the same, except that JSONB is optimized for read operations.
Now let say that we have a database table (delivery_transactions
) with a JSONB field (delivery_proofs
). It has array JSONB data type like this:
{"{\"url\": \"https://www.example.com/simple.jpeg\", \"name\": \"example.jpeg\"}","{\"url\": \"https://www.example.com/simple.jpeg\", \"name\": \"example.jpeg\"}","{\"url\": \"https://www.example.com/simple.jpeg\", \"name\": \"example.jpeg\"}","{\"url\": \"https://www.example.com/simple.jpeg\", \"name\": \"example.jpeg\"}"}
Few years ago, when I saw these, I had some questions about this, and got the answers some moments later:
- Why is it like this? The JSON data looks weird.
- But that’s how the JSON data is serialized inside a PostgreSQL column/field.
- And why is it enclosed with curly brace
{}
not[]
, it’s array, right?- And that’s how PostgreSQL store the data. https://www.postgresql.org/docs/15/arrays.html
In the JSON function documentations above, there’s a function to check the data type of the JSON.
SELECT pg_typeof(delivery_proofs) FROM delivery_transactions dt WHERE dt.id = 1;
The result will be jsonb[]
. Since this is an array, we can further check it with:
SELECT pg_typeof(delivery_proofs[1]) FROM delivery_transactions dt WHERE dt.id = 1;
The result will be jsonb
.
PostgreSQL has
json_typeof
andjsonb_typeof
functions, but it will only work for a single JSON/JSONB data.
As usual we can access any field of the JSON with ->
SELECT delivery_proofs[1]->'url' AS URL FROM delivery_transactions dt WHERE dt.id = 1;
Collecting the Data
Now here’s the task, collect all of the url from the array. We have to know one thing before proceeding, PostgreSQL array and JSON array are 2 different things. We cannot immediately iterate such jsonb[]
data with FOR ... LOOP
. We must transform the serialized data to use []
first.
There’s a function for that: array_to_jsonb(json)
Why do we do this? Because we want to use jsonb_array_elements(json)
function for the iteration to work.
Now we can create a function for this operation:
CREATE FUNCTION obtain_deliv_proofs() RETURNS text[] LANGUAGE plpgsql AS
$$
DECLARE
urls text[];
json_elem jsonb;
BEGIN
FOR json_elem IN SELECT jsonb_array_elements(array_to_json(delivery_proofs)::jsonb) FROM delivery_transactions dt WHERE dt.id = 1
LOOP
urls = ARRAY_APPEND(urls, (json_elem -> 'url')::text);
END LOOP;
return urls;
END$$;
And call it with SELECT obtain_deliv_proofs();
We can do this in application layer, but we must gather whole JSON data and collecting them one by one. If we need to do this for many records, the processing memory will be quite large if whole JSON data is large. With this function we can just pass a parameters for dt.id
and get the urls collection.