Dwi Wahyudi
Senior Software Engineer (Ruby, Golang, Java)
Sometimes PostgreSQL just doesn't want to pick the indexes that we have created. In this article, we will cover the most common reasons why.
Overview
Indexes in a relational database are very important to create and maintain in order to speed up our queries. In the absence of indexes, PostgreSQL will just do sequential scans, which are too costly for most queries. This is familiar to most software developers.
- In most cases, an index for field x in table a will be used for a query like: SELECT * FROM a WHERE x = ?; or SELECT * FROM a WHERE x IN (?, ?); or SELECT * FROM a WHERE x = ? AND y = ? AND z = ?;
- For the last query, the more efficient index is the composite index of (x, y, z) in table a (see the sketch after this list).
- A query like SELECT * FROM a WHERE y = ? will not use such a composite index.
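Below is a minimal sketch of the list above (table a and columns x, y, z are just the placeholder names from the list); EXPLAIN shows which plan the planner picks:

-- A sketch, using the placeholder table and columns from the list above.
CREATE TABLE a (x INT, y INT, z INT);
CREATE INDEX a_x_y_z_idx ON a (x, y, z);
-- With enough rows, these queries can use the composite index,
-- because they filter on its leading column x:
EXPLAIN SELECT * FROM a WHERE x = 1;
EXPLAIN SELECT * FROM a WHERE x = 1 AND y = 2 AND z = 3;
-- This one will generally not benefit from the composite index,
-- because y is not its leading column:
EXPLAIN SELECT * FROM a WHERE y = 2;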
But there are certain cases where PostgreSQL doesn't pick the index, even though the index is already created. Why?
Datatype Mismatch
One of the most common cases of PostgreSQL not picking our index is a datatype mismatch: we create the index for a field with a certain type, but we query it with a different type.
Let's say we have an index on the timestamp field invoice_time in the purchases table, with the type TIMESTAMP WITHOUT TIME ZONE.
A query like this will pick the index in most cases: SELECT * FROM purchases WHERE invoice_time > ?;
But if we query with these:
- SELECT * FROM purchases WHERE DATE(invoice_time) > ?; the DATE() function will convert each record to DATE.
- SELECT * FROM purchases WHERE invoice_time AT TIME ZONE 'Asia/Jakarta' < ?; AT TIME ZONE will convert each record to TIMESTAMP WITH TIME ZONE.
- SELECT * FROM purchases WHERE EXTRACT(DOW FROM invoice_time) = ?; EXTRACT(DOW FROM ...) will convert each record to a number (the day of the week).
The index will not be used for any of these queries, because the data types don't match: the index is on invoice_time itself, so PostgreSQL has to convert the invoice_time of each record to the data type used in the query during the filtering step. In other words, PostgreSQL is doing a sequential scan, which will ultimately take a long time to complete.
If, let's say, we still want to query by DATE(invoice_time), we'll need to create an index on that expression: CREATE INDEX purchases_inv_date ON purchases ((invoice_time::DATE)); and so on for the other expressions.
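As a minimal sketch (assuming the purchases table above, with invoice_time of type TIMESTAMP WITHOUT TIME ZONE), expression indexes like these let the planner match the rewritten predicates, and EXPLAIN shows whether they are picked:

-- A sketch, assuming the purchases table described above.
CREATE INDEX purchases_inv_date ON purchases ((invoice_time::DATE));
-- A similar expression index for day-of-week filters; this is allowed
-- because EXTRACT on TIMESTAMP WITHOUT TIME ZONE is immutable:
CREATE INDEX purchases_inv_dow ON purchases ((EXTRACT(DOW FROM invoice_time)));
-- With enough rows, the planner can now match these predicates to the indexes:
EXPLAIN SELECT * FROM purchases WHERE invoice_time::DATE > '2024-01-01';
EXPLAIN SELECT * FROM purchases WHERE EXTRACT(DOW FROM invoice_time) = 1;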
Functions
Functions like LOWER() and UPPER() will also prevent an index from being used.
Let's say we have an index on the name field of the users table. A query like this will not use the index: SELECT * FROM users WHERE LOWER(name) = ?;
Just like the datatype mismatch above, PostgreSQL will convert each record's name with LOWER() so it can satisfy the query.
We can create another index for this: CREATE INDEX lower_name ON users (LOWER(name));
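As a small sketch (the users table and the name value are hypothetical), the expression index is only matched when the query uses the exact same LOWER(name) expression:

-- A sketch, assuming a users table with a text column name.
CREATE INDEX lower_name ON users (LOWER(name));
-- Case-insensitive lookup that can use the expression index above:
EXPLAIN SELECT * FROM users WHERE LOWER(name) = LOWER('Johnson');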
LIKE and ILIKE
The LIKE operator can use the default BTREE index if we're searching for a string without a wildcard at the beginning, for example: name LIKE 'johnson%' or name ~ '^johnson'; (if the database does not use the C locale, the BTREE index has to be created with the text_pattern_ops operator class for this to work).
The ILIKE operator in most cases won't use the default BTREE index. PostgreSQL provides the pg_trgm extension for this case, and pg_trgm provides better support for the LIKE operator too: https://www.postgresql.org/docs/current/pgtrgm.html
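As a sketch (reusing the users table with a name column from the previous section), a trigram GIN index supports ILIKE and patterns with a leading wildcard:

-- A sketch, assuming the users table with a text column name.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- A GIN index with gin_trgm_ops supports LIKE/ILIKE pattern matching:
CREATE INDEX users_name_trgm ON users USING GIN (name gin_trgm_ops);
-- With enough rows, these can use the trigram index, even with a leading wildcard:
EXPLAIN SELECT * FROM users WHERE name ILIKE '%johnson%';
EXPLAIN SELECT * FROM users WHERE name LIKE '%son%';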
LIMIT Clause
With a LIMIT clause, PostgreSQL will sometimes just do a sequential scan in order to retrieve the rows. Let's say we have an index on blood_type in our patients table, which contains 2 million rows. Because blood_type here will only have 4 kinds of values, if we LIMIT the query, especially if the LIMIT number is low, then PostgreSQL will often just do a sequential scan, checking the rows one by one until it finds the required amount (specified by LIMIT).
This has both good and bad implications though. Let's say we query it with SELECT * FROM patients WHERE blood_type = 'A' LIMIT 50; in a table with 2 million rows. We will be in luck if PostgreSQL manages to find the 50 rows after scanning, say, only 1000 rows; that is more efficient than using the index (and it is likely, since there are a lot of blood_type A rows in there). We will be out of luck if the whole table has fewer than 50 blood_type A rows, because PostgreSQL ends up scanning all 2 million rows.
We can help PostgreSQL make better estimations: first of all, we can ANALYZE the table so that PostgreSQL can determine the best execution plan for the query, but this will be covered in a future article.
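As a minimal sketch (patients and blood_type are the hypothetical names used above), this is how we would compare the plan and refresh the statistics:

-- A sketch, assuming the patients table described above.
CREATE INDEX patients_blood_type ON patients (blood_type);
-- The planner may still pick a sequential scan here if it expects
-- to find 50 matching rows quickly:
EXPLAIN SELECT * FROM patients WHERE blood_type = 'A' LIMIT 50;
-- Refresh statistics so the planner's row estimates are up to date:
ANALYZE patients;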
Other Conditions
There are other conditions that will prevent an index from being used.
- When the table is small. PostgreSQL will just scan the table if it has very few rows, so it is best to test index usage on a table with plenty of rows.
- When the query returns a huge percentage of the rows, so that it is more efficient for PostgreSQL to just do a sequential scan.