Dwi Wahyudi
Senior Software Engineer (Ruby, Golang, Java)
Sometimes PostgreSQL just doesn't want to pick the indexes that we have created. In this article, we will cover the most common reasons why.
Overview
Indexes in a relational database are very important to create and maintain in order to speed up our queries. In the absence of indexes, PostgreSQL will just do sequential scans, which are too costly for most queries. This is familiar to most software developers.
- In most cases, an index for field x in table a will be used for a query like: SELECT * FROM a WHERE x = ?; or SELECT * FROM a WHERE x IN (?, ?); or SELECT * FROM a WHERE x = ? AND y = ? AND z = ?;
- For the last query, the more efficient index is the composite index of (x, y, z) in table a (see the sketch after this list).
- A query like SELECT * FROM a WHERE y = ? will not use such a composite index.
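Below is a minimal sketch of the list above (table a and columns x, y, z are just the placeholder names from the list); EXPLAIN shows which plan the planner picks:

-- A sketch, using the placeholder table and columns from the list above.
CREATE TABLE a (x INT, y INT, z INT);
CREATE INDEX a_x_y_z_idx ON a (x, y, z);
-- With enough rows, these queries can use the composite index,
-- because they filter on its leading column x:
EXPLAIN SELECT * FROM a WHERE x = 1;
EXPLAIN SELECT * FROM a WHERE x = 1 AND y = 2 AND z = 3;
-- This one will generally not benefit from the composite index,
-- because y is not its leading column:
EXPLAIN SELECT * FROM a WHERE y = 2;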
But there are certain cases where PostgreSQL doesn't pick the index, even though the index is already created. Why?
Datatype Mismatch
One of the most common cases of PostgreSQL not picking our index is a datatype mismatch: we create the index for a field with a certain type, but we query it with a different type.
Let's say we have an index on the timestamp field invoice_time in the purchases table, with the type TIMESTAMP WITHOUT TIME ZONE.
A query like this will pick the index in most cases: SELECT * FROM purchases WHERE invoice_time > ?;
But if we query with these:
- SELECT * FROM purchases WHERE DATE(invoice_time) > ?; the DATE() function will convert each record to DATE.
- SELECT * FROM purchases WHERE invoice_time AT TIME ZONE 'Asia/Jakarta' < ?; AT TIME ZONE will convert each record to TIMESTAMP WITH TIME ZONE.
- SELECT * FROM purchases WHERE EXTRACT(DOW FROM invoice_time) = ?; EXTRACT(DOW FROM ...) will convert each record to a number (the day of the week).
The index will not be used for any of these queries, because the data types don't match: the index is on invoice_time itself, so PostgreSQL has to convert the invoice_time of each record to the data type used in the query during the filtering step. In other words, PostgreSQL is doing a sequential scan, which will ultimately take a long time to complete.
If, let's say, we still want to query by DATE(invoice_time), we'll need to create an index on that expression: CREATE INDEX purchases_inv_date ON purchases ((invoice_time::DATE)); and so on for the other expressions.
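As a minimal sketch (assuming the purchases table above, with invoice_time of type TIMESTAMP WITHOUT TIME ZONE), expression indexes like these let the planner match the rewritten predicates, and EXPLAIN shows whether they are picked:

-- A sketch, assuming the purchases table described above.
CREATE INDEX purchases_inv_date ON purchases ((invoice_time::DATE));
-- A similar expression index for day-of-week filters; this is allowed
-- because EXTRACT on TIMESTAMP WITHOUT TIME ZONE is immutable:
CREATE INDEX purchases_inv_dow ON purchases ((EXTRACT(DOW FROM invoice_time)));
-- With enough rows, the planner can now match these predicates to the indexes:
EXPLAIN SELECT * FROM purchases WHERE invoice_time::DATE > '2024-01-01';
EXPLAIN SELECT * FROM purchases WHERE EXTRACT(DOW FROM invoice_time) = 1;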
Functions
Functions like LOWER() and UPPER() will also prevent an index from being used.
Let's say we have an index on the name field of the users table. A query like this will not use the index: SELECT * FROM users WHERE LOWER(name) = ?;
Just like the datatype mismatch above, PostgreSQL will convert each record's name with LOWER() so it can satisfy the query.
We can create another index for this: CREATE INDEX lower_name ON users (LOWER(name));
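As a small sketch (the users table and the name value are hypothetical), the expression index is only matched when the query uses the exact same LOWER(name) expression:

-- A sketch, assuming a users table with a text column name.
CREATE INDEX lower_name ON users (LOWER(name));
-- Case-insensitive lookup that can use the expression index above:
EXPLAIN SELECT * FROM users WHERE LOWER(name) = LOWER('Johnson');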
LIKE and ILIKE
The LIKE operator can use the default BTREE index if we're searching for a string without a wildcard at the beginning, for example: name LIKE 'johnson%' or name ~ '^johnson'; (if the database does not use the C locale, the BTREE index has to be created with the text_pattern_ops operator class for this to work).
The ILIKE operator in most cases won't use the default BTREE index. PostgreSQL provides the pg_trgm extension for this case, and pg_trgm provides better support for the LIKE operator too: https://www.postgresql.org/docs/current/pgtrgm.html
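As a sketch (reusing the users table with a name column from the previous section), a trigram GIN index supports ILIKE and patterns with a leading wildcard:

-- A sketch, assuming the users table with a text column name.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- A GIN index with gin_trgm_ops supports LIKE/ILIKE pattern matching:
CREATE INDEX users_name_trgm ON users USING GIN (name gin_trgm_ops);
-- With enough rows, these can use the trigram index, even with a leading wildcard:
EXPLAIN SELECT * FROM users WHERE name ILIKE '%johnson%';
EXPLAIN SELECT * FROM users WHERE name LIKE '%son%';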
LIMIT Clause
With a LIMIT clause, PostgreSQL will sometimes just do a sequential scan in order to retrieve the rows. Let's say we have an index on blood_type in our patients table, which contains 2 million rows. Because blood_type here will only have 4 kinds of values, if we LIMIT the query, especially if the LIMIT number is low, then PostgreSQL will often just do a sequential scan, checking the rows one by one until it finds the required amount (specified by LIMIT).
This has both good and bad implications though. Let's say we query it with SELECT * FROM patients WHERE blood_type = 'A' LIMIT 50; in a table with 2 million rows. We will be in luck if PostgreSQL manages to find the 50 rows after scanning, say, only 1000 rows; that is more efficient than using the index (and it is likely, since there are a lot of blood_type A rows in there). We will be out of luck if the whole table has fewer than 50 blood_type A rows, because PostgreSQL ends up scanning all 2 million rows.
We can help PostgreSQL make better estimations: first of all, we can ANALYZE the table so that PostgreSQL can determine the best execution plan for the query, but this will be covered in a future article.
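As a minimal sketch (patients and blood_type are the hypothetical names used above), this is how we would compare the plan and refresh the statistics:

-- A sketch, assuming the patients table described above.
CREATE INDEX patients_blood_type ON patients (blood_type);
-- The planner may still pick a sequential scan here if it expects
-- to find 50 matching rows quickly:
EXPLAIN SELECT * FROM patients WHERE blood_type = 'A' LIMIT 50;
-- Refresh statistics so the planner's row estimates are up to date:
ANALYZE patients;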
Other Conditions
There are other conditions that will prevent an index from being used.
- When the table is small. PostgreSQL will just scan the table if it has very few rows, so it is best to test index usage on a table with plenty of rows.
- When the query returns a huge percentage of the rows, so that it is more efficient for PostgreSQL to just do a sequential scan.