PostgreSQL indexes for SaaS apps: choose between btree, GIN, and GiST using real query shapes like filters, search, JSONB, and arrays.

An index changes how PostgreSQL finds rows. Without an index, the database often has to read a lot of the table (a sequential scan) and then discard most of it. With the right index, it can jump straight to the matching rows (an index lookup), then fetch only what it needs.
You notice this early in SaaS because everyday screens are query-heavy. A single click can trigger several reads: the list page, a total count, a couple of dashboard cards, and a search box. When a table grows from thousands to millions of rows, the same query that used to feel instant starts to lag.
A typical example is an Orders page filtered by status and date, sorted by newest first, with pagination. If PostgreSQL has to scan the whole orders table to find paid orders from the last 30 days, every page load does extra work. A good index turns that into a quick hop to the right slice of data.
Indexes aren’t free. Each one buys faster reads for specific queries, but it also makes writes slower (INSERT/UPDATE/DELETE must update indexes) and uses more storage (plus more cache pressure). That trade-off is why you should start from real query patterns, not from index types.
A simple rule that prevents busywork: add an index only when you can point to a specific, frequent query it will speed up. If you build screens with a chat-driven builder like Koder.ai, it helps to capture the SQL behind your list pages and dashboards and use that as your index wish list.
Most indexing confusion goes away when you stop thinking in features (JSON, search, arrays) and start thinking in query shape: what does the WHERE clause do, and how do you expect results to be ordered?
Use B-tree when your query looks like normal comparisons and you care about sort order. It’s the workhorse for equality, ranges, and joins.
Example shapes: filtering by tenant_id = ?, status = 'active', created_at >= ?, joining users.id = orders.user_id, or showing “latest first” with ORDER BY created_at DESC.
GIN (Generalized Inverted Index) is a good fit when one column contains many members and you ask, “does it contain X?” That’s common with JSONB keys, array elements, and full-text vectors.
Example shapes: metadata @> '{"plan":"pro"}' on JSONB, tags @> ARRAY['urgent'], or to_tsvector(body) @@ plainto_tsquery('reset password').
GiST (Generalized Search Tree) fits questions about distance or overlap, where values behave like ranges or shapes. It’s often used for range types, geometric data, and some “closest match” searches.
Example shapes: overlapping time windows with range columns, some similarity-style searches (for example, with trigram operators), or spatial queries (if you use PostGIS).
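To make the GiST case concrete, here's a minimal sketch using a hypothetical bookings table with a tstzrange column (the table and column names are illustrative, not from a real schema):

-- Hypothetical table: each booking occupies a time window
CREATE TABLE bookings (
  id      bigserial PRIMARY KEY,
  room_id bigint NOT NULL,
  during  tstzrange NOT NULL
);

-- GiST supports the overlap operator (&&) on range types
CREATE INDEX bookings_during_gist
ON bookings USING GIST (during);

-- "Which bookings overlap this window?"
SELECT id, room_id
FROM bookings
WHERE during && tstzrange('2024-01-01', '2024-01-02');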
A practical way to choose:
- Default to B-tree for = filters, ranges, joins, and ORDER BY.
- Reach for GIN when the question is "does this column contain X?" (JSONB containment, array membership, full-text).
- Consider GiST for overlap, range, and nearest-match queries.
Indexes speed reads, but they cost write time and disk. In SaaS, that trade-off matters most on hot tables like events, sessions, and activity logs.
Most SaaS list screens share the same shape: a tenant boundary, a couple of filters, and a predictable sort. B-tree indexes are the default choice here, and they’re usually the cheapest to maintain.
A common pattern is WHERE tenant_id = ? plus filters like status = ?, user_id = ?, and a time range like created_at >= ?. For composite B-tree indexes, put equality filters first (columns you match with =), then add the column you sort by.
Rules that work well in most apps:
- Lead with tenant_id if every query is tenant-scoped.
- Put = filters next (often status, user_id).
- Put the ORDER BY column last (often created_at or id).
- Use INCLUDE to cover list pages without making the key wider.

A realistic example: a Tickets page showing the newest items first, filtered by status.
-- Query
SELECT id, status, created_at, title
FROM tickets
WHERE tenant_id = $1
  AND status = $2
ORDER BY created_at DESC
LIMIT 50;
-- Index
CREATE INDEX tickets_tenant_status_created_at_idx
ON tickets (tenant_id, status, created_at DESC)
INCLUDE (title);
That index supports both the filter and the sort, so Postgres can avoid sorting a large result set. The INCLUDE (title) part helps the list page touch fewer table pages, while keeping the index keys focused on filtering and ordering.
For time ranges, the same idea applies:
SELECT id, created_at
FROM events
WHERE tenant_id = $1
  AND created_at >= $2
  AND created_at < $3
ORDER BY created_at DESC
LIMIT 100;
CREATE INDEX events_tenant_created_at_idx
ON events (tenant_id, created_at DESC);
Pagination is where many SaaS apps slow down. Offset pagination (OFFSET 50000) forces the database to walk past many rows. Seek pagination stays fast by using the last seen sort key:
SELECT id, created_at
FROM tickets
WHERE tenant_id = $1
  AND created_at < $2
ORDER BY created_at DESC
LIMIT 50;
With the right B-tree index, this stays quick even as the table grows.
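One refinement worth knowing, sketched here under the assumption that id is unique and increasing: if many rows share the same created_at, a single-column cursor can skip or repeat rows at page boundaries. A tuple comparison keeps the cursor stable:

-- Seek pagination with a tie-breaker: ($2, $3) are the last seen (created_at, id)
SELECT id, created_at
FROM tickets
WHERE tenant_id = $1
  AND (created_at, id) < ($2, $3)
ORDER BY created_at DESC, id DESC
LIMIT 50;

-- Matching index so the tuple comparison and the sort stay index-friendly
CREATE INDEX tickets_tenant_created_id_idx
ON tickets (tenant_id, created_at DESC, id DESC);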
Most SaaS apps are multi-tenant: every query must stay inside one tenant. If your indexes don’t include tenant_id, Postgres can still find rows quickly, but it often scans far more index entries than needed. Tenant-aware indexes keep each tenant’s data clustered in the index so common screens stay fast and predictable.
A simple rule: put tenant_id first in the index when the query always filters by tenant. Then add the column you filter or sort by most.
High-impact, boring indexes often look like:
- (tenant_id, created_at) for recent-items lists and cursor pagination
- (tenant_id, status) for status filters (Open, Paid, Failed)
- (tenant_id, user_id) for "items owned by this user" screens
- (tenant_id, updated_at) for "recently changed" admin views
- (tenant_id, external_id) for lookups from webhooks or imports

Over-indexing happens when you add a new index for every slightly different screen. Before creating another one, check if an existing composite index already covers the leftmost columns you need. For example, if you have (tenant_id, created_at), you usually don't also need (tenant_id, created_at, id) unless you truly filter on id after those columns.
Partial indexes can cut size and write cost when most rows aren’t relevant. They work well with soft deletes and “active only” data, for example: index only where deleted_at IS NULL, or only where status = 'active'.
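A minimal sketch of both patterns (table and column names are illustrative):

-- Index only rows that haven't been soft-deleted
CREATE INDEX projects_tenant_created_active_idx
ON projects (tenant_id, created_at DESC)
WHERE deleted_at IS NULL;

-- Index only active subscriptions
CREATE INDEX subscriptions_tenant_active_idx
ON subscriptions (tenant_id)
WHERE status = 'active';

Note that queries only use a partial index when their WHERE clause matches its condition (deleted_at IS NULL or status = 'active').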
Every extra index makes writes heavier. Inserts must update each index, and updates can touch multiple indexes even when you change one column. If your app ingests lots of events (including apps built quickly with Koder.ai), keep indexes focused on the few query shapes users hit every day.
JSONB is handy when your app needs flexible extra fields like feature flags, user attributes, or per-tenant settings. The catch is that different JSONB operators behave differently, so the best index depends on how you query.
Two shapes matter most:
- Containment with @>.
- Key extraction with -> / ->> (often compared with =).

If you frequently filter with @>, a GIN index on the JSONB column usually pays off.
-- Query shape: containment
SELECT id
FROM accounts
WHERE tenant_id = $1
  AND metadata @> '{"region":"eu","plan":"pro"}';
-- Index
CREATE INDEX accounts_metadata_gin
ON accounts
USING GIN (metadata);
If your JSON structure is predictable and you mostly use @> on top-level keys, jsonb_path_ops can be smaller and faster, but it supports fewer operator types.
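A sketch of that variant on the same accounts table:

-- jsonb_path_ops: smaller and often faster for @>, but it doesn't support
-- key-existence operators like ?, ?|, and ?&
CREATE INDEX accounts_metadata_path_gin
ON accounts
USING GIN (metadata jsonb_path_ops);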
If your UI repeatedly filters on one field (like plan), extracting that field and indexing it is often faster and cheaper than a wide GIN.
SELECT id
FROM accounts
WHERE tenant_id = $1
  AND metadata->>'plan' = 'pro';
CREATE INDEX accounts_plan_expr
ON accounts ((metadata->>'plan'));
A practical rule: keep JSONB for flexible, rarely filtered attributes, but promote stable, high-usage fields (plan, status, created_at) into real columns. If you’re iterating fast on a generated app, it’s often an easy schema tweak once you see which filters show up on every page.
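One way to do that promotion, sketched assuming PostgreSQL 12+ (the column name is illustrative, and note the ALTER rewrites the table to fill existing rows):

-- Promote a hot JSONB key into a generated column, then index it like any other column
ALTER TABLE accounts
ADD COLUMN plan text GENERATED ALWAYS AS (metadata->>'plan') STORED;

CREATE INDEX accounts_tenant_plan_idx
ON accounts (tenant_id, plan);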
Example: if you store {"tags":["beta","finance"],"region":"us"} in JSONB, use GIN when you filter by bundles of attributes (@>), and add expression indexes for the few keys that drive most list views (plan, region).
Arrays look tempting because they’re easy to store and easy to read. A users.roles text[] column or projects.labels text[] column can work well when you mostly ask one question: does this row contain a value (or a set of values)? That’s exactly where a GIN index helps.
GIN is the go-to choice for membership queries on arrays. It breaks the array into individual items and builds a fast lookup to rows that contain them.
Array query shapes that often benefit:
- @> (array contains)
- && (array shares any items)
- = ANY(...), but @> is often more predictable

A typical example for filtering users by role:
-- Find users who have the "admin" role
SELECT id, email
FROM users
WHERE roles @> ARRAY['admin'];
CREATE INDEX users_roles_gin ON users USING GIN (roles);
And filtering projects by a label set (must include both labels):
SELECT id, name
FROM projects
WHERE labels @> ARRAY['billing', 'urgent'];
CREATE INDEX projects_labels_gin ON projects USING GIN (labels);
Where people get surprised: some patterns don’t use the index the way you expect. If you turn the array into a string (array_to_string(labels, ',')) and then run LIKE, the GIN index won’t help. Also, if you need “starts with” or fuzzy matches inside labels, you’re in text search territory, not array membership.
Arrays can also become hard to maintain when they turn into a mini-database: frequent updates, needing metadata per item (who added the label, when, why), or needing analytics per label. At that point, a join table like project_labels(project_id, label) is usually easier to validate, query, and evolve.
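A minimal sketch of that join-table shape, with per-item metadata included (names are illustrative):

-- One row per (project, label); easy to validate, query, and evolve
CREATE TABLE project_labels (
  project_id bigint NOT NULL REFERENCES projects (id),
  label      text   NOT NULL,
  added_by   bigint,
  added_at   timestamptz NOT NULL DEFAULT now(),
  PRIMARY KEY (project_id, label)
);

-- Fast "which projects have this label?" lookups
CREATE INDEX project_labels_label_idx
ON project_labels (label, project_id);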
For search boxes, two patterns show up again and again: full-text search (find records about a topic) and fuzzy matching (handle typos, partial names, and ILIKE patterns). The right index is the difference between “instant” and “timeouts at 10k users”.
Use full-text search when users type real words and you want results ranked by relevance, like searching tickets by subject and description. The usual setup is to store a tsvector (often in a generated column) and index it with GIN. You search with @@ and a tsquery.
-- Tickets: full-text search on subject + body
ALTER TABLE tickets
  ADD COLUMN search_vec tsvector
  GENERATED ALWAYS AS (
    to_tsvector('simple', coalesce(subject,'') || ' ' || coalesce(body,''))
  ) STORED;
CREATE INDEX tickets_search_vec_gin
ON tickets USING GIN (search_vec);
-- Query
SELECT id, subject
FROM tickets
WHERE search_vec @@ plainto_tsquery('simple', 'invoice failed');
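Since the point of full-text search is usually relevance, here's a sketch of ordering the same matches with ts_rank:

-- Order matches by relevance
SELECT id, subject,
       ts_rank(search_vec, plainto_tsquery('simple', 'invoice failed')) AS rank
FROM tickets
WHERE search_vec @@ plainto_tsquery('simple', 'invoice failed')
ORDER BY rank DESC
LIMIT 20;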
-- Customers: fuzzy name search using trigrams (needs the pg_trgm extension)
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX customers_name_trgm
ON customers USING GIN (name gin_trgm_ops);
SELECT id, name
FROM customers
WHERE name ILIKE '%jon smth%';
What to store in the vector: only the fields you actually search. If you include everything (notes, internal logs), you pay in index size and write cost.
Use trigram similarity when users search names, emails, or short phrases and you need partial matches or typo tolerance. Trigrams help with ILIKE '%term%' and similarity operators. GIN is usually faster for “does it match?” lookups; GiST can be a better fit when you also care about ordering by similarity.
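For the similarity-ordering case, a sketch using pg_trgm's GiST operator class and its distance operator (assumes the pg_trgm extension created above):

-- GiST variant: supports ORDER BY trigram distance, closest matches first
CREATE INDEX customers_name_trgm_gist
ON customers USING GIST (name gist_trgm_ops);

SELECT id, name
FROM customers
ORDER BY name <-> 'jon smth'
LIMIT 10;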
Rules of thumb:
- Use a GIN-indexed tsvector for relevance-based text search.
- Use trigram indexes for fuzzy matching and partial ILIKE patterns.

Pitfalls worth watching:
- Leading wildcards (ILIKE '%abc') force scans unless a trigram index backs them.
- Putting rarely searched fields into the vector inflates index size and write cost.

If you're shipping search screens quickly, treat the index as part of the feature: search UX and index choice need to be designed together.
Start with the exact query your app runs, not a guess. A “slow screen” is usually one SQL statement with a very specific WHERE and ORDER BY. Copy it from logs, your ORM debug output, or whatever query capture you already use.
A workflow that holds up in real apps:
- Capture the exact query, then run EXPLAIN (ANALYZE, BUFFERS) on the same query.
- Note the operators it uses (=, >=, LIKE, @>, @@), not just the column names.
- Add the smallest index that matches those operators and the sort, then re-run EXPLAIN (ANALYZE, BUFFERS) with realistic data volume.

Here's a concrete example. A Customers page filters by tenant and status, sorts by newest, and paginates:
SELECT id, created_at, email
FROM customers
WHERE tenant_id = $1 AND status = $2
ORDER BY created_at DESC
LIMIT 50;
If EXPLAIN shows a sequential scan and a sort, a B-tree index that matches the filter and sort often fixes it:
CREATE INDEX ON customers (tenant_id, status, created_at DESC);
If the slow part is JSONB filtering like metadata @> '{"plan":"pro"}', that points to GIN. If it’s full-text search like to_tsvector(...) @@ plainto_tsquery(...), that also points to a GIN-backed search index. If it’s a “closest match” or overlap-style operator set, that’s where GiST is often the fit.
After adding the index, measure the trade-off. Check index size, insert and update time, and whether it helps the top few slow queries or only one edge case. In fast-moving projects (including ones built on Koder.ai), this re-check helps you avoid piling up unused indexes.
Most index problems aren’t about choosing B-tree vs GIN vs GiST. They’re about building an index that looks right, but doesn’t match how the app queries the table.
Mistakes that tend to hurt most:
- Wrong column order: if queries filter by tenant_id and created_at, but the index starts with created_at, the planner may skip it.
- Indexing a low-selectivity column alone: status, is_active, or a boolean often does little because it matches too many rows. Pair it with a selective column (like tenant_id or created_at) or skip it.
- Stale statistics: if ANALYZE hasn't run recently, the planner can choose bad plans even when the right index exists.

A concrete example: your Invoices screen filters by tenant_id and status, then sorts by created_at DESC. An index on just status will barely help. A better fit is a composite index that starts with tenant_id, then status, then created_at (filter first, sort last). That single change often beats adding three separate indexes.
Treat every index as a cost. It has to earn its keep in real queries, not just in theory.
Index changes are easy to ship and annoying to undo if they add write cost or lock a busy table. Before you merge, treat it like a small release.
Start by deciding what you’re optimizing. Pull two short rankings from your logs or monitoring: the queries that run most often, and the queries with the highest latency. For each, write down the exact shape: filter columns, sort order, joins, and operators used (equals, range, IN, ILIKE, JSONB operators, array contains). This prevents guessing and helps you pick the right index type.
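One way to pull those rankings, assuming the pg_stat_statements extension is enabled on your server (the mean_exec_time column name is the PostgreSQL 13+ spelling):

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Most frequent queries
SELECT query, calls
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;

-- Highest average latency
SELECT query, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;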
Pre-ship checklist:
- Capture the exact query and a baseline EXPLAIN (ANALYZE, BUFFERS) plan.
- Check whether an existing composite index already covers the leftmost columns you need.
- On busy tables, use CREATE INDEX CONCURRENTLY so the build doesn't block writes.
- Note the expected cost: table write volume, current index count, and rough index size.
After you add the index, verify it helped in the real plan. Run EXPLAIN (ANALYZE, BUFFERS) on the exact query and compare before vs after. Then watch production behavior for a day:
- whether the slow queries actually dropped off your top-latency list
- insert and update latency on the table
- index size, and whether the planner is using the new index at all
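A quick way to check that last point from the built-in statistics views (the table name is illustrative):

-- Scans per index and on-disk size for one table
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'customers'
ORDER BY idx_scan;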
If you’re building with Koder.ai, it’s worth keeping the generated SQL for one or two slow screens next to the change, so the index matches what the app actually runs.
Picture a common admin screen: a Users list with tenant scoping, a few filters, sort by last active, and a search box. This is where indexes stop being theory and start saving real time.
Three query shapes you’ll usually see:
-- 1) List page with tenant + status filter + sort
SELECT id, email, last_active_at
FROM users
WHERE tenant_id = $1 AND status = $2
ORDER BY last_active_at DESC
LIMIT 50;
-- 2) Search box (full-text)
SELECT id, email
FROM users
WHERE tenant_id = $1
  AND to_tsvector('simple', coalesce(name,'') || ' ' || coalesce(email,'')) @@ plainto_tsquery('simple', $2)
ORDER BY last_active_at DESC
LIMIT 50;
-- 3) Filter on JSON metadata (plan, flags)
SELECT id
FROM users
WHERE tenant_id = $1
  AND metadata @> '{"plan":"pro"}'::jsonb;
A small but intentional index set for this screen:
- A composite B-tree: (tenant_id, status, last_active_at DESC).
- A generated tsvector column with a GIN index for the search box.
- GIN (metadata) when you use @> a lot, or an expression B-tree like ((metadata->>'plan')) when you mostly filter a single key.

Mixed needs are normal. If one page does filters + search + JSON, avoid cramming everything into one mega index. Keep the B-tree for sorting/pagination, then add one specialized index (often GIN) for the expensive part.
Next steps: pick one slow screen, write down its top 2-3 query shapes, and review each index by purpose (filter, sort, search, JSON). If an index doesn't clearly match a real query, drop it from the plan. If you're iterating quickly on Koder.ai, doing this review as you add new screens can prevent index sprawl while your schema is still changing.
An index lets PostgreSQL find matching rows without reading most of the table. For common SaaS screens like lists, dashboards, and search, the right index can turn a slow sequential scan into a fast lookup that scales better as the table grows.
Start with B-tree for most app queries because it’s best for = filters, range filters, joins, and ORDER BY. If your query is mainly about containment (JSONB, arrays) or text search, then GIN is usually the next thing to consider; GiST is more for overlap and “nearest/closest” style queries.
Put the columns you filter with = first, then put the column you sort by last. That order matches how the planner can walk the index efficiently, so it can both filter and return rows in the right order without an extra sort.
If every query is scoped by tenant_id, putting tenant_id first keeps each tenant’s rows grouped together inside the index. That usually reduces the amount of index and table data PostgreSQL has to touch for everyday list pages.
INCLUDE lets you add extra columns to support index-only reads for list pages without making the index key wider. It’s most useful when you filter and sort by a few columns but you also display a couple of extra fields on the screen.
Use a partial index when you only care about a subset of rows, like “not deleted” or “active only.” It keeps the index smaller and cheaper to maintain, which matters on hot tables that get lots of inserts and updates.
Use a GIN index on the JSONB column when you frequently query with containment like metadata @> '{"plan":"pro"}'. If you mostly filter on one or two specific JSON keys, an expression B-tree index on (metadata->>'plan') is often smaller and faster.
GIN is a great fit when your main question is “does this array contain X?” using operators like @> or &&. If you need per-item metadata, frequent edits, or analytics per label/role, a join table is usually easier to maintain and index well.
For full-text search, store a tsvector (often as a generated column) and index it with GIN, then query with @@ for relevance-style search. For fuzzy matching like ILIKE '%name%' and typo tolerance, trigram indexes (often GIN) are typically the right tool.
Copy the exact SQL your app runs and run EXPLAIN (ANALYZE, BUFFERS) to see where time is spent and whether you’re scanning, sorting, or doing expensive filters. Add the smallest index that matches the query’s operators and sort order, then rerun the same EXPLAIN to confirm it’s actually used and improves the plan.