Learn the cron + database pattern to run scheduled background jobs with retries, locking, and idempotency - without standing up a full queue system.

Most apps need work to happen later, or on a schedule: sending follow-up emails, running a nightly billing check, cleaning up old records, rebuilding a report, or refreshing a cache.
Early on, it’s tempting to add a full queue system because it feels like the “right” way to do background jobs. But queues add moving parts: another service to run, monitor, deploy, and debug. For a small team (or a solo founder), that extra weight can slow you down.
So the real question is: how do you run scheduled work reliably without standing up more infrastructure?
A common first attempt is simple: add a cron entry that hits an endpoint, and have that endpoint do the work. It works until it doesn’t. Once you have more than one server, a deploy at the wrong time, or a job that takes longer than expected, you start seeing confusing failures.
Scheduled work usually breaks in a few predictable ways: the same job fires twice because two servers run the same cron entry, runs overlap because a job takes longer than its interval, a job dies mid-run during a deploy or crash and never finishes, or it fails silently with no retry and no record of what happened.
The cron + database pattern is a middle path. You still use cron to “wake up” on a schedule, but you store job intent and job state in your database so the system can coordinate, retry, and record what happened.
It’s a good fit when you already have one database (often PostgreSQL), a small number of job types, and you want predictable behavior with minimal ops work. It’s also a natural choice for apps built quickly on modern stacks (for example, a React + Go + PostgreSQL setup).
It’s not a good fit when you need very high throughput, long-running jobs that must stream progress, strict ordering across many job types, or heavy fan-out (thousands of sub-tasks per minute). In those cases, a real queue and dedicated workers usually pay for themselves.
The cron + database pattern runs background work on a schedule without running a full queue system. You still use cron (or any scheduler), but cron doesn’t decide what to run. It just wakes up a worker often (once a minute is common). The database decides which work is due and makes sure only one worker takes each job.
Think of it like a shared checklist on a whiteboard. Cron is the person who walks into the room every minute and says, “Anyone need to do something now?” The database is the whiteboard that shows what’s due, what’s already taken, and what’s done.
The pieces are straightforward: cron (or any scheduler) as the wake-up call, a jobs table in your database that records what should run and what happened, and a small worker process that claims due jobs and runs them.
Example: you want to send invoice reminders every morning, refresh a cache every 10 minutes, and clean up old sessions nightly. Instead of three separate cron commands (each with its own overlap and failure modes), you store job entries in one place. Cron starts the same worker process. The worker asks Postgres, “What is due right now?” and Postgres answers by letting the worker safely claim exactly one job at a time.
This scales gradually. You can start with one worker on one server. Later, you can run five workers across multiple servers. The contract stays the same: the table is the contract.
The mindset shift is simple: cron is only the wake-up call. The database is the traffic cop that decides what’s allowed to run, records what happened, and gives you a clear history when something goes wrong.
This pattern works best when your database becomes the source of truth for what should run, when it should run, and what happened last time. The schema isn’t fancy, but small details (lock fields and the right indexes) make a big difference as load grows.
Two common approaches: a single jobs table that holds both the definition and the current state of each job, or a split into job definitions and job runs so every attempt leaves a history row.
If you expect to debug failures often, keep history. If you want the smallest possible setup, start with one table and add history later.
Here is a PostgreSQL-friendly layout. If you’re building in Go with PostgreSQL, these columns map cleanly to structs.
-- What should exist (the definition)
create table job_definitions (
  id bigserial primary key,
  job_type text not null,
  payload jsonb not null default '{}'::jsonb,
  schedule text, -- optional: cron-like text if you store it
  max_attempts int not null default 5,
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now()
);
-- What should run (each run / attempt group)
create table job_runs (
  id bigserial primary key,
  definition_id bigint references job_definitions(id),
  job_type text not null,
  payload jsonb not null default '{}'::jsonb,
  run_at timestamptz not null,
  status text not null, -- queued | running | succeeded | failed | dead
  attempts int not null default 0,
  max_attempts int not null default 5,
  locked_by text,
  locked_until timestamptz,
  last_error text,
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now()
);
A few details that save pain later:
- Use clear, specific job_type names (like send_invoice_emails).
- Keep the payload as jsonb so you can evolve it without migrations.

Without indexes, workers end up scanning too much. Start with:

- an index on (status, run_at)
- an index on (locked_until)
- optionally, a partial index covering only the statuses workers look for (queued and failed)

These keep the "find next runnable job" query quick even when the table grows.
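As a sketch, those indexes could look like this for the job_runs table above (the index names are just illustrative):

-- Serves the "find next runnable job" query
create index job_runs_status_run_at_idx on job_runs (status, run_at);

-- Makes expired leases cheap to find
create index job_runs_locked_until_idx on job_runs (locked_until);

-- Optional partial index covering only runnable statuses
create index job_runs_runnable_idx on job_runs (run_at)
  where status in ('queued', 'failed');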
The goal is simple: many workers can run, but only one should grab a specific job. If two workers process the same row, you get double emails, double charges, or messy data.
A safe approach is to treat a job claim like a “lease”. The worker marks the job as locked for a short window. If the worker crashes, the lease expires and another worker can pick it up. That’s what locked_until is for.
Without a lease, a worker could lock a job and never unlock it (process killed, server reboot, deploy gone wrong). With locked_until, the job becomes available again when time passes.
A typical rule is: a job can be claimed when locked_until is NULL or locked_until <= now().
The key detail is to claim the job in a single statement (or one transaction). You want the database to be the referee.
Here’s a common PostgreSQL pattern: pick one due job, lock it, and return it to the worker. (This example uses a single jobs table; the same idea applies if you’re claiming from job_runs.)
WITH next_job AS (
  SELECT id
  FROM jobs
  WHERE status = 'queued'
    AND run_at <= now()
    AND (locked_until IS NULL OR locked_until <= now())
  ORDER BY run_at ASC
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
UPDATE jobs j
SET status = 'running',
    locked_until = now() + interval '2 minutes',
    locked_by = $1,
    attempts = attempts + 1,
    updated_at = now()
FROM next_job
WHERE j.id = next_job.id
RETURNING j.*;
Why it works:
- FOR UPDATE SKIP LOCKED lets multiple workers compete without blocking each other.
- RETURNING hands the row to the worker that won the race.

Set the lease longer than a normal run, but short enough that a crash recovers quickly. If most jobs finish in 10 seconds, a 2-minute lease is plenty.
For long tasks, renew the lease while you work (a heartbeat). A simple approach: every 30 seconds, extend locked_until if you still own the job.
The renewal update should match on both the job and the worker:

WHERE id = $job_id AND locked_by = $worker_id

That last condition matters. It prevents a worker from extending a lease on a job it no longer owns.
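Putting the heartbeat together, one hedged sketch is a single statement that only succeeds while the worker still holds the job (the parameter placeholders are illustrative):

-- Renew the lease; zero rows updated means this worker no longer owns the job
UPDATE jobs
SET locked_until = now() + interval '2 minutes',
    updated_at = now()
WHERE id = $1
  AND locked_by = $2;

If the update touches zero rows, the worker should stop and let another worker finish the job.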
Retries are where this pattern either feels calm or turns into a noisy mess. The goal is simple: when a job fails, try again later in a way you can explain, measure, and stop.
Start by making job state explicit and finite: queued, running, succeeded, failed, dead. In practice, most teams use failed to mean “failed but will retry” and dead to mean “failed and we gave up”. That one distinction prevents infinite loops.
Attempt counting is the second guardrail. Store attempts (how many times you tried) and max_attempts (how many times you allow). When a worker catches an error, it should:
- increment attempts
- set the status to failed if attempts < max_attempts, otherwise dead
- set a new run_at for the next try (only for failed)

Backoff is just the rule that decides the next run_at. Pick one (fixed delay, linear, or exponential with jitter), document it, and keep it consistent.
Jitter matters when a dependency goes down and comes back. Without it, hundreds of jobs can retry at once and fail again.
Store enough error detail to make failures visible and debuggable. You don’t need a full logging system, but you do need the basics:
- last_error (short message, safe to show in an admin screen)
- error_code or error_type (helps grouping)
- failed_at and next_run_at
- last_stack (only if you control its size)

A concrete rule that works well: mark jobs dead after 10 attempts, and back off exponentially with jitter. That keeps transient failures retrying, but stops broken jobs from burning CPU forever.
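One hedged way to express that rule in a single statement against the job_runs table above (the base delay and jitter bounds here are arbitrary choices, not requirements):

-- On failure: bump attempts, decide failed vs dead, schedule the retry
UPDATE job_runs
SET attempts = attempts + 1,
    status = CASE WHEN attempts + 1 >= max_attempts THEN 'dead' ELSE 'failed' END,
    last_error = $2,
    run_at = now()
      + interval '30 seconds' * power(2, attempts)   -- exponential backoff
      + interval '1 second' * floor(random() * 30),  -- jitter
    locked_by = NULL,
    locked_until = NULL,
    updated_at = now()
WHERE id = $1;

If your claim query only picks up queued rows, either include failed in that filter or set the status back to queued here instead.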
Idempotency means your job can run twice and still produce the same final result. In this pattern, it matters because the same row might get picked up again after a crash, a timeout, or a retry. If your job is “send an invoice email”, running it twice isn’t harmless.
A practical way to think about it: split every job into (1) doing work and (2) applying an effect. You want the effect to happen once, even if the work is attempted multiple times.
An idempotency key should come from what the job represents, not from the worker attempt. Good keys are stable and easy to explain, like invoice_id, user_id + day, or report_name + report_date. If two job attempts refer to the same real-world event, they should share the same key.
Example: “Generate daily sales report for 2026-01-14” can use sales_report:2026-01-14. “Charge invoice 812” can use invoice_charge:812.
The simplest guardrail is letting PostgreSQL reject duplicates. Store the idempotency key somewhere that can be indexed, then add a unique constraint.
-- Example: ensure one logical job/effect per business key
ALTER TABLE jobs
  ADD COLUMN idempotency_key text;

CREATE UNIQUE INDEX jobs_idempotency_key_uniq
  ON jobs (idempotency_key)
  WHERE idempotency_key IS NOT NULL;
This prevents two rows with the same key from existing at the same time. If your design allows multiple rows (for history), put the uniqueness on an “effects” table instead, like sent_emails(idempotency_key) or payments(idempotency_key).
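For example, a minimal effects table for emails might look like this (the table and column names are assumptions, not a fixed schema):

-- One row per logical send; the unique key is what makes retries safe
create table sent_emails (
  id bigserial primary key,
  idempotency_key text not null unique,
  provider_message_id text,
  sent_at timestamptz not null default now()
);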
Common side effects to protect:
- Emails: insert a sent_emails row with a unique key before sending, or record a provider message id once sent.
- Webhooks: record deliveries in delivered_webhooks(event_id) and skip if the row already exists.
- Reports and caches: make the output unique by a natural key like (type, date).

If you’re building on a Postgres-backed stack (for example, a Go + PostgreSQL backend), these uniqueness checks are fast and easy to keep close to the data. The key idea is simple: retries are normal, duplicates are optional.
Pick one boring runtime and stick to it. The point of the cron + database pattern is fewer moving parts, so a small Go, Node, or Python process that talks to PostgreSQL is usually enough.
Create the tables and indexes. Add a jobs table (plus any lookup tables you want later), then index run_at, and add an index that helps your worker find available jobs fast (for example on (status, run_at)).
Write a tiny enqueue function. Your app should insert a row with run_at set to “now” or a future time. Keep the payload small and predictable (IDs and a job type, not huge blobs).
INSERT INTO jobs (type, payload, status, run_at, attempts, max_attempts)
VALUES ($1, $2::jsonb, 'queued', $3, 0, 10);
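If you also store the idempotency_key column from the earlier unique index, enqueueing itself becomes safe to repeat; this sketch assumes that partial unique index exists:

-- A duplicate enqueue for the same business key becomes a no-op
INSERT INTO jobs (type, payload, status, run_at, attempts, max_attempts, idempotency_key)
VALUES ($1, $2::jsonb, 'queued', $3, 0, 10, $4)
ON CONFLICT (idempotency_key) WHERE idempotency_key IS NOT NULL DO NOTHING;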
Claim jobs in one statement. Select due jobs and mark them running in the same transaction.

WITH picked AS (
  SELECT id
  FROM jobs
  WHERE status = 'queued' AND run_at <= now()
  ORDER BY run_at
  LIMIT 10
  FOR UPDATE SKIP LOCKED
)
UPDATE jobs
SET status = 'running', started_at = now() -- if you use leases, also set locked_by and locked_until here
WHERE id IN (SELECT id FROM picked)
RETURNING *;
Process and finalize. For each claimed job, do the work, then update to done with finished_at. If it fails, record an error message and move it back to queued with a new run_at (backoff). Keep finalization updates small and always run them, even if your process is shutting down.
Add retry rules you can explain. Use a simple formula like run_at = now() + (attempts^2) * interval '10 seconds', and stop after max_attempts by setting status = 'dead'.
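As a hedged sketch, those two finalization paths can be plain updates (finished_at and last_error are assumed column names here):

-- Success: mark the job done
UPDATE jobs
SET status = 'done', finished_at = now()
WHERE id = $1;

-- Failure: bump attempts, reschedule with the quadratic backoff, or give up
UPDATE jobs
SET attempts = attempts + 1,
    status = CASE WHEN attempts + 1 >= max_attempts THEN 'dead' ELSE 'queued' END,
    last_error = $2,
    run_at = now() + ((attempts + 1) ^ 2) * interval '10 seconds'
WHERE id = $1;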
You don’t need a full dashboard on day one, but you do need enough to notice problems.
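A few queries usually cover the basics (thresholds and alerting are up to you):

-- How much work is due right now
SELECT count(*) FROM jobs WHERE status = 'queued' AND run_at <= now();

-- Age of the oldest runnable job; a growing number means workers are behind
SELECT now() - min(run_at) FROM jobs WHERE status = 'queued' AND run_at <= now();

-- Jobs that gave up and need a human
SELECT count(*) FROM jobs WHERE status = 'dead';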
If you’re already on a Go + PostgreSQL stack, this maps cleanly to a single worker binary plus cron.
Imagine a small SaaS app with two bits of scheduled work: a weekly report email for each user, and a nightly cleanup of old data.
Keep it simple: one PostgreSQL table to hold jobs, and one worker that runs every minute (triggered by cron). The worker claims due jobs, runs them, and records success or failure.
You can enqueue jobs from a few places:
- A nightly schedule that inserts a cleanup_nightly job for “today”.
- Application code that inserts a send_weekly_report job for the user’s next Monday.
- An admin action that inserts a send_weekly_report job that runs immediately for a specific date range.

The payload is just the minimum the worker needs. Keep it small so it’s easy to retry.
{
  "type": "send_weekly_report",
  "payload": {
    "user_id": 12345,
    "date_range": {
      "from": "2026-01-01",
      "to": "2026-01-07"
    }
  }
}
A worker can crash at the worst moment: right after it sends the email, but before it marks the job as “done”. When it restarts, it may pick the same job again.
To stop double-sends, give the work a natural dedupe key and store it where the database can enforce it. For weekly reports, a good key is (user_id, week_start_date). Before sending, the worker records “I am about to send report X”. If that record already exists, it skips sending.
This can be as simple as a sent_reports table with a unique constraint on (user_id, week_start_date), or a unique idempotency_key on the job itself.
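As a sketch of that check, assuming a sent_reports table with an id primary key and that unique constraint, the worker records the send first and only proceeds if the insert actually created a row:

-- A row comes back only the first time this (user, week) is recorded;
-- if nothing comes back, another attempt already handled it, so skip the send
INSERT INTO sent_reports (user_id, week_start_date)
VALUES ($1, $2)
ON CONFLICT (user_id, week_start_date) DO NOTHING
RETURNING id;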
Say your email provider times out. The job fails, so the worker:
- increments attempts
- records the error in last_error
- sets a new run_at with backoff and puts the job back in the queue

If it keeps failing past your limit (like 10 attempts), mark it as “dead” and stop retrying. The job either succeeds once, or it retries on a clear schedule, and idempotency makes retries safe.
The cron + database pattern is simple, but small mistakes can turn it into duplicates, stuck work, or surprise load. Most issues show up after the first crash, deploy, or traffic spike.
Most real-world incidents come from a few traps:
- Claiming jobs without a lease like locked_until. If a worker crashes after claiming a job, that row can stay “in progress” forever. A lease timestamp lets another worker safely pick it up later.
- Stuffing large data into the payload. Store identifiers (user_id, invoice_id, or a file key) and fetch the rest when you run.
- Assuming each job runs exactly once.

Example: you send a weekly invoice email. If the worker times out after sending but before marking the job done, the same job may be retried and send a duplicate email. That’s normal for this pattern unless you add a guardrail (for example, record a unique “email sent” event keyed by invoice id).
Avoid mixing scheduling and execution in the same long transaction. If you hold a transaction open while doing network calls, you keep locks longer than needed and block other workers.
Watch for clock differences between machines. Use database time (NOW() in PostgreSQL) as the source of truth for run_at and locked_until, not the app server clock.
Set a clear maximum runtime. If a job can take 30 minutes, make the lease longer than that, and renew it if needed. Otherwise another worker may pick it up mid-run.
Keep your job table healthy. If completed jobs pile up forever, queries slow down and lock contention rises. Pick a simple retention rule (archive or delete old rows) before the table becomes huge.
Before you ship this pattern, check the basics. A small omission here usually turns into stuck jobs, surprise duplicates, or a worker that hammers the database.
- The jobs table has run_at, status, attempts, locked_until, and max_attempts (plus last_error or similar so you can see what happened).
- Every side effect has a stable idempotency key (like invoice_id).
- Retries back off and stop at max_attempts.

If these are true, the cron + database pattern is usually stable enough for real workloads.
Once the checklist looks good, focus on day-to-day operation.
- Add small admin actions like “retry now” (sets run_at = now() and clears the lock) and “cancel” (moves to a terminal status). These save time during incidents.
- Watch queue depth and the age of the oldest runnable job (a quick query over status, run_at).

If you want to build this kind of setup quickly, Koder.ai (koder.ai) can help you get from schema to a deployed Go + PostgreSQL app with less manual wiring, while you focus on the locking, retries, and idempotency rules.
If you later outgrow this setup, you’ll still have learned the job lifecycle clearly, and those same ideas map well to a full queue system.