Background Jobs That Don't Bring Down the Main App
Extracting heavy work to a worker process — job queues, priority levels, dead letter handling, and resource isolation.
Context
Our platform had background work: report generation, bulk data exports, notification batching. Initially it ran in the same Node.js process as the API. When a large report ran, it consumed CPU and memory; API latency spiked and users noticed. We needed to isolate background work so a runaway job couldn't starve the main app.
Constraints
- Jobs had different priorities — some were user-facing and urgent; others could wait
- Failed jobs needed retry and eventually dead-letter handling — we couldn't lose work
- We had one small team — the solution had to be simple to operate, not a full job orchestration platform
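These constraints translate almost directly into schema. A minimal sketch of the jobs and dead-letter tables, assuming PostgreSQL; the column names here are illustrative, not our exact DDL:

```sql
-- Jobs table: priority ordering, retry bookkeeping, and a run_at
-- timestamp that backoff pushes into the future.
CREATE TABLE jobs (
  id          bigserial PRIMARY KEY,
  type        text NOT NULL,                   -- e.g. 'report', 'export', 'notify'
  payload     jsonb NOT NULL DEFAULT '{}',
  priority    int  NOT NULL DEFAULT 0,         -- higher runs first
  status      text NOT NULL DEFAULT 'queued',  -- queued | running | done | dead
  retry_count int  NOT NULL DEFAULT 0,
  max_retries int  NOT NULL DEFAULT 3,
  run_at      timestamptz NOT NULL DEFAULT now(),
  created_at  timestamptz NOT NULL DEFAULT now()
);

-- Jobs that exhausted their retries are parked here; alerting watches this table.
CREATE TABLE dead_letter (
  job_id     bigint,
  type       text,
  payload    jsonb,
  last_error text,
  failed_at  timestamptz NOT NULL DEFAULT now()
);
```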
Architecture
We extracted a separate worker process that polls a jobs table in PostgreSQL. The API enqueues jobs with a priority field, and the worker processes high-priority jobs first. Each job carries retry_count and max_retries; on failure we increment the count and re-queue with exponential backoff, and once max_retries is exhausted we move the job to a dead_letter table and alert.

The worker runs in a separate container with its own resource limits, so a job that goes haywire can't affect the API. A simple advisory lock prevents duplicate processing when we scale to multiple workers.

The key was starting with a table and polling, with no Redis and no RabbitMQ, and only adding complexity when we hit limits.
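The worker loop can be sketched as below. This is a sketch under assumptions, not our exact code: it assumes a node-postgres-style `db.query` interface, illustrative table and column names matching the sketch schema, and `pg_try_advisory_xact_lock` keyed on the job id as one way to implement the advisory lock (the lock releases when the claiming transaction commits, after which `status = 'running'` keeps other workers away).

```javascript
// Claim the highest-priority runnable job. The advisory lock keeps two
// workers from grabbing the same row concurrently.
const CLAIM_SQL = `
  UPDATE jobs SET status = 'running'
  WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'queued' AND run_at <= now()
      AND pg_try_advisory_xact_lock(id)
    ORDER BY priority DESC, run_at
    LIMIT 1
  )
  RETURNING id, type, payload, retry_count, max_retries`;

// Exponential backoff with a cap: 1s, 2s, 4s, ... up to 10 minutes.
function backoffMs(retryCount) {
  return Math.min(1000 * 2 ** retryCount, 10 * 60 * 1000);
}

// Process one job if available. `db` is a pg-style client/pool; `handlers`
// maps job type to an async function. Returns false when the queue is empty.
async function processNextJob(db, handlers) {
  const { rows } = await db.query(CLAIM_SQL);
  if (rows.length === 0) return false; // nothing to do: caller sleeps, then polls again

  const job = rows[0];
  try {
    await handlers[job.type](job.payload);
    await db.query(`UPDATE jobs SET status = 'done' WHERE id = $1`, [job.id]);
  } catch (err) {
    if (job.retry_count + 1 >= job.max_retries) {
      // Out of retries: park the job for inspection, then mark it dead.
      await db.query(
        `INSERT INTO dead_letter (job_id, type, payload, last_error)
         VALUES ($1, $2, $3, $4)`,
        [job.id, job.type, job.payload, String(err)]
      );
      await db.query(`UPDATE jobs SET status = 'dead' WHERE id = $1`, [job.id]);
    } else {
      // Retries left: re-queue with exponential backoff.
      await db.query(
        `UPDATE jobs
         SET status = 'queued', retry_count = retry_count + 1,
             run_at = now() + ($2::text || ' milliseconds')::interval
         WHERE id = $1`,
        [job.id, backoffMs(job.retry_count)]
      );
    }
  }
  return true;
}
```

Passing `db` in rather than importing a client keeps the loop testable against a fake, and means the same code runs whether the pool lives in this process or is injected by the container entrypoint.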
Alternatives considered
- Run jobs on a cron schedule: Too coarse. We needed per-request job creation, retries, and priority — cron can't express that.
- Use Lambda for each job type: Cold starts and connection pooling with PostgreSQL were concerns. A long-running worker with a connection pool was simpler for our workload.
Lessons learned
- Resource isolation matters. A separate process with its own memory and CPU prevents one bad job from taking down the API.
- Dead letter handling is part of the design, not an afterthought. We built the dead_letter table and alerting from day one.
- Start simple. A jobs table and polling got us 90% of the way; we didn't need a message broker until we had multiple consumers.