Integrating an LLM Without Letting It Own Your Architecture
Adding an LLM-powered feature while keeping it a swappable component — API abstraction, timeout handling, output validation, and prompt versioning.
Context
We wanted to add an AI-powered summarization feature to our platform. The product team was excited; the engineering question was how to integrate an LLM without making it a load-bearing dependency. LLM APIs are slow, occasionally return garbage, and providers change their APIs, pricing, and model behavior over time. We needed the feature to work reliably while staying flexible enough to swap providers or models without rewriting the app.
Constraints
- LLM APIs have variable latency — we couldn't block the main request path for 5+ seconds
- Output is non-deterministic — we needed validation and fallbacks when the model returned invalid or off-brand content
- Provider lock-in was a risk — we wanted to be able to switch or A/B test models without touching every call site
Architecture
We introduced an abstraction layer: a single LLM service interface that our feature code calls. The implementation talks to the provider (OpenAI, Anthropic, etc.) behind that interface. We added timeouts — if the LLM doesn't respond within 3 seconds, we return a fallback (e.g. a truncated version of the source text) instead of blocking. Output validation runs before we surface anything to the user: we check length, format, and basic content rules. Prompts live in versioned config files so we can iterate without code deploys. The key was treating the LLM as one of many data sources, not the center of the system.
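A minimal sketch of that shape in Python — the names (`Summarizer`, `TimeoutSummarizer`, the validation rules) and the exact thresholds are illustrative assumptions, not our production code:

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class Summarizer(ABC):
    """Provider-agnostic interface the feature code depends on."""

    @abstractmethod
    def summarize(self, text: str) -> str: ...


class TruncatingFallback(Summarizer):
    """Fallback: a truncated version of the source text."""

    def __init__(self, max_chars: int = 280):
        self.max_chars = max_chars

    def summarize(self, text: str) -> str:
        return text[: self.max_chars]


def is_valid(summary: str) -> bool:
    """Output validation: length, format, and basic content rules
    (placeholder checks — real rules would be product-specific)."""
    return 0 < len(summary) <= 500 and not summary.lower().startswith("as an ai")


class TimeoutSummarizer(Summarizer):
    """Wraps a real provider implementation; returns the fallback if the
    provider exceeds the deadline, errors out, or fails validation."""

    def __init__(self, primary: Summarizer, fallback: Summarizer,
                 timeout_s: float = 3.0):
        self.primary = primary
        self.fallback = fallback
        self.timeout_s = timeout_s
        self._pool = ThreadPoolExecutor(max_workers=4)

    def summarize(self, text: str) -> str:
        future = self._pool.submit(self.primary.summarize, text)
        try:
            # Raises concurrent.futures.TimeoutError past the deadline.
            result = future.result(timeout=self.timeout_s)
        except Exception:  # timeout, provider error, network failure
            return self.fallback.summarize(text)
        return result if is_valid(result) else self.fallback.summarize(text)
```

Feature code holds a `Summarizer` and never imports a provider SDK directly; switching providers means writing one new implementation of the interface and changing the wiring, not the call sites.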
Alternatives considered
- Call the LLM inline in the request path: Would have added 2–5 seconds to every request. Users would have seen spinners; we'd have had no fallback when the API was slow or down.
- Fine-tune our own model for the task: Operational overhead, cost, and complexity we didn't need. A well-prompted general model was sufficient for our use case.
Lessons learned
- Abstract the LLM behind an interface. When we switched providers for cost reasons, we changed one implementation, not dozens of call sites.
- Timeouts and fallbacks are non-negotiable. Users prefer a slightly worse result over an infinite wait.
- Version your prompts. It makes debugging and rollback possible when a prompt change causes regressions.
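The prompt-versioning idea can be sketched as a config structure with an explicit active version, so rollback is a one-line config change rather than a code deploy. The file name, schema, and prompt text below are hypothetical, assuming prompts live in a versioned config keyed by feature:

```python
# Hypothetical contents of a versioned prompt config
# (in practice this would be loaded from a file such as prompts/summarize.json).
PROMPTS = {
    "summarize": {
        "active": "v2",  # roll back by pointing this at "v1"
        "versions": {
            "v1": "Summarize the following text in two sentences:\n{text}",
            "v2": "Write a neutral, two-sentence summary of:\n{text}",
        },
    }
}


def render_prompt(name: str, **kwargs) -> str:
    """Resolve the active version of a named prompt and fill in its
    template variables."""
    entry = PROMPTS[name]
    template = entry["versions"][entry["active"]]
    return template.format(**kwargs)
```

Because old versions stay in the file, a regression after a prompt change can be bisected and reverted the same way a code change would be.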