May 22, 2026 - 7 min read

Adding AI Features Without Turning the Product Into a Demo

How to structure AI product features around service boundaries, queues, rate limits, state, and failure handling so they survive real users.

Figure: AI backend orchestration flow showing user request, app, queue, LLM provider, and saved result.

Treat AI as a product dependency

AI features become fragile when the product treats the model response as magic. A model provider is an external dependency. It can be slow, expensive, inconsistent, rate-limited, or unavailable. That means it needs the same engineering respect as payments, email, file storage, or any other service the product depends on.

The backend should own orchestration. It should decide what context is allowed into a prompt, which user or account is eligible to run the action, how much the request may cost, where progress is stored, and what state is shown if the provider fails.
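As a minimal sketch of that ownership, the function below checks eligibility, enforces a token budget, and filters context before anything reaches a prompt. Every name here (`Account`, `ALLOWED_CONTEXT_FIELDS`, `build_request`, the plan names) is hypothetical and chosen for illustration, not taken from a real SDK.

```python
from dataclasses import dataclass

# Assumption: only these document fields may ever be placed in a prompt.
ALLOWED_CONTEXT_FIELDS = {"title", "body", "language"}

# Assumption: only these plans include the AI feature.
ELIGIBLE_PLANS = {"pro", "team"}


@dataclass
class Account:
    id: str
    plan: str
    tokens_used: int
    token_budget: int


class NotEligible(Exception):
    """Raised when the backend refuses to run the AI action."""


def build_request(account: Account, document: dict, estimated_tokens: int) -> dict:
    """Decide eligibility, cost, and context before any provider call."""
    if account.plan not in ELIGIBLE_PLANS:
        raise NotEligible("plan does not include this feature")
    if account.tokens_used + estimated_tokens > account.token_budget:
        raise NotEligible("token budget exceeded")
    # Drop every field that is not explicitly allowed into the prompt.
    context = {k: v for k, v in document.items() if k in ALLOWED_CONTEXT_FIELDS}
    return {
        "account_id": account.id,
        "context": context,
        "max_tokens": estimated_tokens,
    }
```

The point of the allowlist is that a new field on the document stays out of prompts until someone deliberately adds it.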

Separate product state from provider output

The product should not be a thin wrapper around a prompt. A durable AI feature usually has its own records: session, request, status, result, error, token usage, and sometimes moderation or review state. That state lets users return later, refresh safely, and see progress without depending on one open browser connection.
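One way to picture those records is a small request object with an explicit status and a fixed set of allowed transitions. This is a sketch under assumptions: the field names, the `Status` values, and the transition table are illustrative, and a real system would persist this in a database rather than in memory.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Assumption: a running request may go back to queued when it is retried.
TRANSITIONS = {
    Status.QUEUED: {Status.RUNNING},
    Status.RUNNING: {Status.SUCCEEDED, Status.FAILED, Status.QUEUED},
    Status.SUCCEEDED: set(),
    Status.FAILED: set(),
}


@dataclass
class AIRequest:
    id: str
    session_id: str
    status: Status = Status.QUEUED
    result: Optional[str] = None
    error: Optional[str] = None
    tokens_used: int = 0

    def transition(self, new_status: Status) -> None:
        """Move to new_status, rejecting transitions the table does not allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(
                f"cannot move from {self.status.value} to {new_status.value}"
            )
        self.status = new_status

    def complete(self, result: str, tokens_used: int) -> None:
        self.transition(Status.SUCCEEDED)
        self.result = result
        self.tokens_used = tokens_used

    def fail(self, error: str) -> None:
        self.transition(Status.FAILED)
        self.error = error
```

Because the status lives on the product's own record, the UI can poll or reload it at any time without holding the original connection open.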

Use queues for slow and retryable work

Summaries, recommendations, classification, extraction, chat follow-ups, and multi-step analysis can outlast a single request or fail in ways that users need to recover from. A queue gives the system a place to retry, back off, record errors, and continue after the user leaves the page.
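The retry and backoff behavior can be sketched with an in-memory queue. This is illustrative only: `Job`, `process`, `MAX_ATTEMPTS`, and `BASE_DELAY_SECONDS` are hypothetical names, `call_provider` stands in for whatever provider client the product uses, and the sketch records the backoff delay instead of actually waiting, which a real worker or queue library would do.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional

MAX_ATTEMPTS = 3          # assumption: give up after three tries
BASE_DELAY_SECONDS = 2.0  # assumption: first retry waits two seconds


@dataclass
class Job:
    id: str
    payload: str
    attempts: int = 0
    status: str = "queued"
    result: Optional[str] = None
    last_error: Optional[str] = None
    next_delay: float = 0.0


def backoff_delay(attempt: int) -> float:
    """Exponential backoff: 2s, 4s, 8s for attempts 1, 2, 3."""
    return BASE_DELAY_SECONDS * (2 ** (attempt - 1))


def process(queue: deque, call_provider: Callable[[str], str]) -> None:
    """Drain the queue, re-enqueueing failed jobs until MAX_ATTEMPTS is reached."""
    while queue:
        job = queue.popleft()
        job.attempts += 1
        job.status = "running"
        try:
            job.result = call_provider(job.payload)
            job.status = "succeeded"
        except Exception as exc:
            # Record the error so the failure is visible to users and operators.
            job.last_error = str(exc)
            if job.attempts >= MAX_ATTEMPTS:
                job.status = "failed"
            else:
                job.status = "queued"
                # Recorded only; a real worker would delay the retry by this much.
                job.next_delay = backoff_delay(job.attempts)
                queue.append(job)
```

A job that fails twice and then succeeds ends with `status == "succeeded"` and `attempts == 3`, while a job that never succeeds ends as `"failed"` with its last error preserved.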

Design for limits before launch

Rate limits and cost limits should be part of the first production design. I prefer to put account-level quotas, token tracking, cooldowns, and clear retry behavior in place early. These limits protect the user experience and the business model at the same time.
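A rough sketch of an account-level quota with a cooldown might look like the following. The class names, the daily limit, and the `retry_after_seconds` field are assumptions for illustration; the caller passes `now` in explicitly so the logic stays easy to test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    allowed: bool
    reason: str
    retry_after_seconds: float = 0.0


@dataclass
class AccountQuota:
    daily_token_limit: int
    cooldown_seconds: float
    tokens_used_today: int = 0
    last_request_at: Optional[float] = None

    def check(self, now: float, requested_tokens: int) -> Decision:
        """Decide whether a request may run, without changing any state."""
        if self.last_request_at is not None:
            elapsed = now - self.last_request_at
            if elapsed < self.cooldown_seconds:
                # Tell the client how long to wait so retry behavior is clear.
                return Decision(False, "cooldown", self.cooldown_seconds - elapsed)
        if self.tokens_used_today + requested_tokens > self.daily_token_limit:
            return Decision(False, "daily_limit")
        return Decision(True, "ok")

    def record(self, now: float, tokens_spent: int) -> None:
        """Track actual usage after the provider call finishes."""
        self.tokens_used_today += tokens_spent
        self.last_request_at = now
```

Returning a reason and a retry hint, rather than a bare boolean, gives the UI something specific to show when a request is refused.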

A practical checklist

Before an AI feature goes live, I want explicit eligibility rules, stored workflow state, queue-backed processing for slower work, provider timeout handling, retry limits, token or cost tracking, clear error states, and tests around non-deterministic output.
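For the last checklist item, one common approach is to test the shape of the output rather than its exact wording. The sketch below assumes a classification feature whose provider is asked to return JSON with a `label` and a `confidence`; the label set and the function name `parse_classification` are hypothetical.

```python
import json

# Assumption: the feature only accepts these labels.
ALLOWED_LABELS = {"bug", "feature", "question"}


def parse_classification(raw: str) -> dict:
    """Validate provider output; raise ValueError on anything unexpected."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")

    label = data.get("label")
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")

    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0 <= confidence <= 1:
        raise ValueError("confidence must be between 0 and 1")

    return {"label": label, "confidence": float(confidence)}
```

Tests can then assert that valid output parses and that chatty, malformed, or out-of-range output is rejected, which holds regardless of how the model phrases a particular response.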

Where this shows up in my work

This thinking shows up in my AI product backend work and in product labs like Same and Orbit, where automation needs to fit a real workflow and remain understandable when the system has to recover from imperfect output.

AI Integrations, Backend engineering, Product Engineering

FAQs

Why should AI workflows often use queues?

Queues make slow or unreliable provider calls retryable and observable, and they make it easier to tie each call to the progress state the user sees.
