Build a Long-Horizon Agent

Run multi-hour agent loops with cost ceilings, resumable state, and voice input

This cookbook assumes you have an OpenRouter API key and are using the Agent SDK (@openrouter/agent). If you are starting from scratch, read the Agent SDK overview and the callModel reference first.

Goal: Run an agent that can keep working for hours, not seconds: research projects, multi-stage migrations, voice-driven assistants, or background jobs that span days. The same callModel loop works for all of them once you wire up the primitives in the steps below.

Outcome: A long-horizon agent that:

  • Caps total cost and step count so it always terminates.
  • Persists conversation state so it can be resumed after a crash, deploy, or human approval.
  • Streams progress events so dashboards and UIs stay live during the run.
  • Runs a self-ask loop — research, adversarial review, repeat — until the agent emits a [DONE] sentinel.
  • Optionally accepts voice input via OpenRouter’s Speech-to-Text endpoint and replies with Text-to-Speech.

You can hand this page to your coding agent as the implementation brief. Adapt the storage, ceilings, and surface (CLI, API, queue worker) to your app rather than scaffolding a separate project.

Prerequisites

  • Node.js 20+ or Bun
  • An OpenRouter API key in OPENROUTER_API_KEY
  • A project with @openrouter/agent installed
  • A place to persist state — a database, Redis, S3, or the local filesystem
  • Optional: a microphone or audio file for the voice section

npm install @openrouter/agent @openrouter/sdk zod

1. Set hard ceilings on every run

Long-horizon agents must terminate. Combine multiple stop conditions so the loop ends as soon as the first one fires. The most useful for long runs are maxCost, stepCountIs, and maxTokensUsed.

import { OpenRouter, tool, stepCountIs, maxCost } from '@openrouter/agent';
import { z } from 'zod';

const openrouter = new OpenRouter({
  apiKey: process.env.OPENROUTER_API_KEY,
});

const searchTool = tool({
  name: 'search',
  description: 'Search the web for information',
  inputSchema: z.object({ query: z.string() }),
  execute: async ({ query }) => {
    // fetchResults is your own search backend; it is not provided by the SDK.
    return { results: await fetchResults(query) };
  },
});

const result = openrouter.callModel({
  model: '~anthropic/claude-opus-latest',
  input: 'Research the fusion energy landscape and produce a 5-page report.',
  tools: [searchTool],
  // Stop on whichever fires first.
  stopWhen: [stepCountIs(200), maxCost(5)],
});

const text = await result.getText();

See the Stop Conditions reference for the full list (stepCountIs, hasToolCall, maxTokensUsed, maxCost, finishReasonIs) and how to compose custom predicates.

Long-horizon runs spend real credits. Always set both a step ceiling and a cost ceiling before you start a multi-hour run, and start small while you are iterating.
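A minimal sketch of that advice, assuming you want ceilings configurable per environment: read them from environment variables with deliberately small defaults, and reject values that would make a limit meaningless. The variable names AGENT_MAX_STEPS and AGENT_MAX_COST_USD, the defaults, and the resolveCeilings helper are hypothetical, not SDK settings; the dollar unit assumes maxCost takes a spend amount, as the examples on this page suggest.

```typescript
// Hypothetical helper: resolve run ceilings from environment variables.
// AGENT_MAX_STEPS and AGENT_MAX_COST_USD are illustrative names, not SDK settings.
interface RunCeilings {
  maxSteps: number;
  maxCostUsd: number;
}

// Small defaults keep an unconfigured development run cheap.
const DEFAULT_CEILINGS: RunCeilings = { maxSteps: 20, maxCostUsd: 0.5 };

function parsePositive(
  raw: string | undefined,
  fallback: number,
  name: string,
  integer: boolean,
): number {
  if (raw === undefined || raw.trim() === '') return fallback;
  const value = Number(raw);
  // Reject NaN, Infinity, zero, negatives, and (for step counts) fractions.
  if (!Number.isFinite(value) || value <= 0 || (integer && !Number.isInteger(value))) {
    throw new Error(`${name} must be a positive ${integer ? 'integer' : 'number'}, got "${raw}"`);
  }
  return value;
}

function resolveCeilings(env: Record<string, string | undefined>): RunCeilings {
  return {
    maxSteps: parsePositive(env.AGENT_MAX_STEPS, DEFAULT_CEILINGS.maxSteps, 'AGENT_MAX_STEPS', true),
    maxCostUsd: parsePositive(
      env.AGENT_MAX_COST_USD,
      DEFAULT_CEILINGS.maxCostUsd,
      'AGENT_MAX_COST_USD',
      false,
    ),
  };
}
```

Then pass the values through with `const { maxSteps, maxCostUsd } = resolveCeilings(process.env);` and `stopWhen: [stepCountIs(maxSteps), maxCost(maxCostUsd)]`. Throwing on a bad value is deliberate: a typo in a budget should stop the deploy, not silently fall back to a default.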

2. Persist state for resumability

A multi-hour run must survive restarts, deploys, and human approvals. callModel accepts a StateAccessor that loads and saves ConversationState between steps. Back it with whatever storage your app already uses.

import type { ConversationState, StateAccessor } from '@openrouter/agent';
import { readFile, rename, writeFile } from 'node:fs/promises';

const fileStateAccessor = (path: string): StateAccessor => ({
  load: async () => {
    // Only swallow ENOENT: real I/O or permission errors should surface
    // instead of silently restarting the agent from scratch.
    const raw = await readFile(path, 'utf8').catch((err: NodeJS.ErrnoException) => {
      if (err.code === 'ENOENT') return null;
      throw err;
    });
    return raw ? (JSON.parse(raw) as ConversationState) : null;
  },
  // Atomic write: write to a temp file, then rename. POSIX rename is
  // atomic on the same filesystem, so a crash mid-write cannot leave
  // a truncated state file that breaks resumption.
  save: async (state) => {
    const tmp = `${path}.tmp`;
    await writeFile(tmp, JSON.stringify(state));
    await rename(tmp, path);
  },
});

const result = openrouter.callModel({
  model: '~anthropic/claude-opus-latest',
  input: 'Plan and start a 3-day data migration.',
  tools: [searchTool],
  state: fileStateAccessor('./run.json'),
  stopWhen: [stepCountIs(200), maxCost(5)],
});

await result.getResponse();

To resume after a crash, deploy, or human review, call callModel again with the same StateAccessor. Pass input: [] to signal “no new user turn — continue from saved state”; the SDK loads the checkpoint and keeps going.

const resumed = openrouter.callModel({
  model: '~anthropic/claude-opus-latest',
  input: [],
  state: fileStateAccessor('./run.json'),
  tools: [searchTool],
  stopWhen: [stepCountIs(200), maxCost(5)],
});

await resumed.getResponse();

For production, swap the file accessor for one backed by Postgres, Redis, or an object store. See Tool Approval & State for the full StateAccessor and resumption contract.
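Before wiring up a production store, it can help to test your resume logic against an in-memory one. The sketch below assumes only the load/save shape shown in the file accessor above; CheckpointStore and memoryStore are hypothetical names, not SDK types, so check the SDK's StateAccessor definition before substituting one for the other. Each save serializes to JSON, as the file accessor does, so state that does not survive a JSON round trip (a Date comes back as a string, a BigInt throws) surfaces in tests rather than after a production crash.

```typescript
// Hypothetical interface mirroring the load/save methods of the file accessor above.
interface CheckpointStore<S> {
  load: () => Promise<S | null>;
  save: (state: S) => Promise<void>;
}

// In-memory store for tests. Every save goes through JSON, so each load returns
// a fresh copy and non-serializable state is caught early.
function memoryStore<S>(): CheckpointStore<S> & { saveCount: () => number } {
  let snapshot: string | null = null;
  let saves = 0;
  return {
    load: async () => (snapshot === null ? null : (JSON.parse(snapshot) as S)),
    save: async (state: S) => {
      snapshot = JSON.stringify(state);
      saves += 1;
    },
    saveCount: () => saves,
  };
}
```

The `{ messages: string[] }` shape used in the checks below is purely illustrative; it makes no assumption about what ConversationState contains.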

3. Stream progress instead of waiting

A run that lasts an hour should not block your UI for an hour. callModel returns a result object with three streams you can consume independently, plus getResponse() for the final result:

  • result.getTextStream() — token deltas for the user-facing response.
  • result.getToolCallsStream() — tool calls as they complete.
  • result.getFullResponsesStream() — the full event stream, including tool preliminary results.
  • result.getResponse() — the final, fully-resolved response with usage data.

const result = openrouter.callModel({
  model: '~anthropic/claude-opus-latest',
  input: 'Build a market analysis report on EV charging.',
  tools: [searchTool],
  stopWhen: [stepCountIs(100), maxCost(2)],
});

// publishToDashboard is your own transport function; it is not part of the SDK.
// Stream tool calls and text deltas concurrently.
const streamToolCalls = (async () => {
  for await (const call of result.getToolCallsStream()) {
    publishToDashboard({ kind: 'tool', name: call.name, args: call.arguments });
  }
})();

const streamText = (async () => {
  for await (const delta of result.getTextStream()) {
    publishToDashboard({ kind: 'token', delta });
  }
})();

await Promise.all([streamToolCalls, streamText]);

const final = await result.getResponse();
publishToDashboard({ kind: 'done', usage: final.usage });

See the callModel API reference for every stream method and event type.

Wire publishToDashboard to whatever transport you already use — Server-Sent Events, WebSockets, a database table, or a pubsub channel.
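If you choose Server-Sent Events, each dashboard event becomes a small text frame: an optional `id:` line, an `event:` line, a `data:` line, and a blank line that terminates the frame. This sketch assumes the three event shapes used in the streaming example above (`tool`, `token`, `done`); the DashboardEvent type and toSseFrame helper are hypothetical. Write the returned string to an HTTP response served with `Content-Type: text/event-stream`.

```typescript
// Hypothetical event shapes matching the publishToDashboard calls above.
type DashboardEvent =
  | { kind: 'tool'; name: string; args: unknown }
  | { kind: 'token'; delta: string }
  | { kind: 'done'; usage: unknown };

// Serialize one event as a Server-Sent Events frame.
// JSON.stringify escapes newlines, so the payload always fits on one data: line.
function toSseFrame(event: DashboardEvent, id?: number): string {
  const lines: string[] = [];
  // An id lets a reconnecting EventSource send Last-Event-ID so you can replay missed events.
  if (id !== undefined) lines.push(`id: ${id}`);
  lines.push(`event: ${event.kind}`);
  lines.push(`data: ${JSON.stringify(event)}`);
  return lines.join('\n') + '\n\n'; // the blank line ends the frame
}
```

Numbering frames is worth the extra line on hour-long runs: browsers reconnect dropped SSE connections automatically, and the id tells your server where the client left off.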

4. Loop with adversarial self-review

A single pass through callModel often leaves gaps: unverified citations, missing edge cases, or stale data. Wrap the run in an outer self-ask loop: research, adversarial review, repeat until the agent emits a [DONE] sentinel. Each iteration appends a new user turn to the conversation saved through the StateAccessor, so the agent builds on its prior work instead of starting over.

import { OpenRouter, stepCountIs, maxCost } from '@openrouter/agent';

// Reuses searchTool (step 1) and fileStateAccessor (step 2).
const openrouter = new OpenRouter({
  apiKey: process.env.OPENROUTER_API_KEY,
});

const SELF_ASK_MAX_ITERATIONS = 10;
const REVIEW_PROMPT = `Review your last response adversarially.
- Are there gaps, ambiguities, or unverified claims?
- If the work is complete and every claim is verified, reply with only [DONE].
- Otherwise list the gaps and keep researching.`;

const state = fileStateAccessor('./run.json');
let input: string | unknown[] =
  'Research the fusion energy landscape and produce a 5-page report.';
// Latest substantive response. A bare [DONE] reply carries no content, so when
// the loop ends on the sentinel this still holds the approved work.
let latest = '';

for (let i = 0; i < SELF_ASK_MAX_ITERATIONS; i++) {
  const result = openrouter.callModel({
    model: '~anthropic/claude-opus-latest',
    input,
    state,
    tools: [searchTool],
    // Per-iteration ceilings. The outer for-loop adds a third guard.
    stopWhen: [stepCountIs(50), maxCost(2)],
  });
  const text = await result.getText();
  if (text.trim() !== '[DONE]') latest = text;
  if (text.includes('[DONE]')) break;
  // The response is already in the saved state; the next user turn asks the
  // model to critique its own work.
  input = REVIEW_PROMPT;
}

The [DONE] sentinel is intentionally cheap: any model can produce it, and a plain String.includes check keeps the control flow obvious. You can swap the review prompt, or use a different reviewer model (for example a faster ~anthropic/claude-sonnet-latest critiquing an Opus researcher), without restructuring the loop. Three layers of ceilings keep cost bounded: SELF_ASK_MAX_ITERATIONS caps the number of review rounds, and each round gets its own stepCountIs and maxCost budget. The worst-case spend is therefore roughly SELF_ASK_MAX_ITERATIONS times the per-round maxCost, here 10 × 2 = 20, so size the per-round budget with that multiplier in mind.
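If String.includes proves too loose in practice (a response such as "I will not reply [DONE] until the citations are checked" would end the loop early), a slightly stricter check keeps the control flow just as obvious. This hypothetical isDone helper accepts the sentinel only when it stands alone on the last non-empty line, which still matches a bare [DONE] reply.

```typescript
const DONE_SENTINEL = '[DONE]';

// Hypothetical stricter sentinel check: the last non-empty line must be exactly [DONE].
function isDone(text: string): boolean {
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return lines.length > 0 && lines[lines.length - 1] === DONE_SENTINEL;
}
```

Use it in place of the includes check in the loop. A model that mentions the sentinel mid-sentence keeps iterating; one that finishes with a summary followed by [DONE] on its own line still stops.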

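Pair this with the state accessor from step 2 so the loop survives crashes mid-review. On resume, re-enter the loop from the saved state, but start with input set to REVIEW_PROMPT (or []) rather than the original task, so the task is not appended to the history a second time.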
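A minimal sketch of that resume decision, assuming only the load method of the state accessor from step 2 (which returns null when no checkpoint exists): a fresh run sends the task, a resumed run asks for another review. chooseLoopInput is a hypothetical helper name.

```typescript
// Anything with a load() that resolves to null when no checkpoint exists,
// such as the file accessor from step 2.
interface CheckpointReader {
  load: () => Promise<unknown>;
}

// Hypothetical helper: pick the first input for the self-ask loop.
async function chooseLoopInput(
  state: CheckpointReader,
  task: string,
  reviewPrompt: string,
): Promise<string> {
  const checkpoint = await state.load();
  // No checkpoint: start the task. Checkpoint found: continue reviewing instead
  // of appending the original task to the saved history again.
  return checkpoint === null ? task : reviewPrompt;
}
```

In the loop above, that means initializing `input` with `await chooseLoopInput(state, task, REVIEW_PROMPT)` instead of the literal task string.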

5. Add voice input

Drive the same agent loop from a voice memo, phone call, or push-to-talk app. OpenRouter exposes a dedicated /api/v1/audio/transcriptions endpoint whose model parameter selects the speech-to-text model. Hand the transcript to callModel exactly like a text prompt.

import { OpenRouter as SDK } from '@openrouter/sdk';
import { OpenRouter, stepCountIs, maxCost } from '@openrouter/agent';
import { readFile } from 'node:fs/promises';

const sdk = new SDK({ apiKey: process.env.OPENROUTER_API_KEY });
const agent = new OpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });

const audio = await readFile('./voice-memo.wav');
const transcription = await sdk.stt.createTranscription({
  model: 'openai/whisper-1',
  inputAudio: { data: audio.toString('base64'), format: 'wav' },
});

const result = agent.callModel({
  model: '~anthropic/claude-opus-latest',
  input: transcription.text,
  stopWhen: [stepCountIs(50), maxCost(2)],
});

const reply = await result.getText();

For a streaming microphone, capture audio chunks on the client, send them to your server, and call createTranscription once silence is detected. Use the STT cookbook for the full request and response shape.
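Silence detection can start as a simple energy threshold. This sketch assumes the client captures 16-bit signed PCM in fixed-size frames (for example 20 ms each) and flags the end of an utterance after a run of quiet frames that follows speech. SilenceDetector, the threshold, and the frame counts are illustrative assumptions to tune for your microphone and environment; a dedicated voice-activity detector is more robust against background noise.

```typescript
// Root-mean-square amplitude of a frame of 16-bit PCM samples, normalized to 0..1.
function rms(frame: Int16Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    const normalized = frame[i] / 32768;
    sumSquares += normalized * normalized;
  }
  return Math.sqrt(sumSquares / frame.length);
}

// Hypothetical end-of-utterance detector: push() returns true once speech has
// been heard and then `silentFramesToEnd` consecutive frames fall below `threshold`.
class SilenceDetector {
  private heardSpeech = false;
  private silentFrames = 0;
  private readonly threshold: number;
  private readonly silentFramesToEnd: number;

  // Defaults: 25 frames of 20 ms is about half a second of silence.
  constructor(threshold = 0.01, silentFramesToEnd = 25) {
    this.threshold = threshold;
    this.silentFramesToEnd = silentFramesToEnd;
  }

  push(frame: Int16Array): boolean {
    if (rms(frame) >= this.threshold) {
      this.heardSpeech = true;
      this.silentFrames = 0;
      return false;
    }
    if (!this.heardSpeech) return false; // ignore silence before the user speaks
    this.silentFrames += 1;
    return this.silentFrames >= this.silentFramesToEnd;
  }

  reset(): void {
    this.heardSpeech = false;
    this.silentFrames = 0;
  }
}
```

When push() returns true, send the buffered frames for transcription and call reset() before the next utterance. The buffered PCM still has to be packaged in a format the transcription request accepts, such as the WAV used in the example above.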

6. Speak the response back (optional)

For voice-out, pipe the agent’s reply through /api/v1/audio/speech and write the resulting bytes to a file or stream them to the caller.

import { writeFile } from 'node:fs/promises';

// Reuses `sdk` and `reply` from the voice-input example above.
const stream = await sdk.tts.createSpeech({
  model: 'openai/gpt-4o-mini-tts-2025-12-15',
  input: reply,
  voice: 'alloy',
  responseFormat: 'mp3',
});

// Drain the audio stream into memory, then write it to disk in one go.
const chunks: Uint8Array[] = [];
const reader = stream.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}

await writeFile('./reply.mp3', Buffer.concat(chunks));
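The read loop above relies on Node's Buffer. If the same code also has to run in a browser or an edge runtime, a small portable helper can collect any ReadableStream of bytes into one Uint8Array. readAll is a hypothetical name; ReadableStream and getReader() are standard web stream APIs, available as globals in Node 18+ and Bun.

```typescript
// Collect every chunk from a byte stream into a single Uint8Array (no Buffer needed).
async function readAll(stream: ReadableStream<Uint8Array>): Promise<Uint8Array> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      chunks.push(value);
      total += value.byteLength;
    }
  } finally {
    reader.releaseLock(); // let other consumers use the stream if needed
  }
  // Copy the chunks into one contiguous buffer.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

For long replies, consider piping the stream straight to the caller instead of buffering it, so playback can start before synthesis finishes.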

7. Notify on completion

Long-horizon jobs usually run somewhere the user is not watching. Notify them when the run terminates — by webhook, email, Slack message, or whatever your stack uses. Trigger the notification once getResponse() resolves so the agent has fully completed and ceilings have been honored.

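// Validate configuration up front (ideally before calling callModel) so a
// missing variable fails fast instead of after hours of work.
const webhookUrl = process.env.WEBHOOK_URL;
if (!webhookUrl) {
  throw new Error('WEBHOOK_URL env var is required for webhook notifications');
}

const final = await result.getResponse();

const res = await fetch(webhookUrl, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    status: 'completed',
    usage: final.usage,
    text: await result.getText(),
  }),
});
// fetch does not reject on HTTP errors, so check the status explicitly.
if (!res.ok) {
  throw new Error(`Webhook failed with status ${res.status}`);
}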

For agents that pause mid-run (for example, human-in-the-loop approvals), see Add Human-in-the-Loop Controls.
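Two refinements are worth sketching. A multi-hour run should notify on failure as well as success, so a crash does not look like a job that is still running; and a single failed POST should not lose the notification. The helpers below are hypothetical: postWithRetry retries network errors, 429, and 5xx responses with exponential backoff, and runAndNotify wraps any run so one notification goes out either way. The fetchImpl option defaults to the global fetch (Node 18+ and Bun) and exists so the helpers can be tested without a network.

```typescript
// Minimal fetch shape so the helpers can be tested without a network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

interface RetryOptions {
  attempts?: number;
  baseDelayMs?: number;
  fetchImpl?: FetchLike;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: POST JSON, retrying with delays of baseDelayMs, 2x, 4x, ...
async function postWithRetry(url: string, payload: unknown, options: RetryOptions = {}): Promise<void> {
  const attempts = Math.max(1, options.attempts ?? 5);
  const baseDelayMs = options.baseDelayMs ?? 500;
  const fetchImpl: FetchLike = options.fetchImpl ?? fetch;
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    let res: { ok: boolean; status: number } | undefined;
    try {
      res = await fetchImpl(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload),
      });
    } catch (err) {
      lastError = err; // network failure: worth retrying
    }
    if (res) {
      if (res.ok) return;
      // Other 4xx responses mean the request itself is wrong; retrying will not help.
      if (res.status < 500 && res.status !== 429) {
        throw new Error(`Webhook rejected with status ${res.status}`);
      }
      lastError = new Error(`Webhook failed with status ${res.status}`);
    }
    if (attempt < attempts - 1) await sleep(baseDelayMs * 2 ** attempt);
  }
  throw lastError;
}

type RunNotification<T> = { status: 'completed' | 'failed'; result?: T; error?: string };

// Hypothetical wrapper: run the job, then send one notification for success or failure.
async function runAndNotify<T>(
  run: () => Promise<T>,
  notify: (payload: RunNotification<T>) => Promise<void>,
): Promise<T> {
  let result: T;
  try {
    result = await run();
  } catch (err) {
    await notify({ status: 'failed', error: err instanceof Error ? err.message : String(err) });
    throw err; // still surface the original failure to the caller
  }
  await notify({ status: 'completed', result });
  return result;
}
```

A typical composition is `runAndNotify(() => result.getResponse(), (payload) => postWithRetry(webhookUrl, payload))`, adapting the payload to whatever your receiver expects.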

Check your work

A correct long-horizon implementation should pass all of the following:

  • A run with a low maxCost (for example, maxCost(0.10)) stops once the ceiling is hit, and getResponse() resolves, even if the agent has more work queued.
  • Killing the process mid-run and starting a new callModel invocation with the same StateAccessor resumes from the saved ConversationState. The message history grows rather than starting over.
  • getToolCallsStream() and getTextStream() yield events while the agent is still running, not only at the end.
  • Sending a voice file through sdk.stt.createTranscription returns the expected text, and feeding that text into callModel produces a response that references the spoken request.
  • A webhook (or other notification) fires after getResponse() resolves.
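The streaming check in particular is easy to automate. This hypothetical helper drains a stream to the end, so it never cancels the run by breaking out early, and reports whether any event arrived before a completion promise such as result.getResponse() had settled. It relies only on the stream being an async iterable, which is how the streaming example in step 3 consumes getTextStream() and getToolCallsStream().

```typescript
// Hypothetical test helper: true if at least one stream event arrived while
// `final` was still pending, false if every event came after completion.
async function eventsArriveBeforeCompletion(
  stream: AsyncIterable<unknown>,
  final: Promise<unknown>,
): Promise<boolean> {
  let settled = false;
  const markSettled = () => {
    settled = true;
  };
  // Track settlement (success or failure) without consuming the result.
  final.then(markSettled, markSettled);

  let sawEarlyEvent = false;
  // Drain fully rather than breaking early, which could cancel the run.
  for await (const _event of stream) {
    if (!settled) sawEarlyEvent = true;
  }
  await final; // rethrows if the run itself failed
  return sawEarlyEvent;
}
```

Assuming, as step 3 states, that the streams and getResponse() can be consumed independently, a check looks like `await eventsArriveBeforeCompletion(result.getTextStream(), result.getResponse())`.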

Resources