
Cut MCP Round-Trip Overhead by Looping Inside the Tool


MCP round-trip overhead comes from every tool call forcing a full model re-invocation over the whole growing context. The fix is to move the loop INTO the tool. One MCP server tool iterates internally and returns a single result, collapsing N round-trips into 1.

Why MCP round-trip overhead is so expensive

Here is the part people miss. A tool call is not expensive because the tool is slow. It is expensive because of the loop around it.

The agent loop runs like this: the model emits a tool call, the harness executes it, the result re-enters the context, and the model re-runs over the entire history. Every turn re-reads everything. That re-reading is the agent tool call cost, not the work the tool did.

So the MCP round-trip overhead scales with two things: how many calls you make, and how big the context already is. Loop 50 times and you pay for the full transcript 50 times.
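To make that scaling concrete, here is a rough cost model. It is a sketch under stated assumptions, not a measurement: the 10,000-token base context and 300 tokens per call are made-up figures, and the function names are hypothetical. It counts only the input tokens the model re-reads after each tool result.

```typescript
// Rough cost model: after every tool result, the model re-reads the whole
// transcript so far. All numbers below are illustrative assumptions.
function loopedInputTokens(
  baseContext: number, // tokens already in context before the loop starts
  perCallTokens: number, // tokens each call plus its result adds to the transcript
  calls: number,
): number {
  let total = 0;
  for (let i = 1; i <= calls; i++) {
    // The turn after call i re-reads the base context plus i accumulated results.
    total += baseContext + i * perCallTokens;
  }
  return total;
}

function collapsedInputTokens(baseContext: number, summaryTokens: number): number {
  // One tool call, so only one re-read, and it contains a single summary.
  return baseContext + summaryTokens;
}

const looped = loopedInputTokens(10_000, 300, 50);
const collapsed = collapsedInputTokens(10_000, 300);
console.log({ looped, collapsed }); // { looped: 882500, collapsed: 10300 }
```

The looped total grows with the square of the call count because each new result is re-read on every later turn. That quadratic term is what the single-call pattern removes.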

The numbers are brutal. In a 4-server Claude Code setup, MindStudio's teardown measured about 7,000 tokens of overhead per message, and heavy setups cross 50,000 before you type a word. Tool definitions alone can hit around 55,000 tokens once a single server exposes its full toolset, going by one measured teardown of the GitHub MCP server.

Where the tokens go

I have watched a naive MCP run balloon to 150,000 tokens on a task that, done right, costs 2,000. That gap is not the model thinking harder. It is the loop.

The hero pattern: loop inside one MCP tool

Most people register one MCP tool per atomic action, then let the model call it in a loop. Don't. Put the loop inside the tool.

Here is a real handler using the MCP TypeScript SDK. The inputSchema takes an array of items.

The handler iterates server-side and returns ONE consolidated result. N round-trips become 1.

fetch-many.ts

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "fetch-many", version: "1.0.0" });

server.registerTool(
  "fetch_all_statuses",
  {
    title: "Fetch all statuses",
    description: "Fetch status for every URL in one call. Loops server-side.",
    inputSchema: {
      urls: z.array(z.string().url()).min(1).max(200),
    },
  },
  async ({ urls }) => {
    const results: { url: string; status: number | string }[] = [];

    // The loop the model would have driven now runs in-process.
    // The model never sees the individual per-URL responses.
    for (const url of urls) {
      try {
        // Time out slow hosts so one dead URL cannot stall the whole batch.
        const res = await fetch(url, {
          method: "HEAD",
          signal: AbortSignal.timeout(5_000),
        });
        results.push({ url, status: res.status });
      } catch (err) {
        // Record the failure inline and keep going with the remaining URLs.
        results.push({ url, status: `error: ${(err as Error).message}` });
      }
    }

    const downCount = results.filter((r) => r.status !== 200).length;

    return {
      // Only this summary is meant for the model to reason over.
      content: [
        {
          type: "text",
          text: `Checked ${urls.length} URLs. ${downCount} not 200.`,
        },
      ],
      structuredContent: { results, downCount },
    };
  },
);

// Expose the server over stdio so an MCP client can connect to it.
await server.connect(new StdioServerTransport());

The model sends one array. The handler runs the in-process loop. For a 200-URL batch, none of the 200 individual HTTP responses touch the context.

That is the whole trick. Only the summary and structuredContent come back, so the model reasons over a few lines instead of a few hundred.

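Tip: Return a short content summary for the model to reason over, and stash the full data in structuredContent. The model reads the summary, your downstream code reads the structured payload. Some MCP clients also pass structuredContent to the model, so check how your host handles it before counting on the savings.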
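One way to sketch that split is a small helper that builds both halves of the result. The helper name buildToolResult and the maxListed cap are hypothetical, not part of the MCP SDK. The idea is that the text summary names only a handful of failures while the full list stays in the structured payload.

```typescript
type UrlStatus = { url: string; status: number | string };

// Hypothetical helper: a short text summary for the model, and the full
// per-item data in a separate structured payload.
function buildToolResult(results: UrlStatus[], maxListed = 5) {
  const failing = results.filter((r) => r.status !== 200);
  const listed = failing.slice(0, maxListed).map((r) => `${r.url} (${r.status})`);
  const more = failing.length - listed.length;

  let text = `Checked ${results.length} URLs. ${failing.length} not 200.`;
  if (listed.length > 0) {
    // Name a few failures so the model can act without the full payload.
    text += ` Failing: ${listed.join(", ")}`;
    if (more > 0) text += `, and ${more} more`;
    text += ".";
  }

  return {
    content: [{ type: "text" as const, text }],
    structuredContent: { results, downCount: failing.length },
  };
}

const demo = buildToolResult([
  { url: "https://a.example", status: 200 },
  { url: "https://b.example", status: 503 },
]);
console.log(demo.content.map((c) => c.text).join("\n"));
```

Capping the listed failures keeps the summary size bounded even when most of a large batch fails.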

If you have never built one of these, start with Building an MCP Server From Scratch. The loop-inside pattern is a small change to the handler once the server exists: accept an array and iterate over it.

Batch many operations into one call

The in-process loop assumes every item runs the same operation. Sometimes you need different operations in one shot. That is the batch MCP tool pattern.

Tools like mcp-batchit expose a single batch_execute that takes an array of operations plus run options, fans them out server-side, and returns one consolidated result. The mcp-batchit project reports a 70-90% cut in operational token overhead for workflows with many small tasks. You can build the same shape on the MCP TypeScript SDK.

batch_execute input

{
  "operations": [
    { "tool": "create_file", "args": { "path": "a.ts", "body": "..." } },
    { "tool": "create_file", "args": { "path": "b.ts", "body": "..." } },
    { "tool": "create_file", "args": { "path": "c.ts", "body": "..." } }
  ],
  "options": { "maxConcurrent": 4, "stopOnError": false }
}

One call, three writes, server-side concurrency. The model pays for one round-trip instead of three.

No inter-op data flow. Operation B cannot read operation A's output. They run independently.
Single downstream server. A batch tool typically fans out to one target, not a mix.
Best for fan-out. Identical or independent ops where ordering and shared state do not matter.
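The shape described above can be sketched as a small runner. This is not mcp-batchit's implementation; the function name batchExecute, the handlers registry, and the in-memory file store are assumptions made for illustration. It takes the operations and options from the input above, runs at most maxConcurrent at a time, and returns one consolidated result.

```typescript
type Operation = { tool: string; args: Record<string, unknown> };
type BatchOptions = { maxConcurrent: number; stopOnError: boolean };
type OpResult = { index: number; tool: string; ok: boolean; output?: unknown; error?: string };
type Handler = (args: Record<string, unknown>) => Promise<unknown>;

async function batchExecute(
  operations: Operation[],
  options: BatchOptions,
  handlers: Record<string, Handler>, // hypothetical registry of downstream tools
): Promise<{ results: OpResult[]; failed: number; skipped: number }> {
  const slots: OpResult[] = [];
  let next = 0;
  let stopped = false;

  // Each worker pulls the next unclaimed operation until none are left.
  async function worker(): Promise<void> {
    while (!stopped && next < operations.length) {
      const index = next++;
      const op = operations[index];
      try {
        const handler = handlers[op.tool];
        if (!handler) throw new Error(`unknown tool: ${op.tool}`);
        slots[index] = { index, tool: op.tool, ok: true, output: await handler(op.args) };
      } catch (err) {
        const error = err instanceof Error ? err.message : String(err);
        slots[index] = { index, tool: op.tool, ok: false, error };
        if (options.stopOnError) stopped = true; // in-flight ops still finish
      }
    }
  }

  const workerCount = Math.max(1, Math.min(options.maxConcurrent, operations.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));

  const results = slots.filter((r) => r !== undefined); // drops never-started ops
  const failed = results.filter((r) => !r.ok).length;
  return { results, failed, skipped: operations.length - results.length };
}

// Demo with an in-memory "file system" standing in for a real downstream server.
const files = new Map<string, string>();
const demoHandlers: Record<string, Handler> = {
  create_file: async (args) => {
    files.set(String(args["path"]), String(args["body"]));
    return { written: args["path"] };
  },
};

batchExecute(
  [
    { tool: "create_file", args: { path: "a.ts", body: "..." } },
    { tool: "create_file", args: { path: "b.ts", body: "..." } },
    { tool: "create_file", args: { path: "c.ts", body: "..." } },
  ],
  { maxConcurrent: 4, stopOnError: false },
  demoHandlers,
).then((summary) => console.log(summary.failed, summary.skipped));
```

Note that no operation receives another operation's output, which matches the first limit listed above.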

Let the model write the loop with code execution

There is a third option, and it is the strongest. Let the model write the loop itself, in code, inside one sandboxed turn.

Anthropic's code execution with MCP exposes tools as a typed code API. The model calls them inside a single execution, runs native loops and filters, and only the values it returns or logs re-enter the context.

model-written loop

// The model writes and runs this inside ONE execution turn.
import { listTickets, closeTicket } from "./servers/support";

const tickets = await listTickets({ status: "resolved" });

// Loop and filter run natively. None of these rows hit the context.
let closed = 0;
for (const t of tickets) {
  if (t.ageDays > 30) {
    await closeTicket({ id: t.id });
    closed++;
  }
}

// Only this line's value returns to the model.
console.log(`Closed ${closed} stale tickets of ${tickets.length}.`);

This is where the headline MCP code execution numbers come from: Anthropic reports one workflow dropping from 150,000 tokens to 2,000, a 98.7% reduction. Direct tool calls page every intermediate result through the context. Code mode keeps the data in the sandbox.
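The example above imports listTickets and closeTicket from a ./servers/support module without showing it. A minimal sketch of what such a typed wrapper could look like follows. The CallTool transport, the tool names support__list_tickets and support__close_ticket, and the Ticket shape are all assumptions for illustration. A real setup would forward the calls to an MCP client rather than to the in-memory fake used here.

```typescript
// Hypothetical transport: a real one would forward to an MCP client.
type CallTool = (tool: string, args: Record<string, unknown>) => Promise<unknown>;

interface Ticket {
  id: string;
  status: string;
  ageDays: number;
}

// Thin typed wrappers, one function per MCP tool.
function makeSupportApi(callTool: CallTool) {
  return {
    async listTickets(input: { status: string }): Promise<Ticket[]> {
      return (await callTool("support__list_tickets", input)) as Ticket[];
    },
    async closeTicket(input: { id: string }): Promise<void> {
      await callTool("support__close_ticket", input);
    },
  };
}

// In-memory fake so the sketch runs without a server.
const fakeTickets: Ticket[] = [
  { id: "t1", status: "resolved", ageDays: 45 },
  { id: "t2", status: "resolved", ageDays: 3 },
  { id: "t3", status: "open", ageDays: 90 },
];
const closedIds: string[] = [];
const fakeCallTool: CallTool = async (tool, args) => {
  if (tool === "support__list_tickets") {
    return fakeTickets.filter((t) => t.status === args["status"]);
  }
  if (tool === "support__close_ticket") {
    closedIds.push(String(args["id"]));
    return null;
  }
  throw new Error(`unknown tool: ${tool}`);
};

// The same loop as the model-written example, run against the wrapper.
async function closeStaleTickets(api: ReturnType<typeof makeSupportApi>): Promise<string> {
  const tickets = await api.listTickets({ status: "resolved" });
  let closed = 0;
  for (const t of tickets) {
    if (t.ageDays > 30) {
      await api.closeTicket({ id: t.id });
      closed++;
    }
  }
  // Only this one-line string would need to reach the model.
  return `Closed ${closed} stale tickets of ${tickets.length}.`;
}

closeStaleTickets(makeSupportApi(fakeCallTool)).then((line) => console.log(line));
```

The wrapper gives the model ordinary typed functions to call, so the ticket rows live in local variables inside the sandbox instead of in the transcript.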

Worth keeping distinct: that 98.7% is naive-MCP versus code-execution-MCP. A separate MindStudio study found MCP servers burned 35x more tokens than raw CLI tools per task, with reliability dropping from 100% to 72%. Different comparison, same lesson: round-trips are the tax.

Batch vs in-process loop vs code mode

Three ways to collapse the loop and kill the MCP round-trip overhead. They are not interchangeable. Pick by how much logic and data flow the task needs.

In-process loop runs fixed logic over a list. No inter-step data flow. The tool author orchestrates. Data stays out of context.
Batch tool runs many independent ops in one call. No inter-step data flow. The batch runner orchestrates. Data stays out of context.
Code mode runs arbitrary logic AND inter-step data flow. The model orchestrates. Data stays in the sandbox, out of context.

Note: Code mode is the only one of the three that supports real inter-step data flow, where step two reads step one's output. The other two trade that flexibility for a dead-simple tool surface.
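The comparison can be compressed into a small decision function. This is only a sketch of the rule of thumb in this article, and the TaskShape fields and the pickPattern name are hypothetical. Real tasks rarely sort this cleanly.

```typescript
type Pattern = "in-process loop" | "batch tool" | "code mode" | "keep the model in the loop";

interface TaskShape {
  deterministicBody: boolean; // can the loop body be written as fixed code ahead of time?
  interStepDataFlow: boolean; // does a later step read an earlier step's output?
  sameOperationPerItem: boolean; // one operation over a list, or a mix of operations?
}

function pickPattern(task: TaskShape): Pattern {
  // If each step needs fresh judgment, collapsing the loop is the wrong move.
  if (!task.deterministicBody) return "keep the model in the loop";
  // Only code mode lets step two read step one's output.
  if (task.interStepDataFlow) return "code mode";
  // Independent steps: a plain loop for one operation, a batch for a mix.
  return task.sameOperationPerItem ? "in-process loop" : "batch tool";
}

console.log(
  pickPattern({ deterministicBody: true, interStepDataFlow: false, sameOperationPerItem: true }),
); // "in-process loop"
```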

If you are still picking servers, Best MCP Servers Worth Installing covers which ones already ship batch-style tools. And the wider token math lives in How to Reduce AI Coding Tool Token Usage by 50%.

When NOT to collapse the loop

Collapsing the loop is not free. You trade the model's per-step judgment for token savings. Sometimes that judgment is the whole point.

Keep the model in the loop when each step's outcome should change the next decision. A migration that branches on what it finds.

The same goes for live problem solving. A debugging session that course-corrects after each failure. Flatten those and you get a tool that charges ahead while blind.

There is a cost ceiling worth naming too. Once a loop runs hundreds of fast, identical iterations, the MCP round-trip overhead dwarfs everything else, and that is exactly when collapsing pays off most. The closer each step gets to a real decision, the weaker the case for flattening it.

Frequently asked questions

Does looping inside the tool stop the model from reacting mid-run?

Yes, and that is the core tradeoff. Inside an in-process loop the model sees nothing until the tool returns, so it cannot adjust based on item 3 before processing item 4. For mechanical fan-out that is fine. For tasks where each result should steer the next decision, keep the loop in the model's hands or use code mode with explicit branch logic the model wrote.

When should I skip collapsing the loop?

Skip collapsing when steps depend on the model's reasoning between them, when partial results need human or model review mid-run, or when error handling requires a judgment call rather than a fixed rule. If you cannot write the loop body as deterministic code ahead of time, the model probably needs to stay in the driver's seat.

How should a collapsed loop handle errors?

Do not throw on the first failure and lose the rest. Collect per-item outcomes, mark failures inline, and return both a summary count and the full per-item results in structuredContent. The model reads the summary, decides whether to retry the failed subset, and your downstream code keeps the detail. That preserves partial results without flooding the context.
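That error-handling advice can be sketched as a generic runner. The names runAll, ItemOutcome, and failedItems are hypothetical. The sketch collects one outcome per item, never aborts on a failure, and exposes the failed subset so a follow-up call can retry only those items.

```typescript
type ItemOutcome<T> =
  | { item: string; ok: true; value: T }
  | { item: string; ok: false; error: string };

async function runAll<T>(items: string[], work: (item: string) => Promise<T>) {
  const outcomes: ItemOutcome<T>[] = [];
  for (const item of items) {
    try {
      outcomes.push({ item, ok: true, value: await work(item) });
    } catch (err) {
      // Mark the failure inline and move on to the next item.
      const error = err instanceof Error ? err.message : String(err);
      outcomes.push({ item, ok: false, error });
    }
  }

  const failedItems = outcomes.filter((o) => !o.ok).map((o) => o.item);
  return {
    // Short line for the model to read.
    summary: `${outcomes.length - failedItems.length} of ${outcomes.length} succeeded.`,
    // Full detail, including the subset worth retrying.
    structuredContent: { outcomes, failedItems },
  };
}

runAll(["a", "b", "c"], async (item) => {
  if (item === "b") throw new Error("boom");
  return item.toUpperCase();
}).then((r) => console.log(r.summary, r.structuredContent.failedItems));
```

Passing failedItems back as the input of a second call retries the failures without repeating the work that already succeeded.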

So the rule is simple. If the loop body is deterministic, push it into the tool or into code mode and stop paying MCP round-trip overhead per iteration.

If the body needs the model to think between steps, leave it alone. For more on where these patterns hold up under real load, see AI Agents in Production: What Actually Works.