We shipped our MCP server and people started using it immediately. Small queries worked great — “What’s my Meta Ads spend this month?” came back in under two seconds with clean data. But the moment someone asked for daily campaign-level data across multiple accounts over a few months, things broke.
The response was too large for the AI’s context window. Claude choked. ChatGPT timed out. The data was there, but no AI could process it.
We collected feedback, watched how people actually used the server, and built an optimization pipeline around what we learned. Here’s how it works.
The problem is bigger than you think
When you query marketing data through MCP, the response is raw tabular data — one row per campaign per day. A seemingly simple question like “Show me my Facebook Ads performance by campaign for the last quarter” can return 5,000+ rows.
AI models have context windows. Claude models handle roughly 100K-200K tokens depending on the version; ChatGPT varies by plan. One token is roughly 3-4 characters. A JSON response with 5,000 rows of marketing data can blow past 500,000 characters, far more than any model can comfortably process in a single turn.
The naive solution is truncation. Chop the data at some limit, return whatever fits. But that means the AI presents partial data as if it’s complete. A user asks “Which campaign spent the most?” and the AI answers confidently based on 30% of the actual campaigns. The highest spender might be in the 70% that got dropped.
We decided: if the data doesn’t fit, don’t return partial data. Give the AI enough context to ask a smarter question.
The pipeline
The optimizer sits between the raw API response and the MCP tool output. It runs four steps, each conditionally triggered. Most queries never go past step two.
Step 1: Validate and filter
We clean the data before doing anything else. Rows where every dimension is empty get removed — these are artifacts from API pagination or permission issues. Noise, not signal.
If every metric in the response is zero, we flag it as a likely permissions problem. The connection might look healthy, but the access token doesn’t have the right scopes. We surface a warning instead of returning a wall of zeros.
This step alone removes 5-15% of rows in some datasets.
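In Python, the validation step might look like this. A minimal sketch: the row shape (a dict with `dimensions` and `metrics` sub-dicts) and field names are assumptions, not the server's actual schema.

```python
# Sketch of step 1: drop rows with all-empty dimensions, warn on all-zero metrics.
# Row shape is assumed: {"dimensions": {...}, "metrics": {...}}.

def validate_and_filter(rows):
    """Drop rows whose dimensions are all empty; warn if every metric is zero."""
    cleaned = [
        row for row in rows
        if any(v not in (None, "") for v in row["dimensions"].values())
    ]
    # If every metric in every surviving row is zero, the token likely
    # lacks the right scopes even though the connection looks healthy.
    all_zero = bool(cleaned) and all(
        not any(row["metrics"].values()) for row in cleaned
    )
    warning = (
        "All metrics are zero: the access token may be missing scopes."
        if all_zero else None
    )
    return cleaned, warning
```

The warning is surfaced to the AI instead of returning a wall of zeros.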
Step 2: Convert to CSV and check the budget
JSON is expensive for tabular data. Every row carries structural overhead:
{
  "campaign_name": { "value": "Summer Sale", "display_name": "Campaign" },
  "spend": { "value": 1234.56, "display_name": "Amount Spent" }
}

Same data in CSV:
campaign_name,spend
Summer Sale,1234.56

Every MCP response uses CSV. AI models parse it just as well as JSON, and it’s 85-99% cheaper in tokens. No format negotiation, no configuration. CSV is always the right choice when the consumer is an AI.
After conversion, we estimate the token cost: characters × 0.3. If it fits within the client’s budget, we return it immediately. This handles the vast majority of queries.
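The conversion and budget check can be sketched with the standard library. The 0.3 tokens-per-character constant is the heuristic from above; everything else is illustrative.

```python
import csv
import io

TOKENS_PER_CHAR = 0.3  # the ~3-4 characters-per-token heuristic

def rows_to_csv(rows, fields):
    """Flatten a list of flat dicts into CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def fits_budget(text, token_budget):
    """Estimate token cost as characters x 0.3 and compare to the budget."""
    return len(text) * TOKENS_PER_CHAR <= token_budget
```

If `fits_budget` passes, the CSV goes straight back to the client; otherwise the pipeline continues to step 3.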
Step 3: Reduce time granularity
When data exceeds the budget and the query includes a time dimension, we try coarser aggregations:
daily → weekly → monthly → all
Each step is a fresh API call. That adds 2-5 seconds per attempt, but the results are worth it. Instead of seeing a partial slice of daily data, you see 100% of your weekly data. Nothing is lost — it’s aggregated differently.
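The fallback loop is simple to sketch. Here `fetch(query, granularity)` and `estimate_tokens` are stand-ins for the real API call and token estimator, not the server's actual functions.

```python
GRANULARITIES = ["daily", "weekly", "monthly", "all"]

def reduce_granularity(query, fetch, estimate_tokens, budget):
    """Retry the query at coarser time granularities until one fits the budget."""
    requested = query.get("granularity", "daily")
    start = GRANULARITIES.index(requested)
    for granularity in GRANULARITIES[start:]:
        data = fetch(query, granularity)  # fresh API call per attempt
        if estimate_tokens(data) <= budget:
            applied = None
            if granularity != requested:
                applied = {"type": "granularity_change",
                           "from": requested, "to": granularity}
            return data, applied
    return None, None  # nothing fits: fall through to the data summary step
```

The `applied` record is what ends up in the `_meta.optimizations_applied` notice.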
If any level fits, we return it with a notice:
{
  "_meta": {
    "optimizations_applied": [
      {
        "type": "granularity_change",
        "from": "daily",
        "to": "weekly",
        "detail": "Changed time aggregation from daily to weekly"
      }
    ]
  }
}

Step 4: Data summary with re-query instructions
This is the key insight. When data still doesn’t fit after granularity reduction, we don’t truncate. We return a statistical summary of the full dataset and give the AI everything it needs to re-query with precision.
For each dimension:
- Unique count — how many distinct values
- Top 20 values — sorted by frequency
For each metric:
- min, max, average, sum
Plus re-query instructions: filter operators and smart suggestions tailored to the data shape:
{
  "type": "data_too_large",
  "summary": {
    "total_rows": 4194,
    "token_estimate": 45000,
    "token_budget": 30000,
    "dimensions": [
      {
        "field": "campaign_name",
        "unique_count": 157,
        "top_values": ["Brand Campaign", "Holiday Promo", "..."]
      }
    ],
    "metrics": [
      { "field": "spend", "min": 0, "max": 5000, "avg": 850, "sum": 127612 }
    ]
  },
  "requery_instructions": {
    "suggestions": [
      "Filter by campaign_name using CONTAINS or EQUALS",
      "Filter spend > 850 to keep only above-average rows",
      "Narrow the date range to reduce rows"
    ]
  }
}

The AI reads this, understands the data shape, and crafts a follow-up query with filters targeting exactly what the user needs. Instead of 4,194 rows of everything, it gets 200 rows of what matters.
No data is lost. No rows silently dropped. The AI just asks a better question.
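The summary itself is cheap to compute. A sketch, assuming rows are flat dicts keyed by field name (the real server's row shape may differ):

```python
from collections import Counter

def summarize(rows, dimensions, metrics, top_n=20):
    """Build the statistical summary returned when the full data can't fit."""
    summary = {"total_rows": len(rows), "dimensions": [], "metrics": []}
    for dim in dimensions:
        counts = Counter(row[dim] for row in rows)
        summary["dimensions"].append({
            "field": dim,
            "unique_count": len(counts),
            # top 20 values by frequency, most common first
            "top_values": [value for value, _ in counts.most_common(top_n)],
        })
    for metric in metrics:
        values = [row[metric] for row in rows]
        summary["metrics"].append({
            "field": metric,
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "sum": sum(values),
        })
    return summary
```

One pass over the data per field, and the output stays a few hundred tokens no matter how many rows went in.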
Always tell the AI what happened
Every response includes a _meta object with full transparency:
{
  "_meta": {
    "original_rows": 1985,
    "returned_rows": 1857,
    "format": "csv",
    "token_estimate": 25563,
    "token_budget": 30000,
    "optimizations_applied": [
      {
        "type": "validation_filter",
        "rows_removed": 128,
        "detail": "Removed 128 rows with empty dimensions"
      }
    ]
  }
}

The AI reads this and communicates it to the user. No black boxes.
Query guardrails
Large queries don’t just hit token limits — they time out. We added three guardrails at the entry point:
Account limit. Max 10 accounts per query. Multi-account queries across dozens of ad accounts time out before returning useful data. If exceeded, we return the full account list so the AI can narrow down.
Timeout protection. Every query races against a 55-second deadline — a comfortable buffer before ChatGPT’s ~90-second HTTP timeout. On timeout, we return actionable suggestions: narrow dates, reduce dimensions, use coarser granularity.
Dimension sanitization. AI models love adding "date" as a dimension. It’s always wrong — time aggregation is a separate parameter. We strip it silently instead of failing the query.
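The three guardrails can be sketched together. The query shape and field names are assumptions; the 10-account limit and 55-second deadline are the values from above.

```python
import asyncio

MAX_ACCOUNTS = 10  # account limit from the guardrails above

def sanitize_query(query):
    """Strip "date" from dimensions and enforce the account limit."""
    # Time aggregation is a separate parameter, so "date" as a dimension
    # is always wrong; strip it silently instead of failing the query.
    query["dimensions"] = [d for d in query.get("dimensions", []) if d != "date"]
    if len(query.get("account_ids", [])) > MAX_ACCOUNTS:
        raise ValueError(
            f"Query spans more than {MAX_ACCOUNTS} accounts; narrow the list."
        )
    return query

async def run_with_deadline(coro, seconds=55):
    """Race the query against the 55-second deadline."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return {"error": "timeout",
                "suggestions": ["narrow the date range", "reduce dimensions",
                                "use coarser time granularity"]}
```

In the real server, the account-limit error would carry the full account list so the AI can narrow down; the sketch raises a plain error for brevity.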
Client-aware budgets
Different AI clients get different token budgets:
- Claude Code: 80,000 tokens
- ChatGPT Web: 80,000 tokens
- Claude Web: 30,000 tokens
- Unknown clients: 30,000 tokens
The optimizer reads the client type from the MCP auth context and adjusts. The same query might return full data on Claude Code but trigger a data summary on Claude Web.
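The budget lookup is a small table. The numbers come from the list above; the client-type identifiers are assumptions about how the MCP auth context labels each client.

```python
# Budget table per client type; keys are illustrative identifiers.
TOKEN_BUDGETS = {
    "claude-code": 80_000,
    "chatgpt-web": 80_000,
    "claude-web": 30_000,
}
DEFAULT_BUDGET = 30_000  # unknown clients get the conservative budget

def budget_for(client_type):
    return TOKEN_BUDGETS.get(client_type, DEFAULT_BUDGET)
```

Defaulting unknown clients to the smallest budget keeps the failure mode safe: worst case, a client that could have handled more gets a summary instead of an oversized payload.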
What we learned from user feedback
A few patterns stood out from watching real usage:
People query big. The average query spans 3+ months of daily data across multiple campaigns. Token limits aren’t an edge case — they’re the common case for any serious analysis.
AI models don’t handle partial data well. When given truncated results, models present them as complete without adequate caveats. Users make decisions on incomplete information without knowing it. The data summary approach solves this — the AI knows it’s working with a summary, not the full picture, and acts accordingly.
Timeouts are worse than large responses. A query that returns a data summary in 3 seconds is infinitely better than one that hangs for 90 seconds and fails. The guardrails matter as much as the optimization.
Date-as-dimension is universal. Every AI model we tested eventually tries to add “date” as a dimension instead of using the time aggregation parameter. Silently fixing this saved more failed queries than any other single change.
The broader MCP landscape
While building this, we looked at what others are doing about context window management:
Dynamic toolsets (Speakeasy) reduce tool definition overhead — load schemas on demand instead of sending 50 upfront. This optimizes the request side.
Programmatic tool calling (Anthropic) lets AI write code to process data locally. Powerful, but requires a code execution environment.
Server instructions explain cross-tool relationships in a single block. GitHub’s MCP server went from 20% to 80% accuracy by adding explicit instructions.
Our approach — optimizing the data payload — is complementary to all of these. Different layers of the same problem.
Try it yourself
The optimizer is live on the Detrics MCP server. Connect from app.detrics.io/mcp and try a large query — daily campaign data over several months. You’ll see either the full CSV with _meta, or a data summary with re-query suggestions.
If you’re building your own MCP server, the pattern is simple: validate, convert to CSV, reduce granularity, summarize. No truncation needed.
Questions? support.detrics.io.

