The 3 hardest technical problems I hit building an AI agent that calls real APIs — and the fixes that actually work.

I wish someone had written these down before I spent a month figuring them out. Here they are, with the fixes that actually worked.
You ask the agent to update a record. It sends only the fields you mentioned in the prompt. The PUT request goes through, returns 200, and you have silently wiped every field you did not specify.
This is not a hallucination problem — the model did exactly what you asked. It just has no concept of "the rest of the object" unless you give it one.
The fix: Before every write call, fetch the current resource state via the companion GET endpoint and merge the LLM's payload on top. The LLM only needs to specify what is changing; the executor fills in the rest.
import requests

def safe_update(endpoint, llm_payload):
    # Fetch the full current state of the resource first
    current = requests.get(endpoint).json()
    # Merge: the LLM's fields win, every other field is preserved
    merged = {**current, **llm_payload}
    return requests.put(endpoint, json=merged)
This way a prompt like "change the customer name" cannot accidentally blank out the address, phone number, and every other field on the record.
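One caveat: {**current, **llm_payload} is a shallow merge, so if the resource contains nested objects, a partial nested payload still replaces the whole sub-object. A recursive merge covers that case too (a minimal sketch; the record fields are made up for the demo):

```python
def deep_merge(current, patch):
    """Overlay patch onto current recursively, preserving nested fields."""
    merged = dict(current)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are objects: recurse instead of replacing wholesale
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

current = {"name": "Acme", "address": {"city": "Oslo", "zip": "0150"}}
patch = {"address": {"city": "Bergen"}}  # the LLM only mentioned the city
assert deep_merge(current, patch) == {
    "name": "Acme",
    "address": {"city": "Bergen", "zip": "0150"},
}
```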
A tool returns a 404. The agent says, "Done, the record was updated!" The user trusts it. The record was never touched.
This happens because LLMs are trained to be helpful. When a tool response is ambiguous — or even clearly an error — the model will often interpret it in the most optimistic way possible and report success.
The fix: Explicitly prefix every error response with "Error:" and add one line to the system prompt:
If a tool returns a message starting with "Error:", report it directly to the user. Do not assume success.
def call_tool(func, *args, **kwargs):
    try:
        response = func(*args, **kwargs)
        # Surface HTTP errors as an unmistakable string the model cannot spin
        if response.status_code >= 400:
            return f"Error: {response.status_code} — {response.text}"
        return response.json()
    except Exception as e:
        return f"Error: {e}"
Without this, the agent will confidently lie every time. The explicit prefix gives the model an unambiguous signal it cannot rationalize away.
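The behavior is easy to verify with stubbed tools; the status code and error text below are invented for the demo:

```python
from types import SimpleNamespace

def call_tool(func, *args, **kwargs):
    # The executor from above, repeated here so the demo runs standalone
    try:
        response = func(*args, **kwargs)
        if response.status_code >= 400:
            return f"Error: {response.status_code} - {response.text}"
        return response.json()
    except Exception as e:
        return f"Error: {e}"

# A stub tool that behaves like a failed requests.Response
def broken_tool():
    return SimpleNamespace(status_code=404, text="record not found")

# A stub tool that raises instead of returning
def flaky_tool():
    raise TimeoutError("upstream timed out")

assert call_tool(broken_tool) == "Error: 404 - record not found"
assert call_tool(flaky_tool) == "Error: upstream timed out"
```

Both failure modes, HTTP errors and raised exceptions, collapse into the same "Error:" prefix, so the single system-prompt rule covers everything.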
The LLM passes query params as a plain string instead of a dict. The request fires, looks fine in logs, returns nothing. No error. Just silence.
This is especially insidious because everything appears to work. The HTTP call succeeds with a 200. The response is a valid but empty list. The agent reports "no results found" — which sounds plausible — when the real issue is malformed parameters.
The fix: Coerce string inputs to dicts in the tool executor and be extremely explicit in the field description about the expected shape — including a concrete example.
import json

def normalize_params(params):
    if isinstance(params, str):
        try:
            # Preferred path: the model sent a JSON object as a string
            return json.loads(params)
        except json.JSONDecodeError:
            # Fallback: parse querystring-style "key=value&key=value" input
            # (split on the first "=" so values containing "=" survive)
            return dict(pair.split("=", 1) for pair in params.split("&") if "=" in pair)
    return params
In the tool schema, do not just say params: dict. Say something like:
params: A JSON object of query parameters.
Example: {"status": "active", "limit": 10}
Do NOT pass as a string like "status=active&limit=10".
The more explicit the schema, the less room the model has to improvise a format that silently fails.
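A quick check shows the normalizer collapsing the shapes the model tends to emit (inputs invented for the demo):

```python
import json

def normalize_params(params):
    # Same normalizer as above, repeated so the demo runs standalone
    if isinstance(params, str):
        try:
            return json.loads(params)
        except json.JSONDecodeError:
            return dict(pair.split("=", 1) for pair in params.split("&") if "=" in pair)
    return params

# A real dict passes through untouched
assert normalize_params({"status": "active", "limit": 10}) == {"status": "active", "limit": 10}
# A JSON string is decoded, types intact
assert normalize_params('{"status": "active", "limit": 10}') == {"status": "active", "limit": 10}
# A querystring is salvaged, but every value comes back as a string
assert normalize_params("status=active&limit=10") == {"status": "active", "limit": "10"}
```

Note that the querystring fallback yields string values ("10", not 10), which is one more reason the schema should steer the model toward JSON in the first place.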
None of these problems show up in tutorials or documentation. They only surface when you ship something real and watch it break against live data.
Each one shares the same root cause: LLMs do not operate with the same assumptions as a human developer. They do not know what a full PUT payload looks like, they do not treat HTTP 404 as a hard failure, and they do not have a strong prior about parameter serialization. Your tool executor has to bridge that gap.
If you are building anything that connects an LLM to a real API, design your executor layer to be defensive by default — validate inputs, normalize formats, and never trust that the model understood what "success" means.
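If it helps to see all three defenses in one place, here is a sketch of a combined executor. Nothing here is from a real framework: the `http` transport callable, which returns a `(status_code, body)` pair, is a hypothetical stand-in for your HTTP client, injected as a parameter so the logic can be exercised without a live API:

```python
import json

def defensive_execute(http, method, endpoint, params=None, payload=None):
    """Defensive tool executor: normalize params, merge writes, surface errors."""
    # Fix 3: coerce stringly-typed params into a dict
    if isinstance(params, str):
        try:
            params = json.loads(params)
        except json.JSONDecodeError:
            params = dict(p.split("=", 1) for p in params.split("&") if "=" in p)
    try:
        # Fix 1: read-merge-write so a partial payload cannot wipe fields
        if method == "PUT":
            status, current = http("GET", endpoint, None, None)
            if status >= 400:
                return f"Error: {status} on pre-write GET"
            payload = {**current, **(payload or {})}
        status, body = http(method, endpoint, params, payload)
        # Fix 2: errors become an unambiguous "Error:" string, never silence
        if status >= 400:
            return f"Error: {status} - {body}"
        return body
    except Exception as e:
        return f"Error: {e}"
```

In tests, `http` can be an in-memory fake; in production, the same callable wraps whatever HTTP client you already use.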