Are Your Agents the Problem, or Are You Just Bad at Communicating?
TL;DR: The Communication Economy
I’ve spent a decade learning that in the era of LLMs, imprecise language is a compilation error. Every extra word in your prompt is a potential hallucination vector; if I wouldn't write spaghetti code, I shouldn't write spaghetti prompts. By applying management principles to agentic development, I’ve found that treating an LLM like a Staff Engineer—rather than a magic box—is the key to performance. In this piece, I break down the research-backed strategies I use to achieve 67% token savings and higher accuracy by treating clarity as a performance metric.
Introduction
In 2007, if I asked you to picture a "Software Engineer," what would you have imagined? Likely the stereotype we all grew up with: a reclusive nerd, an antisocial coder, or someone who found solace in the warmth of a homemade server room because people were just too... variable.
I remember picturing this myself and thinking, “I refuse.”
I didn’t want my code to be my only voice; I wanted my communication skills to be a differentiator. I spent the next decade obsessing over both "soft" and technical skills, learning how to distill complex architecture into stories that non-technical stakeholders could not only understand, but rally behind.
At the time, I thought I was just becoming a better manager. In reality, I was preparing for the agentic paradigm shift no one saw coming. We aren’t just writing recursive loops anymore; we are casting intentions into probabilistic engines.
If your agents are hallucinating, your costs are spiraling, and your latency is unacceptable, it might not be a model failure. It might be a communication failure. In agentic development, clarity isn't just a soft skill, it’s a performance metric. Token efficiency is the new communication economy.
The Micro-Economics of Language (Token Efficiency)
I’m sure we’ve all learned the hard way about O(n²) loops, full table scans, and the lie that is zero-latency networking. We know this because efficiency isn't just "nice to have," it's the difference between a scalable system and a crashed server.
Yet, when we write prompts for agents, we often abandon these principles. We treat the Context Window like a dump truck, shoveling in paragraphs of "context," pleasantries, and ambiguous instructions, hoping for… what?
Today, imprecise language is a compilation error. It increases latency, spikes costs, and, crucially, confuses the logic. Recent research validates what every experienced manager already knows: a checklist beats a manifesto. Researchers from the University of Connecticut recently introduced a scaling law for token efficiency to quantify this:
Accuracy = A · V^β · M^γ + E
Where:
A: A constant coefficient representing the baseline efficiency of the process.
V (Volume): The effective data size.
M (Model Size): The parameter count of your agent.
β, γ (Exponents): The scaling exponents that determine the rate of improvement.
E: The irreducible error (the "noise" floor).
They proved that accuracy isn't random. It is a predictable function of how well you balance your effective Data Volume (V)—the number of segments multiplied by their average length—against your Model Size (M).
The researchers tested this law by comparing a "Few Long" strategy (massive, sprawling blocks of text) against a "Many Short" strategy (concise, atomic segments). The results spoke for themselves: "Many Short" consistently outperformed "Few Long" in accuracy, even when the total compute budget was identical.
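The scaling law can be sketched numerically. Note that the coefficients below are illustrative placeholders, not fitted values from the paper—the point is only the shape of the relationship:

```python
def predicted_accuracy(v, m, a=0.05, beta=0.2, gamma=0.1, e=0.02):
    """Toy instance of the scaling law: Accuracy = A * V^beta * M^gamma + E.

    All coefficients are illustrative placeholders, not fitted values.
    v: effective data volume (segments x average length)
    m: model parameter count
    e: the irreducible error ("noise" floor)
    """
    return a * (v ** beta) * (m ** gamma) + e
```

Because β and γ are positive exponents, accuracy rises monotonically (with diminishing returns) in both volume and model size, while E sets the floor you can never optimize below.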
The TALE Effect: Budget-Aware Reasoning
If "Many Short" is the architecture for success, the TALE effect explains why we fail when we ignore it. A study on Token-Budget-Aware LLM Reasoning (TALE) found that models suffer from "Token Elasticity." Without constraints, an agent naturally over-explains, generating redundant tokens that add zero value.
However, when researchers forced models to "estimate" their effort first and adhere to a strict token budget, the results were game-changing: a 67% reduction in token usage with less than a 3% drop in accuracy.
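The two-phase pattern can be sketched as follows. The prompt wording here is my own, not TALE's actual templates, and `llm` is a stand-in for whatever client callable you use:

```python
# Phase 1 template: ask the model to self-estimate a minimal budget.
ESTIMATION_PROMPT = (
    "Task: {task}\n"
    "Before solving, output ONLY an integer: the minimum number of "
    "output tokens you need to solve this correctly."
)

def budget_aware_prompt(task: str, estimate: int) -> str:
    """Phase 2: bind the real request to the self-estimated budget.
    Phrasing is illustrative; the paper's templates differ."""
    return (
        f"Task: {task}\n"
        f"You estimated this needs ~{estimate} tokens. "
        f"Answer within that budget. Do not restate the question "
        f"or recap your reasoning."
    )

def solve_with_budget(llm, task: str) -> str:
    # llm is any callable: prompt string in, completion string out.
    estimate = int(llm(ESTIMATION_PROMPT.format(task=task)).strip())
    return llm(budget_aware_prompt(task, estimate))
```

The design choice is that the estimation call is cheap (a single integer), and it converts the model's "Token Elasticity" into an explicit, enforceable constraint on the expensive call.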
The Staff Engineer Paradox
To solve the math of the Scaling Law, we have to solve the "Staff Engineer Paradox." We treat agents like magic boxes that should "just know," but you should instead think of your agent as a Staff Engineer on their first day.
They’ve read every repo on GitHub and know more syntax than you ever will, but they have zero context on your specific business logic. If you assume they "get it," they will use their massive compute power to hallucinate a very convincing, very wrong solution.
The Paradox: To get the best out of this "Senior" intelligence, you must communicate with the explicit clarity you would use for a Junior.
The "Few Long" Failure: A 4,000-token stream of consciousness. The agent doesn't lack the skill to understand it; it lacks the boundaries to prioritize it. It drowns in the noise.
The "Many Short" Success: You respect the agent's intelligence by respecting its constraints. You provide a specific User Story with a clear Definition of Done: "Fix the latency. Here is the log. Do not refactor the auth service."
Refactoring Your Prompts
We have to stop viewing "conciseness" as being rude; it is an optimization problem.
| Feature | The Old Way (Unconstrained) | The New Way (Pattern-Based) |
| --- | --- | --- |
| Prompt | "Let's think step by step and take your time..." | "Estimate complexity. Solve in <100 tokens." |
| Diagnosis | Creates high-latency, high-cost drift. | Forces "meta-reasoning" before execution. |
| Result | High cost, low reliability. | Low latency, high accuracy. |
Every extra word in your prompt is a potential hallucination vector. If you wouldn't write spaghetti code, don't write spaghetti prompts.
The Architecture of Context (Cognitive Load)
If Token Elasticity wastes your budget, "Cognitive Load" is the onboarding overhead that kills your performance. In both human brains and Transformer architectures, Attention is a finite economic resource. There is a dangerous misconception in Agentic Development: the idea that because a model has a 128k or 1M token context window, you should fill it. But would you do that to your Staff Engineer on their first day?
The Theory: John Sweller (1988) argued that if you overload a learner with "Extraneous Load" (bad formatting/noise), they have zero capacity left for "Germane Load" (solving the problem).
The Reality: LLMs function identically. As context grows, the model's ability to attend to specific tokens degrades.
Sweller wasn't thinking about Sonnet-4.5, but he perfectly described why your 2,000-word prompt is giving your agent a digital migraine. When you dump 500 pages of unstructured docs into a prompt, you are bankrupting the model's attention span. You are forcing it to spend its "compute budget" on parsing your mess, rather than executing your logic.
A simple fix is to apply a standard essay structure (Communication) to System Prompts (CS). This is how we implement the "Segmentation beats Volume" math we saw in the research.
Standardize inputs into three layers to optimize the model's "Working Memory":
The Top Bun (System Instructions): Who are you? (Framing).
The Meat (The Context): The specific code/logs only. (Relevance).
The Bottom Bun (Output Format): JSON/Markdown. (Constraints).
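The three-layer assembly can be sketched as a tiny helper. The XML-ish section markers are a common convention for helping models segment context, not a required syntax:

```python
def build_prompt(role: str, context: str, output_format: str) -> str:
    """Assemble the three 'burger' layers into one structured prompt.

    role:          framing / system instructions (the top bun)
    context:       the specific code or logs only (the meat)
    output_format: e.g. a JSON schema or Markdown spec (the bottom bun)
    """
    return (
        f"<instructions>\n{role}\n</instructions>\n\n"
        f"<context>\n{context}\n</context>\n\n"
        f"<output_format>\n{output_format}\n</output_format>"
    )
```

The layering itself is the optimization: the model never has to guess which tokens are framing, which are evidence, and which are constraints.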
The "Zero-Shot" Trap
The economy of communication collapses in "Zero-Shot" prompting. While efficient for simple tasks, it is a liability for discovery. Complex Zero-Shot tasks run into the problem illustrated by the famous educational cartoon in which a teacher tells a diverse group of animals: “For a fair selection, everybody has to take the same exam: Please climb that tree.” When you shout "Write a unit test" (Zero-Shot), you assume "Unit Test" means the same thing to the model as it does to you.
The Monkey (your Python backend) might succeed.
The Goldfish (your SQL database) will suffocate.
Instead of paying for the model to hallucinate (climb the tree) and then paying again to fix it, you invest upfront in Few-Shot Prompting.
Prompt: "Write a test using pytest. Here is a 3-line example of our mocking pattern."
Result: The model stops guessing. Latency drops. Accuracy spikes.
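A Few-Shot prompt is simply the Zero-Shot instruction plus one or two concrete exemplars. The mocking snippet below is a hypothetical stand-in for "our pattern," not a real codebase:

```python
# Hypothetical team convention shown as an inline exemplar.
FEW_SHOT_PROMPT = """\
Write a unit test using pytest.

Follow our mocking pattern exactly (3-line example):
    def test_fetch(mocker):
        mocker.patch("app.client.get", return_value={{"ok": True}})
        assert fetch_status() == "ok"

Now write the test for this function:
{function_source}
"""

def few_shot(function_source: str) -> str:
    """Embed the target function into the exemplar-anchored prompt."""
    return FEW_SHOT_PROMPT.format(function_source=function_source)
```

Three lines of exemplar replace an entire paragraph of prose description of "how we test things here"—segmentation beating volume again.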
Agent "Management"
We call it "Prompt Engineering," but truthfully, it is "Prompt Management."
When you write code, you are a laborer laying bricks. You control every logic gate. When you prompt an agent, you are a foreman directing a crew. You control the outcome, but the agent controls the implementation.
The friction most developers feel, the hallucinations, the loops, the wrong turns, is the same friction a new manager feels when they realize their team can’t read their mind. We can map standard Agentic Patterns directly to management styles. Choosing the wrong one is why your agent fails.
Future-Proofing: The Era of Orchestration
We are moving toward Agentic Teams where "Manager Agents" direct fleets of specialists. If your communication is sloppy with one agent, you get a hallucination; with a fleet, you get a cascade failure.
To manage this fleet, you need a shared language. In Computer Science, we call this Domain Modeling. In Management, we call it Onboarding. Essentially, it is a schema for the team’s brain. Before you ask a fleet to build a feature, you must inject the "Business Domain" into their system prompt. You must define what "User," "Account," and "Latency" mean in your specific company. Without this shared dictionary, your agents are just talented strangers shouting at each other.
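Injecting that shared dictionary can be as simple as prefixing every agent's system prompt with the same glossary. The terms and definitions below are illustrative, not a real schema:

```python
# Hypothetical company glossary: the shared "Domain Model" every agent receives.
DOMAIN_GLOSSARY = {
    "User": "An authenticated person with an active subscription, not a visitor.",
    "Account": "A billing entity; one Account may own many Users.",
    "Latency": "p99 server response time in milliseconds, excluding client render.",
}

def with_domain(system_prompt: str, glossary: dict) -> str:
    """Prefix a shared glossary so every agent in the fleet uses the same terms."""
    terms = "\n".join(f"- {term}: {meaning}" for term, meaning in glossary.items())
    return (
        f"Domain definitions (use these exact meanings):\n{terms}\n\n{system_prompt}"
    )
```

Because every specialist agent is built through the same function, "Latency" can never mean p99 to one agent and p50 to another—the cascade-failure vector is closed at the prompt layer.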
Conclusion
We started this article with a question: Are your agents the problem, or are you just bad at communicating?
If you have made it this far, you know the answer.
If your agents are hallucinating, spiraling, or burning budget, they are simply holding up a mirror to your own ambiguity. They are showing you exactly where your instructions are vague, where your context is messy, and where your expectations are unspoken.
But here is the silver lining.
The computer scientist Alan Perlis once said, "A language that doesn't affect the way you think about programming is not worth knowing."
The same is true for the language of prompts. As you force yourself to strip away the "fluff" to save tokens, you will find something unsettling happening in your real life. The discipline required to manage a fleet of AI agents is identical to the discipline required to manage a team of humans.
You will stop sending 4-paragraph emails to your CEO when a bulleted list does the job.
You will stop giving vague "look into it" instructions to your junior engineers.
You will start asking your spouse, "What is the definition of done for this weekend?" (Okay, maybe use that one with caution).
We are entering an era where the barrier between "Natural Language" and "Programming Language" has dissolved. You shouldn’t choose between being a "Soft Skills Person" or a "Technical Person." You need both.
So, are you the problem? Maybe. But if you master the syntax of clarity, you will be the solution.
Related Links:
Watch the follow-up podcast on YouTube
Listen to the follow-up podcast on Apple Podcasts and Spotify.