Containing Coding Agents: Lessons from a Low-Latency Trading Engine
Introduction
Traditional engineering relies on humans pausing or knowing what's unsafe when crossing the boundary from development to production. Coding agents rarely stop themselves before executing critical actions and can't be expected to reliably know the difference between what's safe and what's not. In building a low latency trading engine from the ground up in C++ using coding agents, and eventually transitioning that from a development environment to production, I had to stop trying to make the agent more careful and start hardening the environment such that carelessness was bounded. This post details what that looked like in practice.
Trading Engine
I set out to build a sub-20 microsecond tick-to-trade trading engine in C++ with agentic coding that connected to traditional crypto venues and prediction markets. The system is designed to be modular: separate market data, order entry, strategy, GUI and analytics processes all communicate over SPSC lock-free ring buffers. While I'd architected and built these types of systems before, doing it with a coding agent was an entirely new challenge. In a little under 4 months and roughly 125,000 lines of code, I had an operational engine trading real money. I won't be going into the trading engine itself in much more depth here, but if that's the part that's interesting to you – reach out!
Environment as a Safety Mechanism
As mentioned above, humans are able to self-regulate when developing: they know they shouldn't point the config at a live account or endpoint, they use only the API key given to them and they won't update IAM permissions if a query fails without asking first. Coding agents force a rethinking of where safety actually lives.
The instinct, when first working with a coding agent, is to try and make the agent more careful. This might mean better prompts, a more detailed CLAUDE.md, or firm reprimanding when things do go wrong. This approach treats the agent as a junior engineer that will learn from its mistakes and remember what it's been told. Further, it doesn't meaningfully solve the safety problem – coding agents fundamentally miscalculate the cost of risky behavior and routinely violate soft constraints – and it costs you the productivity that made agents worth using in the first place.
Instead of trying to constrain the agent, constrain the environment in which it operates. If the development server literally cannot access production API keys, no prompt can cause it to. If the agent's writable filesystem is scoped to a working branch, it cannot accidentally rewrite main. If the only credentials it has access to are testnet credentials, accidentally trying to trade into a live account simply fails. The agent doesn't have to be careful, because the environment imposes bounds on carelessness.
This is the thread that unifies the practices that follow – don't ask the agent to know better, build a development environment where the wrong thing simply cannot happen.
Separation of Development and Production
A well defined separation between development and production environments seems obvious, but when dealing with agents, it's a safe assumption that any boundary that can be crossed will be crossed. As such, I took the following concrete steps to bound my coding agents:
- I use a single server for all development work – myself and the coding agent included
- I leveraged AWS Security Groups and IAM permissioning to prevent the development server from accessing any credentials tagged as production in ParameterStore (the AWS CLI credentials on the dev server also cannot modify its own permissions)
- I explicitly whitelisted only the production server's IP address on the exchange side API whitelisting
- No coding agents installed on the production server whatsoever
- No SSH access into the production server from the development server
While the above considerations may seem obvious when listed out, they're the kind of things that an organization with exclusively human developers may get away with relaxing for some time, but when using coding agents it can create problems within hours or days.
Structured Logging and Access
The key to any long running, production system is thorough and detailed logging. Logging allows us to diagnose issues after they've happened and learn from our mistakes. Low latency trading engines require frequent and persistent log diving to understand why market data, order entry and strategy logic isn't operating the way they were intended. I've spent countless hours of my own time and delegated countless hours to others to review logs and make sense of a bug or system failure. With coding agents, a well structured and detailed log, with consistent and reliable access methods, can give hours back to human developers while the coding agent is dispatched to piece together a sequence of events. It seems simple enough, but ensuring a predictable naming scheme and access pattern for development and production logs (paired with Claude's new auto mode) allows the user to provide a single prompt to the coding agent, switch screens and come back 5 minutes later to a thorough analysis. I leveraged Grafana to enable access to production logs from the development server (remember: we don't want the agent SSH'ing into the server to access the files) and built two separate skills for local log retrieval and production log retrieval. Spending a little time making sure the agent is aware of the universe of hosts, services, log levels and formats will save you hours of iteration on the backend. Referencing our core principle, safe access to production logs via Grafana or a similar service lets you confidently set the agent on auto to process the logs without any concern that a production environment will get disrupted.
Github Issues & Branches
Leveraging Github issues and branches is standard practice, but one that often goes overlooked in reference to "vibe coding" or agentic development. I used Github issues to create a page that contains the entire context needed for a fresh agent to begin implementation. I often spent more time working with an agent to architect, design and write the issue than I did in implementation. A well written issue should be self contained and digestible for any coding agent starting off with zero context. In writing the issue, I'd prompt the coding agent to work with me to construct a plan that we could present to a brand new coding agent. This process was itself self-policing in that it prevented me from committing too large a piece of work to one issue – if it was too big for the agent writing the issue, it was too big for a single issue and needed to be broken up.
In my CLAUDE.md, I explicitly instruct the coding agent to use a new branch and commit frequently when beginning work on a new issue. This creates a natural and self-contained environment in which the coding agent could run wild without any concern that it might disrupt or otherwise overwrite the main branch. As I mentioned, in any large organization this is simply good coding practice, but it's easy to overlook when operating at the speed of an agent.
Model Drift
Frontier labs ship new models every few months, which means working with coding agents over a long-running project is like getting a new coworker every two months. Last month they were careless with deadlocks, and now they consider them carefully. Last month they weren't able to get the code to compile in less than four attempts, and now they're able to get it on the first try. There are new strengths, but also new weaknesses. While a model upgrade is generally net better, at a minimum it is different. This creates a problem that the bounded-carelessness approach can't fully solve. It relies on knowing the model's faults and building the environment around them. But what the agent gets wrong is a moving target. The CLAUDE.md rules, skills and prompting patterns were all calibrated against a model that no longer exists. This results in a kind of guardrail decay. Old rules become stale or counterproductive – at best it wastes context, and at worst it can cause the model to overengineer. Meanwhile, new failure modes show up that I didn't know to look for and in some cases the model simply regresses.
This model drift creates a new layer of work: in addition to building and extending the current system, I need to remain vigilant about whether the model is behaving in line with the environment I'd built for it. This type of work tended to be more reactive than proactive for me. I would notice the model failing in a new way, or trying something dangerous that it hadn't before, and I would quickly implement a rule or policy that prevents this behavior.
As long as we're relying on models served to us by someone else and tuned in the background, the most reliable guardrails are structural. The ones that decay fastest are prompt-level rules calibrated against a specific model's quirks.
Closing
Coding agents change how fast production systems can be built, but they don't change what makes the systems safe. The discipline is the same as in any serious software organization – well-defined environments, clear deployment boundaries and structured feedback loops. What changes is that you can no longer take the details for granted. An oversight a human developer might have flagged is now a single prompt away from failure.