LangChain is one of the most popular frameworks for building AI agents. But it has a blind spot: there's no built-in way to set a dollar-based budget limit on an agent run. You can set max_iterations, but that doesn't tell you how much a run will cost — and it won't stop an agent that's burning $0.20 per iteration.
This tutorial shows you how to add hard budget limits, loop detection, and cost tracking to any LangChain agent using AgentGuard. Each step includes complete code you can copy and adapt.
## The problem — LangChain agents have no built-in budget controls
LangChain's AgentExecutor provides two knobs for controlling agent behavior: max_iterations and max_execution_time. Both are useful, but neither addresses cost directly.
max_iterations=15 caps the number of tool-call loops, but says nothing about how much each iteration costs. With GPT-4o, a single iteration with a large context window can cost $0.15 or more. Fifteen iterations at that rate is $2.25 — and that's for one user request.
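To make the gap between an iteration cap and a cost cap concrete, here's the worst-case arithmetic as a sketch. The $0.15 per-iteration figure is the illustrative estimate from above, not a quoted price:

```python
# Worst-case spend that an iteration cap alone permits.
# cost_per_iteration_usd is an illustrative assumption, not an official price.
max_iterations = 15
cost_per_iteration_usd = 0.15

worst_case_usd = max_iterations * cost_per_iteration_usd
print(f"Worst case for one request: ${worst_case_usd:.2f}")  # $2.25
```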
max_execution_time is even less useful for cost control. A fast agent can burn through $5 in 10 seconds. A slow agent might take 60 seconds but cost $0.03. Time and cost are not correlated.
What you actually need is a way to say: "Stop this agent if it has spent more than $1.00 in LLM API costs." That's what AgentGuard provides.
## Step 1 — Install agentguard47
AgentGuard's Python SDK is a single package with zero dependencies. Install it alongside LangChain:
```bash
# Install the AgentGuard SDK
pip install agentguard47

# You'll also need LangChain and an LLM provider
pip install langchain langchain-openai
```
That's it. No extra services to run, no Docker containers, no config files.
## Step 2 — Set up the AgentGuard callback handler
AgentGuard integrates with LangChain via a callback handler. This handler intercepts every LLM call, tracks token usage, calculates cost, and sends telemetry to your dashboard:
```python
from agentguard47 import AgentGuardHandler, Tracer, HttpSink

# Create a tracer that sends data to AgentGuard
tracer = Tracer(
    sink=HttpSink(api_key="ag47_your_key_here")
)

# Create the LangChain callback handler
handler = AgentGuardHandler(tracer=tracer)
```
The AgentGuardHandler implements LangChain's BaseCallbackHandler interface. It hooks into on_llm_start, on_llm_end, and on_tool_start events to capture costs and behavior in real time. Every LLM call is logged with its token count, model, latency, and calculated cost.
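To see how a callback-based cost tracker works in principle, here's a stripped-down sketch of the pattern: a handler whose `on_llm_end` hook converts actual token counts into dollars and accumulates a running total. This is an illustration, not AgentGuard's implementation; the class name, price table, and method signature are placeholders:

```python
# Sketch of the callback pattern: accumulate cost from actual token counts.
# Prices are illustrative placeholders (USD per 1K tokens), not official rates.
class CostTrackingHandler:
    PRICES = {"gpt-4o": {"prompt": 0.0025, "completion": 0.01}}

    def __init__(self):
        self.total_cost_usd = 0.0

    def on_llm_end(self, model, prompt_tokens, completion_tokens):
        """Called after each LLM response with the provider's token counts."""
        p = self.PRICES[model]
        call_cost = (prompt_tokens / 1000) * p["prompt"] \
                  + (completion_tokens / 1000) * p["completion"]
        self.total_cost_usd += call_cost
        return call_cost

handler_sketch = CostTrackingHandler()
handler_sketch.on_llm_end("gpt-4o", prompt_tokens=4000, completion_tokens=500)
print(f"Running total: ${handler_sketch.total_cost_usd:.4f}")  # $0.0150
```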
## Step 3 — Add budget enforcement
Now add a BudgetGuard that sets a hard dollar limit on the agent run:
```python
from agentguard47 import BudgetGuard

# Hard stop at $1.00 — agent will gracefully halt if budget is exceeded
budget_guard = BudgetGuard(max_cost_usd=1.00)

# Add it to the handler
handler = AgentGuardHandler(
    tracer=tracer,
    guards=[budget_guard]
)
```
When the cumulative cost of LLM calls in a single run crosses $1.00, AgentGuard raises a BudgetExceeded exception that LangChain's executor catches gracefully. The agent stops, partial results are preserved, and the event is logged to your dashboard with the reason budget_exceeded.
This is a hard ceiling, not an estimate. AgentGuard tracks actual token usage reported by the LLM provider, not approximations.
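The enforcement logic is easy to reason about: keep a running total of actual per-call costs and raise as soon as the total crosses the cap. A minimal sketch of that idea (the class and exception names here are illustrative, not the SDK's internals):

```python
class BudgetExceededError(Exception):
    pass

class SimpleBudgetGuard:
    """Hard dollar ceiling over a run's cumulative LLM spend."""

    def __init__(self, max_cost_usd):
        self.max_cost_usd = max_cost_usd
        self.spent_usd = 0.0

    def record(self, call_cost_usd):
        # Accumulates the actual reported cost per call, not an estimate
        self.spent_usd += call_cost_usd
        if self.spent_usd > self.max_cost_usd:
            raise BudgetExceededError(
                f"spent ${self.spent_usd:.2f} > cap ${self.max_cost_usd:.2f}"
            )

guard = SimpleBudgetGuard(max_cost_usd=1.00)
try:
    for cost in [0.40, 0.35, 0.30]:  # cumulative: 0.40, 0.75, 1.05
        guard.record(cost)           # third call crosses the $1.00 cap
except BudgetExceededError as e:
    print("stopped:", e)
```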
## Step 4 — Add loop detection
Budget limits catch cost overruns, but loop detection catches them earlier. If your agent calls the same tool 5 times in a row with similar arguments, something is wrong — and you want to stop it before it wastes budget:
```python
from agentguard47 import LoopGuard

# Detect repeated tool calls — stop after 3 identical calls
loop_guard = LoopGuard(max_repeats=3)

handler = AgentGuardHandler(
    tracer=tracer,
    guards=[budget_guard, loop_guard]
)
```
The LoopGuard watches the sequence of tool calls. If the same tool is called with the same (or nearly identical) arguments more than max_repeats times consecutively, it triggers a graceful stop. This catches the most common failure mode in production agents: an error-retry loop where the agent keeps calling a broken tool hoping for a different result.
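The core of consecutive-repeat detection can be sketched in a few lines: normalize each call to a hashable key, count the current streak, and flag when it exceeds the threshold. This is an illustration of the technique, not the LoopGuard's actual code (which also handles near-identical arguments):

```python
class SimpleLoopGuard:
    """Flags runs where the same tool call repeats consecutively."""

    def __init__(self, max_repeats):
        self.max_repeats = max_repeats
        self.last_call = None
        self.streak = 0

    def should_stop(self, tool_name, args):
        # Normalize the call so identical calls compare equal
        call = (tool_name, tuple(sorted(args.items())))
        self.streak = self.streak + 1 if call == self.last_call else 1
        self.last_call = call
        return self.streak > self.max_repeats

guard = SimpleLoopGuard(max_repeats=3)
calls = [("search", {"q": "nvidia"})] * 4
results = [guard.should_stop(name, args) for name, args in calls]
print(results)  # [False, False, False, True]
```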
## Step 5 — Run your agent and view costs
Now wire everything into your LangChain agent and run it:
```python
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate

# Set up LLM and tools as usual
llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [your_search_tool, your_calculator_tool]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful research assistant."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)

# Run with the AgentGuard handler as a callback
result = executor.invoke(
    {"input": "What were NVIDIA's Q4 2025 earnings?"},
    config={"callbacks": [handler]}
)
print(result["output"])
```
Every LLM call during this run is tracked. Open the AgentGuard dashboard to see a timeline of calls, per-call costs, total run cost, and whether any guards fired. If the budget was exceeded or a loop was detected, you'll see exactly which call triggered the stop.
## Complete example
Here's the full code combining all steps into a single copy-pasteable script:
```python
from agentguard47 import (
    AgentGuardHandler,
    Tracer,
    HttpSink,
    BudgetGuard,
    LoopGuard,
)
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate

# 1. Configure AgentGuard telemetry
tracer = Tracer(
    sink=HttpSink(api_key="ag47_your_key_here")
)

# 2. Set up guards
budget_guard = BudgetGuard(max_cost_usd=1.00)
loop_guard = LoopGuard(max_repeats=3)

# 3. Create the LangChain callback handler
handler = AgentGuardHandler(
    tracer=tracer,
    guards=[budget_guard, loop_guard]
)

# 4. Set up your LangChain agent as usual
llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [your_search_tool, your_calculator_tool]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful research assistant."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)

# 5. Run with AgentGuard callbacks
result = executor.invoke(
    {"input": "What were NVIDIA's Q4 2025 earnings?"},
    config={"callbacks": [handler]}
)
print(result["output"])
```
## Next steps
You now have a LangChain agent with hard budget limits, loop detection, and full cost visibility. Here's where to go next:
- Quickstart guide — Set up AgentGuard with other frameworks including CrewAI, AutoGen, and raw OpenAI.
- Dashboard — View cost breakdowns, run timelines, and configure alert rules.
- GitHub — Read the SDK source, file issues, or contribute.
Budget enforcement is the single most important safety measure for production agents. It takes two minutes to set up and can save you hundreds of dollars on the first bad run it catches.