Which LangGraph Calls Actually Stream? Three Ways to Control Token Output in LangGraph.js
When you run graph.stream() with streamMode: 'messages', every LLM call in every node streams to the client by default. Here's how to control that precisely — without breaking your LangSmith traces.
The Problem: Both Nodes Are Streaming
You build a two-node graph. One node classifies the user's intent — an internal step, never meant to surface to the UI. The other generates the actual response. You call graph.stream() with streamMode: 'messages', hook it up to your API route, and start the server.
The client receives tokens from both nodes. The UI shows garbled output. The classifier's reasoning leaks into the response stream before the responder even starts.
This isn't a bug. It's exactly how LangGraph.js is designed to work — and it catches enough people that langchain-ai/langchainjs#9455 was filed specifically about summarizationMiddleware leaking its internal model calls to the UI, with no obvious way to suppress them.
Why Every Node Streams by Default
When you call graph.stream() with streamMode: 'messages', LangGraph creates a StreamMessagesHandler and attaches it to the root callback manager for the entire graph run. From there, every node receives a child callback manager derived from that root — via LangChain's standard getChild() mechanism.
The consequence: every llm.invoke() call anywhere in the graph fires handleChatModelStart on that handler, and tokens start flowing upstream whether you want them to or not.
The LangGraph.js streaming docs describe it plainly: streamMode: 'messages' streams "all messages from all nodes." That's the feature. If you have nodes that shouldn't stream, you have to say so explicitly.
There are three ways to do that. They're not interchangeable.
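To make the inheritance model concrete, here is a simplified sketch — not the actual LangGraph.js internals, just an illustration of the shape of the problem. The class and method names are invented for this example; only `getChild()` mirrors the real LangChain mechanism. The point: because every node's callback manager derives from the same root, tokens from every node land in the same stream.

```typescript
// Simplified model of LangGraph's callback inheritance -- NOT the real
// internals, just an illustration of why every node streams by default.
type TokenSink = (node: string, token: string) => void;

class MockCallbackManager {
  constructor(private sink: TokenSink) {}

  // Mirrors LangChain's getChild(): the child inherits the parent's handlers,
  // so anything attached at the root sees events from every descendant.
  getChild(): MockCallbackManager {
    return new MockCallbackManager(this.sink);
  }

  emitToken(node: string, token: string) {
    this.sink(node, token);
  }
}

// One sink attached at the root, standing in for StreamMessagesHandler.
const received: string[] = [];
const root = new MockCallbackManager((node, token) =>
  received.push(`${node}:${token}`)
);

// Each node gets a child manager derived from that root...
const classifierCallbacks = root.getChild();
const responderCallbacks = root.getChild();

// ...so tokens from BOTH nodes reach the same stream, interleaved.
classifierCallbacks.emitToken("classify", "intent=refund");
responderCallbacks.emitToken("respond", "Sure,");

console.log(received);
```

Nothing in this model distinguishes "internal" from "user-facing" nodes — that distinction has to come from you, via one of the three mechanisms below.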
Mechanism 1: TAG_NOSTREAM — Suppress at Source, Keep LangSmith
This is the right default for production code.
```typescript
async function classifyNode(state: AgentState) {
  const response = await llm.invoke(state.messages, {
    tags: ["langsmith:nostream"],
  });
  return { intent: parseIntent(response) };
}
```

The StreamMessagesHandler.handleChatModelStart implementation checks whether "langsmith:nostream" is present in the run's tags before registering it for streaming. If it is, zero tokens enter the stream. The run still executes normally — it just doesn't contribute to the output.
Critically, the LangSmith tracer is not affected. The callback chain continues as usual; only the streaming attachment is skipped. Your LangSmith trace shows the classifier run exactly as before, with full input/output and latency data.
The source for this behavior lives in src/pregel/messages_stream.ts in the langchain-ai/langgraphjs repo — search for handleChatModelStart to see the tag check directly.
A note on the constant name: In the Python version of LangGraph, TAG_NOSTREAM ("nostream") is a named constant exported from langgraph.constants. In JS/TS, the equivalent constant ("langsmith:nostream") exists in the @langchain/langgraph package source but is not currently exported from the package's main index — so we pass the string value directly. The two values differ between languages, but the JS runtime accepts both for backwards compatibility.
When to use it: Any internal LLM call where you own the invoke() call site. This covers the vast majority of custom graph nodes in production systems.
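Since the JS package doesn't currently export the constant (see the note above), it's worth defining the string once locally rather than scattering literals across nodes. The `withNoStream` helper below is our own convenience, not part of any LangGraph API:

```typescript
// The JS package does not currently export TAG_NOSTREAM from its main index,
// so define the string once locally instead of repeating the literal.
const TAG_NOSTREAM = "langsmith:nostream";

// Illustrative helper (not a LangGraph API): merge the nostream tag into
// per-call options without clobbering any tags the caller already set.
function withNoStream(options: { tags?: string[] } = {}) {
  return { ...options, tags: [...(options.tags ?? []), TAG_NOSTREAM] };
}

console.log(withNoStream({ tags: ["classifier"] }).tags);
// usage in a node: await llm.invoke(state.messages, withNoStream());
```

If the constant is exported from the package in a future release, swapping the local definition for the import is a one-line change.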
Mechanism 2: { callbacks: [] } — Suppress at Source, Kills LangSmith
This is the blunt-instrument version.
```typescript
async function classifyNode(state: AgentState) {
  const response = await llm.invoke(state.messages, {
    callbacks: [],
  });
  return { intent: parseIntent(response) };
}
```

Passing callbacks: [] replaces the inherited callback manager entirely, severing the StreamMessagesHandler. Tokens stop flowing to the client. But it severs the LangSmith tracer too — the nested LLM run disappears from your trace. If you're running in a production environment with observability configured, this is the wrong choice.
{ callbacks: [] } hides the LLM run from LangSmith entirely. Your trace will show the node executed but the internal model call will be invisible. Use TAG_NOSTREAM instead for production code.
There is one legitimate use case: local development or test sandboxes where LangSmith isn't configured and you explicitly don't want the noise. The other is library and middleware code — the LangChain.js team used exactly this approach in PR #9640 to fix the summarizationMiddleware streaming leak reported in #9455. For internal library code, there's no meaningful LangSmith trace identity to preserve at that layer, so { callbacks: [] } is the right call. For your own application nodes, it isn't.
When to use it: Dev/test environments with no observability concern, or internal library/middleware code where the LLM call has no meaningful trace identity.
Mechanism 3: Route-Level langgraph_node Filtering — Suppress at the Consumer
The first two mechanisms act at the source — they decide whether tokens enter the LangGraph stream at all. This third mechanism acts at the consumer — it lets tokens enter the stream, then decides which ones to forward to the client.
```typescript
// In your API route handler
const stream = await graph.stream(input, {
  streamMode: 'messages',
  configurable: { thread_id: threadId },
});

for await (const [chunk, metadata] of stream) {
  if (metadata.langgraph_node === 'responseNode' && chunk.content) {
    // send() is whatever transport your route uses (SSE, WebSocket, etc.)
    send({ type: 'message_delta', content: String(chunk.content) });
  }
}
```

Every message in a streamMode: 'messages' stream comes as a [chunk, metadata] tuple. The metadata.langgraph_node field tells you which node produced it. You forward only what you want.
This approach leaves the LangGraph stream fully intact. All tokens arrive at the route handler; the route handler filters. LangSmith traces are unaffected.
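The filtering loop above can be factored into a reusable helper. Everything here is dependency-free for illustration: `onlyFromNode`, `mockStream`, and `collectResponderTokens` are hypothetical names, and the tuple types are simplified stand-ins for what streamMode: 'messages' actually yields.

```typescript
// Simplified stand-ins for the [chunk, metadata] tuple shape that
// streamMode: 'messages' yields.
type MessageChunk = { content?: unknown };
type StreamMetadata = { langgraph_node?: string };

// Consumer-side filter: pass the full stream through, forward only the
// tokens produced by the named node.
async function* onlyFromNode(
  stream: AsyncIterable<[MessageChunk, StreamMetadata]>,
  nodeName: string
): AsyncGenerator<string> {
  for await (const [chunk, metadata] of stream) {
    if (metadata.langgraph_node === nodeName && chunk.content) {
      yield String(chunk.content);
    }
  }
}

// Mocked stream standing in for graph.stream(...): the classifier's token
// is present in the stream but never forwarded.
async function* mockStream(): AsyncGenerator<[MessageChunk, StreamMetadata]> {
  yield [{ content: "intent=refund" }, { langgraph_node: "classifyNode" }];
  yield [{ content: "Sure, " }, { langgraph_node: "responseNode" }];
  yield [{ content: "let me help." }, { langgraph_node: "responseNode" }];
}

async function collectResponderTokens(): Promise<string[]> {
  const forwarded: string[] = [];
  for await (const token of onlyFromNode(mockStream(), "responseNode")) {
    forwarded.push(token);
  }
  return forwarded;
}

collectResponderTokens().then((tokens) => console.log(tokens));
```

Because the filter lives in the route handler, two routes consuming the same graph can each call `onlyFromNode` with a different node name and get different views of the same stream.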
When to use it: Three situations make this the only viable option:
- **Prebuilt nodes.** createReactAgent and similar constructs make LLM calls inside library code — there's no invoke() call site you can reach. Source-level suppression isn't available. Route-level filtering is the only way to target specific nodes.
- **Multiple consumers.** If the same graph is consumed by different API routes that need different subsets of the token stream, route-level filtering lets each route make its own decision without modifying the graph.
- **Context-dependent terminals.** If a node is terminal in some graph configurations and internal in others, filtering by node name at the route level avoids embedding routing logic into the graph itself.
The Decision Matrix
| Situation | Approach |
|---|---|
| Internal node, production system | { tags: ["langsmith:nostream"] } |
| Internal node, no LangSmith concern | { callbacks: [] } |
| Prebuilt/third-party node (no call site access) | Route-level langgraph_node filtering |
| Multiple consumers, different forwarding needs | Route-level filtering |
| Node is terminal in some graphs, internal in others | Route-level filtering |
| Subgraph called via .invoke() without config passthrough | Nothing — already isolated |
The Subgraph Isolation Nuance
This one catches people. It's worth understanding clearly.
When a wrapper node calls a subgraph like this:
```typescript
async function wrapperNode(state: AgentState) {
  // Note: no config argument passed
  const result = await subgraph.invoke({ messages: state.messages });
  return { summary: result.output };
}
```

The parent graph's StreamMessagesHandler does not propagate into the subgraph. You could add { callbacks: [] } to every internal node in that subgraph and it would make no difference — the streaming context was never there to begin with.
Why? The StreamMessagesHandler is attached when createDuplexStream sets CONFIG_KEY_STREAM on config.configurable. That only happens at the top-level graph.stream() call. A nested .invoke() that doesn't receive the parent config starts with a clean slate.
The contrast matters: if the wrapper node accepts and forwards config:
```typescript
async function wrapperNode(state: AgentState, config: LangGraphRunnableConfig) {
  // Config is explicitly passed through — streaming context propagates
  const result = await subgraph.invoke({ messages: state.messages }, config);
  return { summary: result.output };
}
```

Now the subgraph inherits the streaming context, and suppression inside it becomes meaningful again.
If you're reading a codebase and see { callbacks: [] } on nodes inside a subgraph that's called without config passthrough, those calls are inert. They don't hurt anything, but they mislead readers into thinking the tokens would otherwise stream — they wouldn't. The LangGraph.js subgraph concepts doc covers how config propagation works in more detail.
A Note on Prebuilt Agents
createReactAgent wraps a tool-calling loop in a prebuilt subgraph. The LLM invocations happen inside the library. You can't tag them with TAG_NOSTREAM; you can't inject { callbacks: [] }.
If you're using createReactAgent as a node inside a larger graph and you only want to stream its final output, route-level filtering by metadata.langgraph_node is your only tool. The createReactAgent API reference documents the node names used internally so you know exactly what to filter on.
The Mental Model
There are two sides to the stream, and each mechanism acts on a different side.
Source side (TAG_NOSTREAM, { callbacks: [] }): controls whether tokens enter the LangGraph stream at all. Once you suppress at the source, there's nothing to filter downstream. This is permanent and global across all consumers of the graph.
Consumer side (route-level filtering): controls which tokens exit the stream toward the client. The full token stream is preserved internally. Different routes can apply different filters to the same graph without touching the graph code.
Source-side suppression is the right tool when the node is categorically internal — a classifier, a planner, a validator that should never surface raw tokens to any consumer. Consumer-side filtering is the right tool when the same tokens might be wanted by some consumers and not others, or when you're working with library code you can't modify.
Getting this wrong in either direction is costly: suppress too little and your UI streams garbage; suppress too much with { callbacks: [] } and your traces go dark. TAG_NOSTREAM is the narrow path that threads both concerns.
For deeper context on the callback runtime behavior that makes all of this work, the LangChain.js callbacks at runtime guide explains the inheritance model that LangGraph.js builds on.
Streaming control is one of those topics that's easy to get wrong the first time you build a multi-node graph — and the kind of thing that becomes second nature once you understand the underlying mechanism. If you want to go deep on LangGraph.js patterns like this, including persistence, human-in-the-loop, subgraphs, and production-grade agent architecture, the LangGraph.js Mastery course covers it across 36 hands-on lessons with a real codebase you build from scratch.