Ben
You're 90 minutes into a Claude Code session. Non-trivial task - a multi-file refactor, something with actual state. You've made three architectural decisions, ruled out two approaches, and are in the middle of wiring up the fourth file when the context window fills and compaction kicks in.
The agent comes back. It asks you to clarify what you're working on.
Some of what it knew survives the summary. Most doesn't. The specific reasoning behind the decisions - gone. The approaches you already ruled out - gone. The half-formed constraint you mentioned twenty minutes ago that turned out to matter - gone. You spend ten minutes re-explaining the premise of the session, and you still can't fully reconstruct the context the pre-compaction agent was working from.
The next morning, you open a new session. Every long Claude Code session ends the same way: 20 minutes the next morning re-explaining what the previous session already figured out. The new session has your codebase, your persistent project rules, and a vague memory of what you were doing. It does not have the context the previous session learned.
This isn't an edge case. It's the default state of any AI coding task that takes longer than one session. AI coding tools are excellent at the work. They are structurally indifferent to remembering what work they did.
This isn't a model problem. It's a handoff problem. And handoff problems have engineering solutions.
Most content about AI coding session management focuses on one problem: the context window fills up. Use /compact. Scope your sessions. Keep your CLAUDE.md clean.
That advice isn't wrong. But it addresses the least damaging of the three surfaces where continuity actually fails.
Surface 1: Mid-session compaction. The context window fills and the agent summarizes. Summaries are lossy by design - the agent decides what to keep. Specific reasoning, inline decisions, and ruled-out approaches: these compress poorly. What survives is the broad shape of the task; what gets lost is the texture of the work. /compact is a damage-limitation mechanism, not a continuity mechanism. It keeps the session alive; it doesn't preserve what the session knew.
Surface 2: Session-to-session restart. This is where most developers lose the most time, and almost no tooling addresses it directly. A new session starts cold. It has your codebase and your persistent configuration, but it doesn't have the session's learned state: the hypotheses explored, the decisions made, the things explicitly flagged as "don't forget this." Depending on the task, reconstructing that state takes anywhere from five minutes to half an hour. Over a multi-day task, that's a real cost.
Surface 3: Multi-session task handoff. The highest-stakes surface. A feature taking four days, or a task two developers are sharing, or a debugging thread you need to hand to a colleague. The session state - the working model of the problem that only exists in the LLM's context - has no place to live. Without deliberate engineering, it evaporates at session end. With it, it becomes a transferable artifact.
Most existing content on session management addresses Surface 1 only. The protocol in this post is designed primarily for Surfaces 2 and 3, where the actual productivity losses accumulate.
What session-continuity sloppiness actually costs. The token cost is real, but the rework cost is higher. Re-reading a mid-size codebase to reconstruct prior context runs 20,000–60,000 tokens per session restart, and with current API pricing that adds up on multi-day tasks. A 20-minute re-orientation at the start of each session means roughly 60 to 80 minutes lost per week for a developer running three to four AI coding sessions a week - before accounting for the subtler cost of the agent re-exploring already-eliminated hypotheses and re-making already-settled decisions. On a five-day refactor, that compounds into a full session's worth of wasted work. The protocol in this post is designed to recover most of it.
You might think throwing a larger context window at the problem is the answer. It isn't.
The intuition is seductive: if the model can hold 1M tokens, it can hold everything. Session continuity is solved.
It isn't. Research on long-context LLM performance consistently shows that model quality degrades as input length grows - not uniformly, but predictably. The Stanford "lost in the middle" finding is the canonical result: models perform well on information at the beginning and end of a long context, and substantially worse on information in the middle. A 200K-token session doesn't give the agent full access to 200K tokens of relevant state; it gives it full access to some of those tokens and degraded access to the rest. Chroma's 2025 multi-model study found consistent performance degradation across 18 models as context length grew - the mechanics behind this are worth understanding if you're relying on long windows to preserve session state.
There's also a cost dimension that doesn't go away with larger windows. Re-reading large contexts on every turn is expensive. Compaction exists partly because Anthropic's own research found that long, unmanaged contexts produce worse outputs and cost more to run.
The argument here isn't that context windows don't matter. It's that the session-continuity problem is structurally different from the context-length problem. You can have a 1M-token window and still start every session from scratch. The issue isn't how much the model can hold inside a session - it's what survives the boundary between sessions.
Here's the framing that most continuity content misses.
Sessions are bounded by definition. Claude Code doesn't persist state between invocations. The model doesn't remember your last session; it reads what you give it. That's not a limitation to work around - it's a property to engineer around.
The boundary between sessions is the engineering surface. What you choose to carry across that boundary determines what survives. And right now, most developers carry nothing intentional: they open a new session, dump in some context, and rely on the model to reconstruct what it needs from the codebase.
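Think of each session as producing two artifacts: a work artifact (the code it changed) and a state artifact (what it learned along the way: the decisions made, the approaches ruled out, the next step).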
The work artifact is well-served by existing tools: git history, PRs, and commit messages. The state artifact has no equivalent. Developers don't have a standard place to put it, a standard format for capturing it, or a standard protocol for handing it to the next session.
That's the gap this post is addressing.
The discipline of managing session-boundary state is one layer of what Anthropic and others are calling context engineering for agents. For the full picture - not just session boundaries but the whole architecture of what you give an LLM and why - that post is worth reading alongside this one.
In November 2025, Anthropic published "Effective Harnesses for Long-Running Agents" - a rich technical document written primarily for teams building agent harness infrastructure. Most of it is about sub-agent orchestration, tool call design, and compaction strategies at the framework level.
Three patterns from that research translate directly to individual developers running long Claude Code sessions without any custom harness infrastructure.
Pattern 1: The progress log (claude-progress.txt)
Anthropic's harness research describes maintaining an agent journal - a file the agent writes to as it works, recording decisions, progress state, and explicit checkpoints. The harness version is structured and machine-readable. The individual developer version is simpler: a scratch file, adjacent to the work, that the agent updates as a deliberate part of the session workflow.
What this looks like in practice: at the start of a session, you tell the agent to maintain ./scratch/session-log.md as it works. Every significant decision, every ruled-out approach, every flagged concern goes in. At session end, this file is the raw material for your rehydration payload. Without this habit, you're reconstructing from memory. With it, you're curating from record.
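If you want the log entries to stay uniform regardless of who writes them, a tiny helper can enforce the shape. This is a minimal sketch, not part of Anthropic's harness: the function name `log_entry`, the entry kinds, and the line format are all assumptions made for illustration. Only the `./scratch/session-log.md` path comes from the text above.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical entry kinds, mirroring "decision, ruled-out approach, flagged concern".
ALLOWED_KINDS = {"decision", "ruled-out", "concern", "checkpoint"}


def log_entry(log_path: Path, kind: str, text: str) -> str:
    """Append one timestamped entry to the session log and return the line written."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown entry kind: {kind!r}")
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    line = f"- [{stamp}] {kind.upper()}: {text.strip()}"
    # Create ./scratch/ on first use so the log always lives next to the work.
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return line
```

A call such as `log_entry(Path("./scratch/session-log.md"), "ruled-out", "shared errors.ts")` appends one flat bullet, which keeps the log easy to curate at session end.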
Pattern 2: Sub-agent isolation for context-heavy subtasks
Anthropic's harness research recommends spinning up sub-agents in fresh context windows for discrete, bounded subtasks - rather than loading the full session state into a single long-running agent. The insight: fresh context is a feature, not a limitation, when the subtask is well-specified.
For individual developers, the practical equivalent is deliberate session scoping. Rather than running one eight-hour Claude Code session on a monolithic task, decompose into bounded sessions with explicit handoffs. Each session gets a well-scoped entry point. The session boundary becomes an intentional checkpoint rather than an interruption.
Pattern 3: Compaction-resistant prompts
Anthropic's research describes structuring system prompts and early context to be compaction-resistant: front-loading the information that must survive summary, keeping decision-critical state in accessible positions rather than buried in long conversation threads.
The individual developer equivalent: when you know a fact or decision is critical to the rest of the session, state it explicitly near the beginning of the session, not just once mid-conversation. "We've decided X because Y - this is not negotiable for the rest of this session" is a compaction-resistant instruction. Burying that same decision in turn 14 of a 40-turn conversation is not.
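One way to make that habit mechanical is to keep the critical decisions in a list and regenerate a pinned block whenever you restate them. The sketch below is illustrative only: `pin_decisions` and the header wording are hypothetical, modeled on the "We've decided X because Y" phrasing above.

```python
def pin_decisions(decisions: list[tuple[str, str]]) -> str:
    """Build a short block restating each (decision, reason) pair in one flat list."""
    if not decisions:
        raise ValueError("nothing to pin")
    lines = ["Pinned decisions (not negotiable for the rest of this session):"]
    for decision, reason in decisions:
        lines.append(f"- We've decided {decision} because {reason}.")
    return "\n".join(lines)


print(pin_decisions([
    ("OrderService takes a userId parameter",
     "importing UserService directly risks a circular dependency"),
]))
```

Pasting the resulting block near the start of the session, and again after any compaction, keeps the decision out of the middle of a long thread.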
These patterns are the professional version of what most developers do informally when they're doing it at all. Naming them, systematizing them, and treating them as deliberate engineering choices is the upgrade.
The concept at the center of this post needs a name, so here it is.
A rehydration payload is a structured, human-curated artifact that captures the transient state of an AI coding session - decisions made, decisions deferred, the current goal, the next action, the files in play, and explicit "don't forget" flags - in a form compact enough to paste as the first message of the next session and precise enough to restore working context without replaying the full transcript.
The name is intentional. A transcript records; a rehydration payload restores. The goal isn't to preserve everything - it's to preserve the minimum sufficient to make the next session as capable as the previous session was at its best.
A well-constructed rehydration payload has six components, in priority order:
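- Current goal: what the task is trying to achieve. Example: "... UserService and OrderService from the monolith, maintaining the existing interface contracts."
- Decisions made: what was settled, and why.
- Decisions deferred: what was intentionally left for later.
- Next action: the next concrete step. Example: "... OrderService.create(), which depends on the UserContext interface we finalized in the last turn."
- Files in play: the files the task is currently touching.
- "Don't forget" flags: anything the session surfaced as important.
This is it. Six components. An 800-token payload that hands more working context to the next session than a 12,000-token transcript replay - because it's curated rather than recorded, and signal rather than noise.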
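The six components named in the definition (current goal, decisions made, decisions deferred, next action, files in play, "don't forget" flags) map naturally onto a small record type. The following is a sketch under assumptions, not a prescribed format: the class name, field names, and the extra `title` field used for the header are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RehydrationPayload:
    title: str  # header only; not one of the six components
    current_goal: str
    decisions_made: list[str] = field(default_factory=list)
    decisions_deferred: list[str] = field(default_factory=list)
    next_action: str = ""
    files_in_play: list[str] = field(default_factory=list)
    dont_forget: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render as flat markdown: one header, bold labels, single-level lists."""
        def bullets(label: str, items: list[str]) -> list[str]:
            return [f"**{label}:**"] + [f"- {item}" for item in items]

        lines = [f"## {self.title}", f"**Current goal:** {self.current_goal}"]
        lines += bullets("Decisions made", self.decisions_made)
        lines += bullets("Decisions deferred", self.decisions_deferred)
        lines.append(f"**Next action:** {self.next_action}")
        lines += bullets("Files in play", self.files_in_play)
        lines += bullets("Don't forget", self.dont_forget)
        return "\n".join(lines)
```

Keeping the structure in a type like this is optional. Its main benefit is that an empty section is visible at a glance before you paste the payload.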
This section is worth writing explicitly, because the instinct is to include more.
Don't include full chat transcripts. The next session will infer wrong things from a raw conversation thread - the hedging, the exploratory thinking, the abandoned directions are all in there, and the model can't reliably distinguish what's still true from what was superseded. Curate the output, don't dump the input.
Don't include the full file tree. It's rediscoverable - file structure is one list_files call away. Including it wastes tokens on information the agent can retrieve in seconds, which means fewer tokens left for the decisions and context that aren't rediscoverable. What the session knew about the file tree - which files are in scope, which are irrelevant, which are off-limits - is worth preserving. The tree itself is not.
Don't include every micro-decision. "We used const here instead of let" is not session state. Signal-to-noise discipline is what makes the payload effective. If it doesn't change what the next session would do, it doesn't belong.
Don't include code style preferences. Those belong in your persistent project rules - your CLAUDE.md or AGENTS.md - which the next session already has. The rehydration payload is for transient, task-specific state. Stable preferences live in stable places.
Don't include the four layers of repo-level memory that belong in your memory file stack. AGENTS.md handles project-wide conventions. CLAUDE.md handles tool preferences. The rehydration payload handles the working state of the current task. These are different layers for a reason.
This is the operational version of everything above. Five steps, designed to take under five minutes at session end and under two minutes at session start.
Step 1: Before ending the session, ask the agent to draft its own rehydration payload.
The exact prompt: "Before we close this session, write a rehydration payload for the next session. Include: current goal, decisions made and why, decisions deferred, the next concrete action, files currently in play, and anything you'd flag as 'don't forget.' Keep it under 1,000 tokens."
The agent will produce a reasonable first draft. It won't be perfect - it doesn't know what you'll actually need when you resume - but it's a better starting point than writing from scratch. One formatting note: keep the payload structurally flat. Deep nesting (multiple levels of sub-bullets, deeply indented code comments inside the payload) can cause the model to misinterpret the hierarchy during rehydration. Plain markdown headers and flat lists parse reliably across models and context sizes.
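Both constraints in this step, the token budget and the flat structure, can be checked before you save the draft. The sketch below rests on two loud assumptions: tokens are estimated at roughly four characters each (a crude heuristic, not a real tokenizer), and "nesting" means any bullet indented by two or more spaces. The name `check_payload` is hypothetical.

```python
import re

# A bullet that starts after two or more spaces of indentation counts as nested.
NESTED_BULLET = re.compile(r"^\s{2,}[-*+]\s")


def check_payload(text: str, max_tokens: int = 1000) -> list[str]:
    """Return a list of warnings; an empty list means the draft passes both checks."""
    warnings = []
    estimated_tokens = len(text) // 4  # crude estimate, assumes ~4 characters per token
    if estimated_tokens > max_tokens:
        warnings.append(f"about {estimated_tokens} tokens, over the {max_tokens} budget")
    nested = [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if NESTED_BULLET.match(line)
    ]
    if nested:
        warnings.append(f"nested bullets on lines {nested}")
    return warnings
```

Treat the warnings as prompts for the human review pass rather than hard failures. An indented continuation line of a wrapped bullet is not flagged, only a bullet that is itself indented.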
Step 2: Review and edit it. You are the curator.
Read through what the agent produced. Add anything it missed. Cut anything that's noise. Adjust the "decisions made" section to reflect your actual intent, not the agent's interpretation of it. The human review pass is what makes the payload curated rather than generated.
This step takes two to three minutes. It is not optional.
Step 3: Save it adjacent to the work.
File naming convention: ./scratch/session-N.md, where N is the session number. Checking this file into version control is optional but recommended for tasks spanning multiple developers or long timelines. The file being adjacent to the code, rather than in a notes app or a clipboard, means it survives a machine restart and can be committed alongside the session's code changes.
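The naming convention is simple enough to automate. This sketch picks the next free N by scanning the scratch directory. The function names are hypothetical, and it assumes session files are named exactly `session-N.md` with a plain integer N.

```python
import re
from pathlib import Path

SESSION_FILE = re.compile(r"^session-(\d+)\.md$")


def next_session_path(scratch_dir: Path) -> Path:
    """Return scratch_dir/session-N.md, where N is one past the highest existing number."""
    numbers = [
        int(match.group(1))
        for entry in scratch_dir.glob("session-*.md")
        if (match := SESSION_FILE.match(entry.name))
    ]
    return scratch_dir / f"session-{max(numbers, default=0) + 1}.md"


def save_payload(scratch_dir: Path, payload: str) -> Path:
    """Write the reviewed payload next to the work and return the path used."""
    scratch_dir.mkdir(parents=True, exist_ok=True)
    path = next_session_path(scratch_dir)
    path.write_text(payload.rstrip() + "\n", encoding="utf-8")
    return path
```

Calling `save_payload(Path("./scratch"), reviewed_text)` after the review in Step 2 leaves a file you can commit alongside the session's code changes.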
Step 4: When resuming, paste the payload as the first message of the new session.
Not in the system prompt. Not as a file attachment. As the first message in the conversation. This positions it at the front of the conversation, where models tend to attend most reliably and where it is most likely to survive compaction.
Step 5: Ask the agent to confirm its understanding before continuing.
The confirmation prompt: "Based on the context above, what's your understanding of the current goal, and what would you do next?"
This is a cheap verification step that catches misreadings before they become wasted work. If the agent's confirmation matches your intent, proceed. If it doesn't, correct it before starting the session proper. Five lines of correction at session start is worth an hour of misdirected work.
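Steps 4 and 5 can be prepared together: load the most recent payload and pair it with the confirmation prompt. A minimal sketch, assuming payloads were saved as `session-N.md` in a scratch directory. The function names are hypothetical, and how you actually send the two messages depends on your tooling.

```python
import re
from pathlib import Path

# Confirmation prompt quoted from Step 5.
CONFIRMATION_PROMPT = (
    "Based on the context above, what's your understanding of the current goal, "
    "and what would you do next?"
)


def latest_payload(scratch_dir: Path) -> Path:
    """Return the session-N.md file with the highest N (numeric, so 10 beats 9)."""
    numbered = []
    for entry in scratch_dir.glob("session-*.md"):
        match = re.fullmatch(r"session-(\d+)\.md", entry.name)
        if match:
            numbered.append((int(match.group(1)), entry))
    if not numbered:
        raise FileNotFoundError(f"no session-N.md files in {scratch_dir}")
    return max(numbered, key=lambda pair: pair[0])[1]


def opening_messages(scratch_dir: Path) -> list[str]:
    """First message is the payload; second asks the agent to confirm its understanding."""
    payload = latest_payload(scratch_dir).read_text(encoding="utf-8").strip()
    return [payload, CONFIRMATION_PROMPT]
```

Sorting on the parsed number rather than the file name avoids the classic mistake of treating `session-9.md` as newer than `session-10.md`.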
The rehydration payload is half the picture. It handles what the previous session leaves behind - the outbound side of the handoff.
The other half is what the new session brings in: the Notion spec the task is implementing, the design doc with the constraints, the Slack thread where the product decision was made, and the relevant snippet from your architecture notes. These aren't session-specific artifacts. They're external sources that need to be assembled, token-budgeted, and structured before the session starts.
Both halves of this problem are real, and both require deliberate engineering. Without the rehydration payload, the session starts cold on prior state. Without a pre-session assembly layer, the session starts cold on the broader task context. The two failure modes compound: a session that doesn't know what it previously decided and doesn't know the specs it's supposed to implement is starting essentially from scratch.
This is the problem HiveTrail Mesh was built to solve on the input side. Mesh assembles context from Notion, local files, and saved prompt snippets into reusable Stacks - scoped to a specific task, token-counted, and privacy-scanned before export. It handles the "what does this session need to know from external sources" problem so that the rehydration payload can focus on what it's actually good at: "what did the previous session learn."
Mesh handles the pre-session input. The rehydration payload handles the post-session state. Together, they form the full continuity discipline: every session starts with the right external context and the right internal state from the session before.
The protocol above is general. These three scenarios show what it looks like applied to real task types.
The scenario. You're extracting a service layer from a monolith. The task spans three days. Each day's session is two to three hours. By day three, you've made four architectural decisions, touched eleven files, and deferred two problems that will need revisiting after the extraction is done.
The failure mode without the protocol. Each morning's session spends its first 20 minutes re-reading the codebase to reconstruct what the previous sessions did. By day three, the agent is starting from the same understanding it had on day one - it knows the codebase structure, but not the task-specific decisions layered on top. Worse, the two deferred problems aren't flagged anywhere, so day three's session might try to solve them in the wrong order.
The rehydration payload that fixes it:
## Session 3 rehydration payload - ServiceLayer extraction
**Current goal:** Extract UserService and OrderService from the monolith.
Interface contracts must remain unchanged (breaking changes are a blocker).
**Decisions made:**
- No shared singleton state between services (decided session 1 - the original
singleton caused test isolation failures, SessionManager was the specific culprit)
- UserService owns auth logic; OrderService takes a userId parameter, does not
import UserService directly (decided session 2 - circular dep risk)
- Error types are re-exported from each service's own module, not from a shared
errors.ts (decided session 2 - keeps services independently deployable)
**Decisions deferred:**
- Retry logic in OrderService.create() is incomplete. Intentionally skipped -
will address after extraction is stable. Do not "fix" this in session 3.
- The legacy `createOrderWithUser()` convenience method needs a deprecation
wrapper. Flagged for after extraction is complete.
**Next action:** Start OrderService.create() - depends on UserContext interface
finalized at end of session 2 (see src/types/UserContext.ts, lines 14–31).
**Files in play:**
- DONE: src/services/UserService.ts, src/types/UserContext.ts
- IN PROGRESS: src/services/OrderService.ts (create() not yet implemented)
- UNTOUCHED: src/services/NotificationService.ts (out of scope this sprint)
**Don't forget:**
- The test suite for OrderService uses a mock that doesn't reflect the new
interface - will fail until OrderService.create() is complete. Expected.
- There's a hidden dependency in src/legacy/checkout.ts line 88 that imports
UserService directly. It will break. It's on the list.
The cost comparison. Session 3 opens with this payload (~600 tokens) rather than a re-read of 2,000 lines of changed code. The agent's first response confirms the goal and names the next action correctly. Day three starts in three minutes rather than twenty.
The scenario. You're hunting a race condition in an async job processor. You've been at it for two hours. You've narrowed it to the lock acquisition sequence in ProcessQueue, eliminated three other hypotheses, and identified the specific log pattern that correlates with the failure. You need to stop for the night and resume in the morning.
The failure mode without the protocol. You open a new session with the codebase and a vague description of the bug. The agent starts from the beginning: asks clarifying questions, suggests the three hypotheses you already eliminated, and gets interested in a side issue that's unrelated. You spend 30 minutes getting back to where you were.
The rehydration payload that fixes it:
## Debug session rehydration - ProcessQueue race condition
**Current goal:** Find and fix the race condition causing intermittent job
duplication in ProcessQueue under high concurrency.
**Hypothesis that survived last session:** Lock acquisition ordering in
ProcessQueue.acquire() is the cause. Specifically: the lock is acquired
after the job is read from the queue but before it's marked in-progress.
Under race conditions, two workers can read the same job before either
marks it. This is the active hypothesis.
**Hypotheses eliminated (do not revisit):**
- Database-level duplicate writes: ruled out - DB write is idempotent,
duplicates occur before the write
- Message queue deduplication: ruled out - the queue is correctly
configured, we verified with message IDs
- Worker timeout + retry overlap: ruled out - timing analysis showed
duplicates occur on first attempt, not retry
**Next action:** Add logging at ProcessQueue.acquire() lines 44–47 to
confirm the race window hypothesis. Run the load test suite at 200
concurrent workers (the failure rate is ~3% at that concurrency -
sufficient to reproduce without a long wait).
**Files in play:**
- src/queue/ProcessQueue.ts (primary - lines 40–60 are the focus)
- src/queue/__tests__/ProcessQueue.test.ts (the load test is here)
- logs/race-condition-samples.txt (3 captured log sequences showing
the pattern - important reference)
**Don't forget:**
- The log pattern: "acquired job X" appearing twice within 50ms for
the same job ID is the signal. Normal acquires are >200ms apart.
- There's a separate, unrelated issue in ErrorQueue.retry() - don't
get drawn into it. It's a different bug.
The cost comparison. The morning session opens with the active hypothesis stated, three dead ends blocked, and the exact next action specified. The agent doesn't rediscover eliminated approaches. The debugging resumes where it left off rather than where it started.
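The "don't forget" signal in that payload is specific enough to check mechanically, which is part of why it is worth writing down. Here is a sketch of such a check, with one loud assumption: each log line is taken to start with a millisecond timestamp and to contain `acquired job <id>`. The real format of logs/race-condition-samples.txt isn't shown in this post, so the pattern and the name `duplicate_acquires` are hypothetical.

```python
import re

# Assumed line format: "<epoch_ms> <anything> acquired job <id>"
ACQUIRE_LINE = re.compile(r"^(\d+)\s.*acquired job (\S+)")


def duplicate_acquires(lines: list[str], window_ms: int = 50) -> list[tuple[str, int]]:
    """Return (job_id, gap_ms) for each repeat acquire of the same job within window_ms."""
    last_seen: dict[str, int] = {}
    hits = []
    for line in lines:
        match = ACQUIRE_LINE.match(line)
        if not match:
            continue  # ignore lines that are not acquire events
        timestamp, job_id = int(match.group(1)), match.group(2)
        previous = last_seen.get(job_id)
        if previous is not None and timestamp - previous <= window_ms:
            hits.append((job_id, timestamp - previous))
        last_seen[job_id] = timestamp
    return hits
```

With the payload stating both the 50ms signal and the >200ms normal gap, the next session can run a check like this immediately instead of rediscovering what "the pattern" means.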
The scenario. Two developers are sharing a long-running feature: an async job processor that will take both of them working across a week. Developer A has done two sessions. Developer B is picking up on a different machine for the next two sessions.
The failure mode without the protocol. Developer B opens the codebase, reads the README and the branch diff, and reconstructs task state from code review. This is slow, lossy, and dependent on commit message quality. Architectural decisions that were explicit in Developer A's sessions become archaeological guesses in Developer B's.
The rehydration payload that fixes it:
## async-job-processor - feature handoff payload (A→B)
## Last updated: end of Dev A session 2
**Feature goal:** Async job processor with priority queuing, retry logic,
and dead-letter handling. Target: 500 jobs/sec sustained, p99 latency <200ms.
**Architecture decisions locked in:**
- Priority queue implemented as a min-heap, not a sorted list
(sorted list benchmarked 4x slower at >10K queued jobs - do not revisit)
- Workers are stateless processes; all state lives in Redis
(the original proposal for in-process state was rejected by Team Lead
due to horizontal scaling requirements)
- Retry uses exponential backoff with jitter; max 5 retries before DLQ
(jitter implementation is in src/queue/backoff.ts - tested and finalized)
**Open architectural questions (need Dev B input):**
- Dead-letter queue monitoring: alerting threshold is undefined.
Dev A flagged this as a product decision needed before shipping.
Don't implement a default - escalate to PM.
- Job schema versioning: if job payload format changes, how do we
handle in-flight jobs? No decision made. Needs discussion.
**Current implementation status:**
- COMPLETE: Priority queue, worker pool, basic retry logic
- IN PROGRESS: Dead-letter queue (src/queue/DeadLetterQueue.ts) -
storage is done, retrieval API is stubbed, not implemented
- NOT STARTED: Monitoring hooks, admin API, documentation
**Warnings:**
- src/queue/WorkerPool.ts line 112 has a known memory leak under
sustained high load. Dev A left a TODO. It's not causing test failures
but will matter in production. Fix before shipping.
- The integration test at tests/integration/full-pipeline.test.ts is
currently skipped - it requires a live Redis instance. Don't unskip
without the Redis dev container running or it will appear to pass on stubs.
The cost comparison. Developer B opens this payload as their first message. The session starts with architectural decisions visible, open questions flagged, and implementation status mapped. Developer B doesn't reimplement the priority queue as a sorted list. With the payload, the handoff takes three minutes; without it, 40.
The rehydration payload is the artifact you build at the end of a session. It's the structured record of what the session learned, formatted to survive the boundary between sessions and restore working context in minutes rather than hours.
Stop starting each session from scratch. The pre-session input side of this is what we built HiveTrail Mesh for: assemble the docs, files, and prompt fragments specific to the task - with a token budget and secret scanning - and copy a clean payload to your clipboard before the session starts. The rehydration payload is what you build at the other end.
Together, they're the discipline that turns long-running AI coding from frustrating into routine. Pre-session, you give the model the right external context. Post-session, you preserve the right internal state. The session boundary stops being a place where work disappears and starts being a checkpoint you control.
AI coding sessions degrade for three compounding reasons. First, mid-session compaction is lossy - when the context window fills, the agent summarizes, and summaries preserve the broad shape of the task while losing specific reasoning and ruled-out approaches. Second, performance degrades on very long contexts even without compaction - LLMs systematically attend less reliably to information positioned in the middle of a large context window. Third, and most impactful for multi-day tasks: session-to-session restarts start cold. The model's working understanding of your specific task doesn't persist between sessions. Each restart loses the accumulated knowledge from every previous session unless you engineer a deliberate handoff.
A rehydration payload is a structured, human-curated artifact that captures the transient state of an AI coding session: decisions made and why, decisions deferred, the current goal, the next concrete action, files in play, and explicit flags for things the previous session surfaced as important. It's designed to be compact enough to paste as the first message of the next session and precise enough to restore working context without replaying the full transcript. The name distinguishes it from a session log (which records) and a progress file (which tracks): a rehydration payload restores.
Follow the five-step protocol in this post. Before ending a session, ask the agent to produce a rehydration payload covering decisions made, decisions deferred, the next action, and any "don't forget" flags. Review and edit it - human curation is what makes it signal rather than noise. Save it as ./scratch/session-N.md. When resuming, paste the payload as the first message of the new session (not the system prompt). Ask the agent to confirm its understanding before starting work. This process takes under seven minutes total and prevents the 20-minute re-orientation cost that compounds over long tasks.
/compact is an in-session, automated operation: Claude Code summarizes the conversation history to free up context window space when the session is near its limit. It operates on what's already in the context and runs when needed. A rehydration payload is a between-session, human-authored artifact: it captures the session's learned state in a form designed to be handed to the next session. /compact keeps a running session alive; a rehydration payload makes the next session capable. They address different surfaces. You need both if you're running long tasks.
Different tools for different things. MEMORY.md is for stable, project-wide context: architectural conventions, tooling preferences, team norms. It persists across all sessions on a project. A session log or rehydration payload is for transient, task-specific state: the working decisions from the current task, which change session to session. Stable preferences belong in stable places; session-specific state belongs in session-specific artifacts. Putting task-specific decisions in MEMORY.md bloats it and creates stale state; putting project conventions in a session payload means restating them every session. Keep the layers separate.
There's no clean threshold, because quality degradation is a function of task complexity, not session length in isolation. The practical signal is compaction: once /compact fires, you've entered a regime where context fidelity is degraded and the session is managing information loss rather than accumulating state. For complex tasks - multi-file changes, interleaved decisions, long reasoning chains - the useful session life is often shorter than you'd expect, sometimes as little as 60–90 minutes of substantive work before compaction starts affecting quality. The right mental model isn't "sessions can run X hours" - it's "sessions should end with a deliberate handoff, and the handoff is better done before compaction than after."