Anyone who has spent a full workday inside Claude Code, Cursor, or a similar AI coding assistant knows the pattern. The first hour is a step change in productivity. The session moves fast, the suggestions are sharp, the model is reading your code like it was paying attention. Around hour two, something shifts. The responses get slower. The suggestions get vaguer. The model starts repeating itself, forgetting decisions you made an hour ago, and offering refactors that contradict architecture choices already made. By hour three, you are doing more correction than coding. The cause is not your machine. The cause is how context windows actually scale with usage, and most developers do not know what to do about it.
A context window is the working memory of the model. Claude Sonnet 4.5 has roughly 200,000 tokens of context. Gemini 2.0 has 1 million. GPT-5 sits in the middle. Those numbers sound abundant. They are less roomy than they look, because every message in a session adds to the cumulative context, including tool call results, code snippets the model reads, previous reasoning, and the model's own prior responses. A coding session that involves reading three medium-sized files, running 15 tool calls, and going through 40 messages can hit 50,000 to 80,000 tokens of accumulated context within 90 minutes. That is well short of the hard limit, but quality starts to slip long before the window is full, and it slips in two specific ways.
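To make that arithmetic concrete, here is a back-of-the-envelope sketch. The four-characters-per-token ratio is a common rule of thumb, and the per-item sizes are illustrative assumptions rather than measurements; real tokenizers and real sessions will vary.

```python
# Rough estimate of how context accumulates in one coding session.
# All sizes below are illustrative assumptions, not measured values.

CHARS_PER_TOKEN = 4  # rule-of-thumb ratio; real tokenizers vary


def estimate_tokens(num_chars: int) -> int:
    """Convert a character count to an approximate token count."""
    return num_chars // CHARS_PER_TOKEN


def session_tokens(file_chars, tool_result_chars, message_chars) -> int:
    """Sum the approximate tokens for everything that stays in context."""
    total_chars = sum(file_chars) + sum(tool_result_chars) + sum(message_chars)
    return estimate_tokens(total_chars)


# Hypothetical session: 3 files, 15 tool calls, 40 messages.
files = [24_000] * 3          # about 6,000 tokens each
tool_results = [8_000] * 15   # about 2,000 tokens each
messages = [3_000] * 40       # about 750 tokens each, prompts and replies combined

total = session_tokens(files, tool_results, messages)
print(total)  # 78000
```

Under these assumptions the tool output and the conversation itself, not the files, account for most of the total.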
The first degradation is attention dilution. Transformer models compute attention across every token in the context window, but attention is a fixed budget: within each attention head, the softmax weights over all tokens sum to one. The relative weight available to any single token therefore tends to shrink as more tokens are added. When you ask a question at the start of a session, the model is paying meaningful attention to your question. When you ask the same question 80 messages in, the model is splitting its attention across the question, the 80 prior messages, all the file contents, and all its prior reasoning. The signal-to-noise ratio drops. The model still produces an answer, but the answer is less specifically responsive to your current question and more shaped by the cumulative weight of the entire session.
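A toy calculation shows the dilution effect. This is a deliberately simplified sketch of a single softmax, assuming one "question" token with a high relevance score and a crowd of other tokens that all share a lower score. Real models have many heads and layers with learned scores, so treat the numbers as an illustration of the trend, not a model of any particular system.

```python
import math


def weight_on_question(question_logit: float, other_logit: float, num_other: int) -> float:
    """Softmax weight on one token when num_other tokens all share other_logit."""
    q = math.exp(question_logit)
    others = num_other * math.exp(other_logit)
    return q / (q + others)


# Assumed scores: the question token scores 5.0, every other token scores 0.0.
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(weight_on_question(5.0, 0.0, n), 4))
```

Even with a large score advantage, the question's share of the attention budget falls steadily as the number of competing tokens grows.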
The second degradation is retrieval interference. Models have a known weakness called the lost-in-the-middle problem, documented in 2023 by researchers at Stanford, UC Berkeley, and Samaya AI (Liu et al., "Lost in the Middle") and reconfirmed in 2025 benchmarks. Information placed in the middle of a long context window is recalled less accurately than information at the beginning or end. As a session grows, the architectural decisions you made early in the session migrate from the start of the context (where they were salient) to the middle (where they get fuzzy). The model still has the information. It cannot use it as reliably. You see this when the model confidently suggests a pattern that violates a constraint you set 30 messages ago.
There is a third factor most users miss. The model's own prior reasoning takes up context. Every long response the model gave you is stored and weighed in subsequent attention computations. By hour two of a coding session, perhaps 30 to 40 percent of your context window is occupied by the model's own previous thinking. That self-context is useful early in a session for maintaining continuity. By hour three, it functions more like noise. The model is partly responding to your code and partly responding to its own earlier responses. The output quality suffers in proportion to how much of the context is the model talking to itself.
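If your tool lets you export a transcript, you can check the self-context share yourself. The sketch below assumes a hypothetical transcript format, a list of (role, text) pairs, and uses character counts as a stand-in for tokens. The role names and sizes are made up for illustration.

```python
from collections import defaultdict


def context_share_by_role(messages):
    """Return each role's fraction of the total characters in a transcript.

    `messages` is a list of (role, text) pairs. Character counts are used
    as a rough proxy for token counts.
    """
    totals = defaultdict(int)
    for role, text in messages:
        totals[role] += len(text)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {role: count / grand_total for role, count in totals.items()}


# Hypothetical transcript sizes, in characters.
transcript = [
    ("user", "x" * 200),
    ("assistant", "x" * 700),
    ("tool", "x" * 1100),
]
shares = context_share_by_role(transcript)
print(round(shares["assistant"], 2))  # 0.35
```

A rising assistant share late in a session is a reasonable signal that it is time to compact or restart.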
There are practical fixes. The first is to start a new session every 90 minutes whether you feel you need to or not. Begin the new session with a tight context-setting message that summarizes the architecture decisions, the current task, and the relevant file paths. Do not paste the full files unless necessary. The second is to use the tool's compaction features (Cursor has /reset, Claude Code has /compact, which replaces the conversation so far with a summary) to clear out accumulated noise without losing the thread. The third is to externalize state. Write architecture decisions, naming conventions, and project constraints into a markdown file the model reads at the start of each session rather than relying on the model to remember.
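The first and third fixes combine naturally: keep the decisions in a file, then build the opening message of each new session from it. Here is a minimal sketch, assuming a hypothetical DECISIONS.md file and a helper name of my own invention. Neither is a convention of any particular tool.

```python
from pathlib import Path


def build_kickoff_message(decisions_file: Path, task: str, file_paths: list[str]) -> str:
    """Assemble a short context-setting message for a fresh session."""
    decisions = decisions_file.read_text(encoding="utf-8").strip()
    paths = "\n".join(f"- {p}" for p in file_paths)
    return (
        "Architecture decisions and constraints:\n"
        f"{decisions}\n\n"
        f"Current task: {task}\n\n"
        "Relevant files (paths only, read on demand):\n"
        f"{paths}\n"
    )


if __name__ == "__main__":
    import tempfile

    # Demo with a throwaway decisions file; in a real project this file
    # would live in the repository and be maintained by hand.
    with tempfile.TemporaryDirectory() as tmp:
        notes = Path(tmp) / "DECISIONS.md"
        notes.write_text(
            "- Data access goes through repository classes\n"
            "- No ORM calls in the API layer\n",
            encoding="utf-8",
        )
        print(build_kickoff_message(
            notes,
            "Fix the pagination bug in the list endpoint",
            ["src/api/list.py", "src/repos/items.py"],
        ))
```

The message lists paths rather than file contents, so the new session starts small and the model reads only what the task turns out to need.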
The fourth fix is task partitioning. Long debugging sessions degrade faster than long building sessions, because debugging requires the model to hold many partial hypotheses simultaneously. If you find yourself an hour into a debugging session with no resolution, end the session and restart with a clean context that contains only the symptoms, the relevant code, and your current best hypothesis. The temptation to keep grinding the same session is strong because the sunk cost feels real. The results say otherwise. In my experience, a fresh session at the 90-minute mark routinely outperforms a continued one at the three-hour mark on the same problem.
The architecture of these tools will likely improve, and the larger context windows in Gemini 2.0 and the upcoming Claude 5 do ease the raw capacity problem. They do not eliminate attention dilution. They do not eliminate lost-in-the-middle. They give you more room before degradation starts, but the degradation curve still exists. Developers who internalize this and structure their work around the session economics, rather than treating the AI as a continuously available oracle, ship measurably more. Most operators I work with have rebuilt their AI coding workflow around focused sessions of 60 to 90 minutes with explicit context restarts in between. Their output is roughly twice what it was when they were running marathon sessions with the same tools.
If your AI coding sessions feel like they are getting worse, the answer is rarely a better tool. It is a discipline shift around how you structure context. Start sessions clean. Externalize architecture into a file. Compact aggressively. Reset every 90 minutes. The tools were designed for sprints, not marathons, and the sooner you treat them that way, the more you will get out of them. The honest version of the productivity story in 2026 is that the developers winning with AI are the ones managing context, not just prompting better.




