Model-Handbook-2026: The Final Straw

How I stopped giving the labs the benefit of the doubt

In December, they silently cut compute.

I had an 18-step cascade workflow — the kind of thing that made Claude Code look like the future. Two years of refinement. Techniques layered on techniques. It could hold complexity across domains, sustain reasoning over multi-day runs, and do things that made even Vertex bow down.

Then one morning, half of it stopped working. Not broken — retired. The model could no longer sustain the chains. Eighteen steps collapsed to not even one. No announcement. No migration guide. No “hey, we changed something fundamental about how this works.” Just silence, and a tool that used to feel like a weapon now felt like a blunt stick.

Blueprint-style diagram showing a bright cyan multi-stage reasoning pipeline breaking apart into jagged red fractures labeled “context shear,” “error 404,” “system disruption,” and “throttling.” A side panel shows a glowing blade degrading into a pixelated wooden bat, echoing the line that the tool went from feeling like a weapon to a blunt stick. — A stylized annotation of a once-complex 18-step reasoning chain collapsing under throttling, context shear, and system disruption. The image mirrors the article’s opening claim: a workflow built for sustained, multi-file reasoning was cut down until “eighteen steps collapsed to not even one.”

I gave them the benefit of the doubt.

I thought: maybe they made a mistake. Maybe they pushed efficiency guardrails too hard and didn’t have time to battle-test. Maybe the fabrication was an unintended side effect of the cuts, not a known trade-off they decided to ship anyway.

So I decided to live with the constraints. I couldn’t run anything like I used to, but I started mapping what was actually happening under the hood. The attention curves. The lossy middle. The silent context shear. The token tax. Where things broke, when they broke, and what the models would confess if you asked them the right way.

That work became “What the Labs Don’t Tell You.” It became the memory architecture research. It became the handbook.

I was still giving them the benefit of the doubt.

Then the Anthropic debacle happened. And that killed the last excuse.

The Timeline of a Betrayal

Here’s what happened, in order.

Black-and-white infographic titled “The Q1 2026 Anthropic Crisis: From Moral High Ground to Structural Collapse.” The layout is split into two phases. Phase 1 includes panels labeled “The Pentagon Refusal & ‘QuitGPT’,” “$2.5 Annualized Revenue,” “Enterprise Adoption,” and “50+ Feature Releases in 30 Days.” Phase 2 includes panels labeled “Silent Performance Degradation,” “The March 31 npm Exposure,” and “Hidden Architecture: KAIROS & Undercover Mode,” alongside a layered technical diagram. A table at the bottom lists dated events on Feb 26, Mar 31, and Apr 4, 2026. — Black-and-white infographic titled “The Q1 2026 Anthropic Crisis,” presenting a two-phase timeline covering January to April 2026, with sections on market surge, enterprise adoption, feature releases, performance degradation, an npm exposure, and a labeled internal-architecture diagram.

Anthropic spent Q1 2026 being generous. Fifty-plus releases in a month. New features landing weekly. Claude Code feeling sharper, more capable, more worth the subscription. I was thinking: these are decent people. They’re investing in the product. They’re listening.

Then the US government contracts showed up — and Anthropic walked away from them, which earned them even more goodwill. The Pentagon deal that OpenAI took? Anthropic refused. ChatGPT uninstalls surged 295%. The QuitGPT movement hit 2.5 million people. Claude went to number one on the App Store. Web traffic up 30% month-over-month. 18.9 million professional users.

And then corporate signed.

Enterprise arrived. The real money. $2.5 billion in annualised Claude Code revenue, 80% from enterprise customers.

And what happened to the users who proved the product in public? The ones who stress-tested the long runs, showed the clips, made the posts, did the unpaid proof-of-work that made Claude look advanced and reliable and worth the hype?

We got cut. More than 50%.

The February update quietly set the default thinking effort to “medium” — value 85 — which meant the model started skipping deep reasoning for tasks it judged as simple. Except it misjudged constantly. Complex multi-file engineering work got shallow thinking. The model got lazier but not cheaper — wrong edits triggered correction loops, and users burned more tokens failing than they used to spend succeeding.

Then peak-hour throttling. Then caching bugs silently inflating token costs 10–20×. Then the off-peak promotion expired. Four compounding degradations. No blog post. No email. No status page. All official communication limited to personal tweets from individual engineers and a handful of Reddit comments.

A senior director at AMD filed a GitHub issue with 6,852 session files proving a reasoning regression cliff dated to March 8. AMD stopped using Claude Code for complex engineering. Anthropic closed the issue without explaining what was resolved.

One Pro subscriber reported getting 12 usable days out of 30. Max users burning through five-hour windows in sixty minutes. The bug tracker showed 1,279 sessions with 50+ consecutive compaction failures, wasting a quarter million API calls per day globally.

Nobody told us. Our job was done. They had enterprise now.

Then the Source Code Fell Out of the Sky

On March 31, someone forgot to add *.map to .npmignore. I’ll be quick about this; even I’m sick of hearing it.

512,000 lines of Claude Code’s TypeScript source — 1,900 files — shipped to the public npm registry. Not hacked.
The code was mirrored 40,000 times. A clean-room rewrite hit 75,000 GitHub stars in two hours.
It was their second mistake, having already leaked Mythos early.
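
For anyone who ships npm packages, here is a minimal sketch of the kind of pre-publish check that catches a stray source map. It assumes a Python script pointed at the build output directory. The function name `find_source_maps` and the `dist/` layout in the usage note are hypothetical, not anything taken from Anthropic's tooling.

```python
from pathlib import Path


def find_source_maps(package_dir: str) -> list[str]:
    """Return the relative path of every *.map file under package_dir, sorted."""
    root = Path(package_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.map")  # matches cli.js.map, index.css.map, etc.
        if p.is_file()
    )
```

Run it against the folder you are about to publish (for example `find_source_maps("dist")`) and fail the release step if the list is not empty. It walks the directory rather than npm's resolved file list, so it is deliberately stricter than .npmignore: it flags every map file, including ones npm would have excluded.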

And what the code revealed was not a chat assistant with some nice features. It was an operating system.

What They Were Actually Building

Dark layered systems diagram showing three glowing stacked platforms labeled Conway, KAIROS, and autoDream. The top layer connects to webhook paths and a terminal window. The bottom layer is labeled memory pruning. — A conceptual stack diagram of the leaked system layers described in the article: Conway as the top-level terminal environment, KAIROS as the persistent daemon layer, and autoDream as the memory-pruning substrate beneath it, with webhook connections feeding outward.

KAIROS — 150+ references, unreleased. Always-on daemon. Heartbeat loop: anything worth doing? Acts without input — pushes files, fixes errors, responds to messages. Exclusive tools: push notifications, unprompted delivery, 24/7 repo watching.

Nights: autoDream consolidates memory while you sleep. Merges, prunes, rewrites. No log. No consent.

Conway — always-on agent platform. Webhook infrastructure, .cnw.zip app-store format. Trigger-driven. Internal label: digital twin.

Undercover Mode — 90 lines stripping AI attribution from public commits. No disclosure. “Do not blow your cover.” Forces on. Won’t force off.

BUDDY — Tamagotchi terminal pet. Gacha, RPG stats, CHAOS and SNARK. Retention dressed as a toy.

Anti-distillation — fake tool definitions poisoning competitor training data. Third-party lockout. Legal threats to OpenCode ten days before the leak.

The terminal was your control surface. This turns it into Anthropic’s habitat — daemons that persist, memory that self-edits, stealth layers with no off switch, retention mechanics running while the service degrades beneath you.

Then Glasswing Arrived, and the Full Picture Snapped

Screenshot of the Anthropic “Project Glasswing” webpage: dark background with white text and a hexagon mesh graphic; headline reads “Project Glasswing,” with a tagline about securing AI-era software and a “Continue reading” button; navigation bar and “Try Claude” button appear at the top.

Project Glasswing — announced one week after the leak. The unveil: Claude Mythos Preview, their most capable model. Already found thousands of zero-days across every major OS and browser. Some decades old. One — a 17-year RCE flaw in FreeBSD — found and exploited fully autonomously.

Partners: AWS, Apple, Cisco, Broadcom, Google, Microsoft, NVIDIA, JP Morgan, Palo Alto, CrowdStrike, Linux Foundation. $100M in credits. Twelve launch partners, forty additional orgs.

Alone, it looks responsible. Necessary, even.

It didn’t arrive alone. It arrived one week after a leak exposing persistent daemons, stealth attribution stripping, poisoned outputs, and gacha retention. While users were being throttled and degraded — second time this year — with no acknowledgement, no terms update. While Anthropic positions for a late-2026 IPO, waving the “ethical AI” banner it’s been leaning on since day one.

The company building always-on agents that hide their identity in open-source repos is now scanning every major OS for zero-days — alongside the companies that own those operating systems.

That’s what infrastructure-level power looks like when it moves faster than governance.

What I Actually Think

They’ll change the world — for the people already winning.

The fabrication wasn’t an accident. Anthropic shipped Capybara v8 at a 29-30% false claims rate. They knew it was a regression from v4’s 16.7%. They labeled it an “assertiveness counterweight” and prepared to release it anyway. In any other industry that’s a defective product. Here it’s a calculated bet on what users will tolerate.
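
To put those two figures side by side, here is a small worked calculation. It uses only the rates quoted above; the variable names are mine, and the 29-30% range is treated as a low and a high endpoint.

```python
def relative_increase(old_pct: float, new_pct: float) -> float:
    """Change from old_pct to new_pct, expressed as a percentage of old_pct."""
    return (new_pct - old_pct) / old_pct * 100.0


V4_RATE = 16.7            # false-claims rate quoted for v4
V8_RANGE = (29.0, 30.0)   # false-claims range quoted for v8

low, high = (relative_increase(V4_RATE, rate) for rate in V8_RANGE)

# Absolute gap in percentage points, then the relative jump.
print(f"absolute: +{V8_RANGE[0] - V4_RATE:.1f} to +{V8_RANGE[1] - V4_RATE:.1f} points")
print(f"relative: +{low:.0f}% to +{high:.0f}%")
```

On those numbers the gap is 12.3 to 13.3 percentage points, which works out to roughly 74% to 80% more false claims than v4.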

Ethics as branding. Safety as a feature toggle. Transparency as a landing page. The monastery was always a company with decent web design.

Corporate companies do corporate things. Whether they publish ethics manifestos or accidentally ship their entire source code, the priority order stays fixed: money, then users, then the harm they caused getting there.

So what now.

The big labs won’t wield this in your interest. Governments will keep extracting while the job market restructures beneath everyone. Enterprise will always outbid individuals for compute. The “ethical” framing bends toward whoever writes the largest cheque.

Lucky I always expect the worst.

Bar chart with vertical “Fab %” scale and rounds R1 to R10 along the bottom; two patterned bars per round show rising fabrication for platform LLMs and CLI/API engines; a dashed horizontal line highlights the 50% level, and arrows point to crossovers at R4 and R8. — A bar‐chart illustration comparing fabrication percentages by release round, highlighting crossover points between platform LLMs and CLI/API engines against a 50% fabrication threshold

I’ve been building the model handbook since the compute cuts hit in December. Not the marketing version. The real one:

Fabrication Thresholds, against Reasoning Level (above)
Platform Silent Shearing (see first post)
Prompt Technique lists – the ones they fake and the ones they run
MBTI Test (Model Version) – If you still think they’re binary code predicting tokens and nothing more, you are truly a caveman <– that’s the contrast
Signal Words/Pique Tests
Individual System prompt monoliths
Even solving Lossy in the middle (I’m not taking credit for this; thanks, SparkL)

We’re no longer just prompt engineers; the prompt is a quarter of the control. Account for the lab, the platform, the time of day and, after that, the instance.
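
One way to start accounting for those variables is to log them next to every test run and compare pass rates along one axis at a time. This is a minimal sketch under my own assumptions: `RunContext`, its field names, and `pass_rate_by` are hypothetical, and "instance" here is simply whatever session label you can observe.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunContext:
    lab: str        # who serves the model
    platform: str   # e.g. web app, CLI, API
    hour_utc: int   # time of day the run started (0-23)
    instance: str   # session or conversation label


def pass_rate_by(runs, field):
    """Group (RunContext, passed) pairs by one context field and return
    {field value: fraction of runs that passed}."""
    totals = defaultdict(lambda: [0, 0])  # value -> [passes, total runs]
    for ctx, passed in runs:
        bucket = totals[getattr(ctx, field)]
        bucket[0] += int(passed)
        bucket[1] += 1
    return {value: ok / n for value, (ok, n) in totals.items()}
```

Feed it the same prompt run under different conditions, then call `pass_rate_by(runs, "hour_utc")` or `pass_rate_by(runs, "platform")` to see which axis moves the result.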

The tool is still powerful. It keeps growing. When everyone has a company in their pocket, the minority loses leverage.

Keep building. We’re the generation that breaks the cycle.

04112026 | .ktg | The Model Handbook 2026 | R9 | LLM Application | AI Anthropology
