This Week in AI: The Asset Nobody Wanted, Anthropic's Accidental Confession, and the Benchmark That Should Embarrass Everyone

Two stories this week said the quiet part loud. OpenAI shut down Sora — and in doing so, accidentally answered a question nobody thought to ask. Anthropic published a blog post about their most powerful model yet, panicked, and deleted it — but not fast enough. Meanwhile, the week's most honest piece of research quietly confirmed that the frontier models everyone is calling near-AGI can't do something a reasonably alert ten-year-old manages without breaking a sweat. None of these were the headlines. All of them were the story.
The Asset Nobody Wanted
Sora is dead. The app is gone. The API is being wound down. The video generation product that OpenAI launched with enormous fanfare, signed a Disney deal around, and used as proof of their creative AI ambitions, has been quietly switched off. The official reason is focus — reallocation of compute toward coding, enterprise, and what OpenAI now calls "real-world physical tasks."
That explanation is plausible. It is not complete.
The Question Nobody Asked
When a company shuts down a product, the standard playbook has options. You wind it down. You fold it back into the core product. You spin it out. Or — if it has value — you sell it.
OpenAI did not sell Sora.
This is worth sitting with. Sora was a brand-name AI product. It had millions of users. It had a functioning API with paying developers. It had a consumer app with genuine traction. It had name recognition that most AI products would spend years and hundreds of millions of marketing dollars to acquire. There are dozens of companies — media companies, gaming companies, enterprise software companies, well-capitalised startups — actively looking for exactly this kind of asset.
Nobody bought it. Or if they were offered it, nobody wanted it at a price OpenAI would accept. Which, in a market this hungry for AI assets, suggests the price they would accept and the price the market would pay were very far apart.
What Zero Implies
If Sora's market value is effectively zero — or negative, once you factor in the compute costs of running it — then some uncomfortable questions follow.
OpenAI has spent an enormous amount on research and development. That R&D gets capitalised and sits on the balance sheet as an intangible asset. The accounting assumption baked into that treatment is that the asset has a useful life long enough to justify carrying it at cost rather than expensing it immediately. For traditional software, that's reasonable — a product built in 2022 might still generate revenue in 2027.
AI models don't work like that. Sora was state-of-the-art roughly eight months ago. It's now being turned off. The model that replaced it — Sora 2 — is already being superseded by open-source alternatives with no licensing cost. The useful life of an AI model, measured in commercial terms, appears to be somewhere between six and eighteen months before it's either obsolete or commoditised.
If that's the depreciation schedule, the R&D capitalisation assumptions across the AI industry are wrong. Not slightly wrong. Materially wrong.
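To make the arithmetic concrete, here is a minimal sketch of the argument in code. The figures are hypothetical, not OpenAI's actual numbers; the point is only how much the assumed useful life of a model moves the annual expense.

```python
# Hypothetical illustration only: the dollar figure below is invented.
# Straight-line amortisation of a capitalised R&D balance under two
# useful-life assumptions, to show how much the lifespan assumption matters.

capitalised_rd = 5_000_000_000  # assumed capitalised model R&D, in dollars

def annual_amortisation(cost: float, useful_life_years: float) -> float:
    """Straight-line amortisation expense per year."""
    return cost / useful_life_years

traditional = annual_amortisation(capitalised_rd, 5.0)  # classic software assumption
short_lived = annual_amortisation(capitalised_rd, 1.0)  # ~12-month model lifespan

print(f"5-year useful life: ${traditional:,.0f} expensed per year")
print(f"1-year useful life: ${short_lived:,.0f} expensed per year")
print(f"Expense understated by: ${short_lived - traditional:,.0f} per year")
```

Under the shorter life, the same spend hits the income statement five times faster. That difference is the write-down risk the rest of this section is about.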
The Balance Sheet Problem
Walk it back further. If the current video model has no recoverable market value, what is the carrying value of GPT-4? Of GPT-4o? Of every prior model version sitting on OpenAI's books as a capitalised intangible? These are not hypothetical assets — they represent real R&D expenditure that was capitalised rather than expensed. If they have no market value, they should be written down. If they're being written down, that hits equity. If equity takes that hit, the $300 billion valuation narrative becomes considerably harder to sustain.
This isn't unique to OpenAI. Every AI company capitalising R&D on models that depreciate faster than their accounting schedules assume has the same problem. OpenAI just made it visible by turning off a product in plain sight and declining to find a buyer.
The Disney Footnote
One more detail worth noting. OpenAI signed a major deal with Disney — one of the most IP-protective companies on the planet — giving Disney access to Sora in exchange for allowing OpenAI users to generate Disney content. When Sora was shut down, Disney exited the deal immediately. OpenAI apparently gave them no notice. People at both companies described being blindsided.
A company that genuinely believed it was shutting down a product with recoverable value would have managed the stakeholder communications more carefully. The abruptness suggests the decision was made quickly, under financial pressure, and with limited appetite for the conversation that a sale process would have required.
The asset had no queue of buyers. The partner was given no warning. Make of that what you will.
Anthropic Won't Stop Shipping
While OpenAI was consolidating, Anthropic was doing the opposite. The product tracking site Product Compass documented 74 releases in 52 days — nearly one and a half meaningful product updates every day. Most of them were for developers. Several of them matter to everyone.
Computer Use Goes Live
The feature that lets Claude control your computer — mouse, keyboard, click, scroll, open applications, navigate interfaces — shipped this week for paid Co-work and Claude Code users. In testing it navigated to a specific feature inside DaVinci Resolve without any human input. It took five minutes to do what would take a human ten seconds.
That sounds like a criticism. It isn't. The point isn't speed — it's that you don't have to be there. Combine computer use with Dispatch — the feature that lets you message Claude from your phone and collect finished work when you return — and you have a system that executes tasks on your physical computer while you're nowhere near it. The latency is a current limitation. The capability is permanent.
Auto Mode for Claude Code
The most quietly celebrated feature of the week. Claude Code used to pause and ask permission before running terminal commands, browsing the web, or executing anything that touched your system. Now it doesn't — at least not for the benign operations that made up the majority of interruptions. You tell it to build something, you walk away, and it builds it. Every Claude Code user has been waiting for this since day one.
The Leak They Didn't Mean to Publish
Anthropic accidentally left an unprotected blog post on their website. It was found, copied, and circulated before it was taken down. The post described a new tier of model called Claude Mythos — larger and more capable than Opus, with dramatically higher scores on coding, academic reasoning, and cybersecurity benchmarks.
The language they used about their own model is worth reading carefully. The post warned that Mythos "can exploit vulnerabilities in ways that far outpace the efforts of defenders" and that Anthropic wanted to "understand the model's potential near-term risks in the realm of cybersecurity" before release. They also noted it would be "very expensive for us to serve and very expensive for customers to use."
This is a company warning, in its own words, that its next model may be dangerous to release. That is either responsible caution or an extraordinary admission, depending on your priors — and possibly both simultaneously. Take it with appropriate salt given the circumstances, but note the language. Labs don't usually write that way about their own products unless they mean it.
The Pentagon Situation
A US federal judge halted one of the two Trump administration designations of Anthropic as a supply chain risk, citing free speech violations. A compliance report is due by April 6th. The second designation remains legally in effect. Anthropic has won a battle. The war has another front.
The Benchmark That Should Embarrass Everyone
ARC AGI 3 shipped this week. It is a benchmark designed to answer a specific question: can frontier AI models learn new things in real time, adapt to unfamiliar environments, and figure out rules they've never been told — the way humans do instinctively?
The answer, as of this week, is no.
The benchmark presents interactive puzzles where the rules are never explained. You explore, observe consequences, form hypotheses, and adapt. It's the kind of thing humans find mildly engaging on a Tuesday afternoon. At release, the four frontier models tested — Gemini 3.1 Pro Preview (0.37%), GPT 5.4 High (0.26%), Opus 4.6 Max (0.25%), and Grok-4.20 (0.00%) — all scored below 1%. Humans score 100%.
Grok scoring exactly zero is worth pausing on. Not below 1%. Not below 0.5%. Zero. On a task humans clear without exception.
This is not a narrow gap. This is not a benchmark where AI is catching up. This is a capability — real-time learning, goal inference in novel environments, genuine adaptation — that today's models simply do not have.
At a moment when the phrase "we are close to AGI" appears in press releases with the regularity of quarterly earnings guidance, this benchmark is worth bookmarking. The models are extraordinarily capable within their training distribution. Outside it, in situations they haven't seen, following rules they haven't been told — they are lost. Humans are not.
That gap is real. It's large. And it's useful to know about before you build a workflow that assumes it doesn't exist.
Wikipedia and the Ouroboros Problem
Wikipedia banned AI-generated articles this week. Editors can still use AI for copy editing and translation. They cannot use it to generate content that becomes part of the encyclopaedia.
It's the right call, and the reasoning is worth understanding beyond the headline.
Large language models were trained, in significant part, on Wikipedia. Wikipedia is one of the highest-quality, most consistently structured knowledge bases available in text form, which is precisely why it was so valuable as training data. If AI models now generate Wikipedia content, and future AI models train on that content, the information loop closes on itself. Each generation of models trains on output from the previous generation, with no new human knowledge entering the system. Quality degrades in ways that are difficult to detect and impossible to reverse once the contamination is widespread.
The technical term for this is model collapse. Wikipedia's editors understood it without needing the technical term. They just decided not to participate in it.
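A toy simulation makes the mechanism easy to see. This is not how Wikipedia or any lab actually trains anything; it is the standard illustrative version of model collapse, where each generation fits a simple model to samples produced by the previous one and nothing new ever enters the loop.

```python
# Toy illustration of model collapse: fit a simple "model" (a Gaussian) to data,
# sample a new training set from the fit, refit, and repeat. With no fresh human
# data entering the loop, estimation bias and sampling noise compound and the
# distribution tends to narrow generation by generation.
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "human" data with a known spread.
data = rng.normal(loc=0.0, scale=1.0, size=20)

for generation in range(1, 31):
    mu, sigma = data.mean(), data.std()              # this generation's model
    data = rng.normal(loc=mu, scale=sigma, size=20)  # next generation trains on model output
    if generation % 5 == 0:
        print(f"generation {generation:2d}: estimated spread = {sigma:.3f}")
```

Individual runs vary, but the spread tends to shrink rather than recover, because nothing in the loop reintroduces the variety the original data had. That is the dynamic Wikipedia's editors declined to feed.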
There is a broader point here that connects to the Sora story. The value in AI systems — the thing that makes them actually useful — derives from the quality and originality of the data they were trained on. When that data source starts feeding on itself, the value erodes. Slowly at first. Then faster. Wikipedia noticed early enough to do something about it. The question is which other data sources will notice — and which won't until it's too late.
The Open Source Stack Keeps Winning
Six open-source releases this week. Collectively, they matter more than most of the closed model announcements.
- GLM 5.1 (ZAI): Near Opus 4.6 performance on agentic coding benchmarks. Faster. Cheaper. Higher usage limits. Already available via API, and it can be wired into Claude Code and OpenClaw by following the published instructions. Open-source release planned. If your agentic coding workflows are running on Opus, run a comparison this week. The cost differential is significant enough to justify the test.
- Cohere Transcribe: The new state-of-the-art open-source transcription model. 2 billion parameters, 4GB, Apache 2 licence — minimal restrictions, runs on consumer hardware. Beats Whisper and ElevenLabs on independent benchmarks. Handles long-form recordings of 55 minutes and longer. Free Hugging Face space available for immediate testing. If you're paying for transcription, stop and test this first; a minimal local-inference sketch follows this list.
- ComfyUI Dynamic VRAM: A significant quality-of-life update for anyone running image or video generation locally. Dynamic VRAM loads and unloads model components only when needed rather than holding everything in memory simultaneously. The practical result: larger models on smaller GPUs, fewer out-of-memory crashes, and generation times roughly halved in testing. Nvidia GPUs on Windows and Linux only for now — Mac not yet supported. Update ComfyUI before you evaluate any new model this week.
- Google TurboQuant: A compression technique — not a model — that shrinks AI memory usage by 6x and speeds up data retrieval by 8x while maintaining performance on long-context tasks. The implications run beyond Google's own systems. As this technique and its equivalents propagate through the open-source stack, the hardware requirements for running capable models will continue to fall. Watch where this lands in the next round of open-source releases.
- Prism Audio: The best open-source video-to-audio synchronisation model available. Takes silent video and generates realistic, precisely-timed sound effects matched to the on-screen action. 518 million parameters, 6GB, runs on consumer GPUs. Benchmarks ahead of every comparable open-source alternative at a fraction of the model size. For any video production workflow that needs sound design, test this before booking session time.
- Da Vinci Magi Human: A unified 15 billion parameter model that generates video and audio natively in a single pipeline — no stitching, no separate audio model. Blind tests give it a 60% win rate over LTX 2.3. The caveat is significant: the distilled model alone is 61GB. This is not running on consumer hardware yet. Note it, watch for quantised versions, and understand what it does before the hardware requirement comes down — which it will.
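For the transcription entry above, a model of that size typically loads in a few lines with the Hugging Face transformers speech-recognition pipeline, assuming the release ships in a transformers-compatible format. A minimal sketch follows; the model identifier is a placeholder, since the actual repository name isn't given here.

```python
# Minimal local-inference sketch for an open ~2B-parameter transcription model.
# The model id is a placeholder; substitute the real repository name from the release.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="placeholder-org/open-transcribe-2b",  # hypothetical id, ~2B params / ~4GB
    chunk_length_s=30,  # chunked inference so hour-long recordings fit in memory
)

# Any local audio file; timestamps help with editing and subtitle workflows.
result = asr("client_interview.wav", return_timestamps=True)
print(result["text"])
```

If the output holds up against your current paid transcription on one real recording, the swap decision is straightforward.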
Google's Quiet Week
Google didn't headline anything this week. They shipped into everything.
- Gemini 3.1 Flash Live is now rolling into Search, the Gemini app, AI Studio, and enterprise APIs. The multimodal live conversation model — screen share, webcam, real-time interaction — is available to more users than any comparable product from any other lab. It remains underused and underappreciated. If you haven't tested screen sharing with a live AI model for walkthrough and tutorial use cases, do it this week. It is considerably more capable than its coverage suggests.
- Memory migration landed quietly — import your conversation history, preferences, and memory from ChatGPT or Claude directly into Gemini. The practical barrier to switching just got lower. Google watched Anthropic capitalise on OpenAI's Pentagon drama with an identical feature two months ago and decided they wanted the same option available. Smart.
- Lyria 3 Pro extended music generation to three minutes with structural controls — intros, verses, choruses, bridges. Available across AI Studio, Gemini, Vertex, and Google Vids. The gap between Lyria and Suno is narrowing. Worth a test if music generation is part of your content production workflow.
What Agencies Do Next
- Read the Sora balance sheet argument and share it with your finance team. If you have clients who are AI companies, or if you're advising on AI investments, the question of how capitalised R&D is being valued against model depreciation timelines is one they should be asking their auditors. It will matter more in the next 12 months than most people currently expect.
- Integrate GLM 5.1 into your OpenClaw or Claude Code setup this week. The instructions are published. The performance is near Opus 4.6. The cost is substantially lower. This is a direct swap with a measurable financial impact on your API spend.
- Replace your transcription workflow with Cohere Transcribe before your next project. 4GB, free, Apache 2, beats Whisper. The decision should take one test to confirm and five minutes to implement.
- Update ComfyUI before you evaluate any new image or video model. Dynamic VRAM changes what your existing hardware can run. Know the new ceiling before you decide you need new hardware.
- Enable Claude Computer Use with Dispatch on one delegated task this week. It's slow. Do it anyway. The workflow habit of delegating computer tasks to run while you're away from your desk compounds over time. Start building it now while the capability is new, not after everyone else has a six-month head start.
Bangkok8 AI: We'll show you what the numbers actually say — so you're not building on someone else's valuation assumptions.