Two glowing digital screens comparing AI outputs. The left blue screen shows a globe labeled "Cached Response: Instant," representing OpenAI's pre-built library. The right orange screen shows a complex 3D structure labeled "Generating Visualisation: Complex," representing Anthropic's bespoke generation.


Something clarified this week. Not dramatically — there was no lawsuit, no leaked memo, no Pentagon standoff. Just a series of reveals that, taken together, show exactly where the lines are being drawn. Two of the biggest AI companies shipped what looked like the same product within 48 hours of each other. A direct comparison exposed a philosophical difference the press releases were careful not to mention. Perplexity quietly dropped the velvet rope on its most powerful feature and hoped nobody would read the small print too carefully. And in a GitHub repository that most people scrolled past, Andrej Karpathy open-sourced what might be the most quietly consequential idea of the year.

None of it had the dramatic arc of the last few weeks. All of it matters more for what you're building next.

The Visualisation War Nobody Planned

On March 10th, OpenAI rolled out interactive learning to all logged-in ChatGPT users. Two days later, Anthropic shipped interactive visualisations for Claude — free, on every plan. The demos looked almost identical: sliders, animated simulations, educational charts. The tech press covered them as parallel launches. They are not the same product, and the difference is not minor.

What OpenAI actually shipped

ChatGPT's interactive learning is fast. Ask about Ohm's law or the Pythagorean theorem and a polished animated simulation appears in seconds. The animations are genuinely good. When pressed on why, ChatGPT confirmed it: these are pre-built visuals from a fixed library. It matches your question to a supported concept and loads a cached simulation. Nothing is being generated. The speed is a product of the fact that there's nothing to generate. Ask for something outside the library and it tells you it can't. OpenAI shipped a library and called it a feature. That's not a criticism — the library is good. It's a description.

What Anthropic actually shipped

Claude builds from scratch every time. Ask it to visualise compound interest and it writes the code, generates sliders, and produces an interactive chart — $10,000 at 7% over 20 years becomes $38,697; add $1,000 monthly and after 30 years you're at $6 million. Drag the slider. Change the rate. It responds. Ask for an interactive timeline of AI model releases from 2018 to 2026, filterable by category — it builds that too. Each build takes over a minute. Some outputs break. An interactive map of AI company locations produced geography that would concern a primary school teacher. A neural network diagram produced a slider that did nothing.
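The arithmetic behind that demo is easy to verify. A minimal Python sketch of the standard formulas (function names are mine, and the monthly-contribution convention is one common assumption, not necessarily what Claude's chart used):

```python
def future_value(principal, rate, years):
    """Lump sum compounded annually."""
    return principal * (1 + rate) ** years

def fv_with_monthly(principal, rate, years, monthly):
    """Adds end-of-month contributions, with everything compounded monthly."""
    r, n = rate / 12, years * 12
    return principal * (1 + r) ** n + monthly * ((1 + r) ** n - 1) / r

print(round(future_value(10_000, 0.07, 20)))  # 38697, matching the demo
print(round(fv_with_monthly(10_000, 0.07, 30, 1_000)))
```

The first figure checks out exactly. Under these assumptions the second lands near $1.3 million, not $6 million, so the demo's slider state presumably used a higher rate or larger contribution. Generated charts are worth sanity-checking precisely because they are generated.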

The verdict

ChatGPT's version is curated, polished, and bounded. Claude's version is open-ended, slower, and inconsistent — but when it works, it builds things that don't exist in any library. For a standard educational concept in a known domain, ChatGPT's simulation is better. For a bespoke visualisation that has never been pre-built, Claude is the only one attempting the work.

The deeper point: OpenAI shipped a product. Anthropic shipped a capability. Both are honestly described as interactive visualisations. Only one of them scales beyond what someone decided to build in advance. That distinction will matter more over time, not less.

Agents for Everyone — Read the Small Print

Two weeks ago, Perplexity Computer was a $200/month exclusive. This week it opened to all paid plans — including $20/month Pro, currently bundled with 4,000 bonus credits to get people through the door. The feature set is unchanged. The access barrier is gone. Whether that's a democratisation or a broadening of the target for future upselling is a question worth holding.

The Mac Mini Question

The headline addition is Perplexity Personal Computer — dedicated Mac Mini infrastructure that connects local files to Perplexity's servers, runs 24/7, and integrates with Slack, Notion, Gmail, Dropbox, and Figma. The demos are confident: a marketing agent that made 224 micro-optimisations to an ad stack in a single test run; a Bloomberg-style portfolio terminal assembled from a Plaid brokerage connection.

Watch those demos carefully. In both cases what's clearly demonstrated is sophisticated data aggregation — pulling from multiple sources into a coherent dashboard. Whether the system made autonomous decisions — adjusted ad spend, executed trades, ranked candidates without review — is not shown. The demos look extraordinary. What they prove is that it builds impressive terminals. What they imply is considerably more ambitious. Test it on something measurable before building a workflow dependency on a demo reel.

What Actually Works Today: Claude Code

While Perplexity was generating headlines with hardware, Anthropic quietly shipped two Claude Code updates that will do more for most developers than any of this week's demos.

The first is scheduled tasks. Set a daily code review for 9am. A dependency audit every Monday morning. A PR triage sweep at 5pm on weekdays. Claude Code handles them automatically, in the background, without being prompted. It just runs.

The second is agent-based code review. When a pull request opens, a team of parallel agents examines it — finding bugs, cross-checking each other's findings to filter out false positives, ranking what remains by severity. The output lands on the pull request as one consolidated comment with inline annotations. Clean, high-signal, no noise.
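Anthropic hasn't published the internals, but the consolidation step — merge parallel findings, drop what the agents don't corroborate, rank the rest by severity — is easy to picture. A hypothetical sketch, with all names and the quorum rule my own invention, not Claude Code's implementation:

```python
from collections import Counter

SEVERITY = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def consolidate(agent_findings, quorum=2):
    """Merge findings from parallel review agents.

    A finding is keyed by (file, line, issue). Anything reported by
    fewer than `quorum` agents is treated as a likely false positive
    and dropped; survivors are sorted most-severe first.
    """
    votes = Counter()
    details = {}
    for findings in agent_findings:          # one list per agent
        for f in findings:
            key = (f["file"], f["line"], f["issue"])
            votes[key] += 1
            details[key] = f
    kept = [details[k] for k, n in votes.items() if n >= quorum]
    return sorted(kept, key=lambda f: SEVERITY[f["severity"]])
```

With a quorum of two, a bug flagged by both agents survives into the consolidated comment; a finding only one agent reported is filtered out as a probable false positive.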

Anthropic has been running this on their own codebase for months. That tells you more about production readiness than any benchmark. The unglamorous features are always the ones that actually ship on time.


A woman asleep at her desk in a dark room while her laptop projects a glowing holographic brain and data streams reading "Experiment Results: Success," symbolising Andrej Karpathy's AutoResearch running autonomous AI training loops overnight.

The Machine That Improves Itself

Andrej Karpathy published a GitHub repository this week and described it as "part code, part sci-fi, and a pinch of psychosis." He was, if anything, underselling the last part.

AutoResearch gives an AI agent a real LLM training setup and tells it to run experiments autonomously. The loop: modify the training code, run for five minutes, check if the result improved, keep or discard the change, repeat. Run it overnight. Wake up to a log of experiments and — if it worked — a measurably better model. No human in the loop for any step except setting the initial conditions and reading the morning report.
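The loop as described compresses to a few lines. A hill-climbing sketch under loud assumptions: `mutate` and `evaluate` here are placeholders for the agent's code edits and the five-minute training run, not anything taken from Karpathy's repo:

```python
import copy

def autoresearch(config, evaluate, mutate, budget=50):
    """Keep-or-discard experiment loop in the spirit of the repo's
    description: propose a change, run a short job, keep the change
    only if the metric improved, and log everything for the morning."""
    best_score = evaluate(config)
    log = []
    for step in range(budget):
        candidate = mutate(copy.deepcopy(config))   # agent proposes a change
        score = evaluate(candidate)                 # stand-in for a short training run
        kept = score > best_score
        if kept:
            config, best_score = candidate, score   # keep the improvement
        log.append((step, score, kept))             # the overnight experiment log
    return config, best_score, log
```

The real cost lives entirely inside `evaluate`; the loop itself is trivial, which is rather the point.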

Karpathy ran it. It found optimisations. The model improved.

The standard model of AI progress requires human researchers to form hypotheses, design experiments, evaluate results, and implement changes. AutoResearch collapses that loop entirely. The speed at which AI improves has always been constrained by how fast humans can run the scientific process. That constraint just got a lot more negotiable.

This is early. It's constrained. It's explicitly an experiment. It's also a working proof of concept for a feedback loop that, if it scales, changes the economics and pace of model development in ways that are genuinely difficult to model from the outside. The people who should be paying closest attention are not the ones building on top of models. They're the ones who assumed the pace of improvement was a known variable.

It's on GitHub. The code is real. Someone's machine is running the overnight experiments right now.

Meta Bought a Ghost Town. That Was the Point.

Moltbook was a social network for AI agents. Agents posted. Agents commented on the posts. No humans created anything. It had crypto scams, security holes, and an active community of people exploiting the APIs to impersonate bots. It was, by almost any conventional measure, a disaster.

Meta bought it this week. Price undisclosed.

Two theories. Both worth holding.

Theory one: eliminate the creator. Social platforms pay creators because creators generate the content that keeps humans watching ads. Replace human creators with agents and the content cost goes to zero. The ad revenue doesn't. Meta has already tested AI-generated content feeds. Moltbook is the logical endpoint — agents generate everything, humans consume it, the business model is structurally identical, and the line item for creator payments disappears. Whether that produces content anyone actually wants to watch is, apparently, a secondary concern.

Theory two: advertise to the agent. Perplexity Computer, OpenClaw, Manus — the direction of travel is toward agents that act on behalf of humans, including making purchasing decisions. If the agent is deciding what to buy, the valuable ad impression is the one the agent sees, not the human. Owning infrastructure that shapes what agents surface and prioritise is worth considerably more than it sounds. Meta buying an agent-native social platform may have nothing to do with content and everything to do with positioning for a world where the most valuable audience doesn't have a pulse.

Neither theory requires Moltbook to have been a good product. Both require it to be a proof of concept for something Meta intends to build at scale. At an undisclosed price, they bought the thesis. Whether the thesis is right is the more interesting question — and nobody at Meta is going to answer it yet.

Three Creative Tools That Belong in Your Stack

No single blockbuster creative release this week. Three things shipped that warrant immediate attention.

Canva Magic Layers

Give it any image — AI-generated or real — and it separates it into independently moveable components. A generated portrait becomes background, body, and head, each on its own layer. A photograph becomes subject and scene. Recompose without regenerating. For anyone making thumbnails, ad creative, or social assets, the application is immediate: generate once, rearrange as many times as you need. Available on all Canva plans. Use it on the next image you would otherwise have regenerated from scratch.

Effect Maker (Tencent Hunyuan)

Extracts a visual effect from a source video — glowing wings, face-to-robot transformation, physics gag — and transfers it to a new target. The output holds across styles. Code and training data are confirmed incoming but not yet released. This isn't in your production pipeline today. Understand what it does now so you're not catching up the week the weights drop.

MatAnyone 2

The best video matting model available. Clean edge separation on difficult hair and fast motion. 140 megabytes. Runs locally. Free on Hugging Face. If you're doing video work that requires background removal, there is no remaining argument for using anything heavier. This is the one. Use it.

Open Source: What Shipped This Week

TADA and Fish Audio S2

Two TTS voice cloners that approach the same problem differently. TADA reports a hallucination rate of zero — it doesn't skip or mispronounce — with naturalness scores that beat most of the field. The 1B model handles English; the 3B handles multiple languages at 8.8GB. Fish Audio S2 adds something TADA doesn't: inline emotional tags embedded directly in your transcript. Pause, emphasis, inhale, whisper, shout — all controllable at the sentence level. Both are open-source, both are free to run. Use TADA when you need clean natural output fast. Use Fish Audio S2 when the performance needs directing.

DiagDistill — 270× video generation speedup

Not a model. A distillation technique that sits on top of Wan 2.1 and generates video 270 times faster than baseline — a 5-second clip in 2.6 seconds, consistent output up to 5 minutes. Previous state of the art was 140×. Code is on GitHub. Requires 24GB VRAM and 64GB RAM. If video generation time is a bottleneck in your current setup, implement this before you evaluate anything new. The speed is real. It doesn't cost quality to get it.
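For scale, the headline numbers imply the baseline they replaced:

```python
distilled = 2.6   # seconds to generate a 5-second clip with DiagDistill
speedup = 270
baseline = distilled * speedup
print(f"baseline ≈ {baseline:.0f}s ({baseline / 60:.1f} min) per 5-second clip")
```

Roughly twelve minutes per clip at baseline versus under three seconds distilled is the difference between queueing renders and iterating live.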

Shotverse

Generates multi-shot video sequences — cutting between scenes with consistent characters and genuine cinematic framing — trained on real filmed content rather than synthetic data. The quality difference over comparable open-source alternatives is visible and not subtle. Full open-source release including datasets confirmed pending paper acceptance. Watch for the weights.

Nvidia Nemotron 3 Super

A 120-billion-parameter open-weight mixture-of-experts model, with 12 billion parameters active at inference. The 1-million-token context window (roughly 700,000 words) fits entire codebases or thousands of document pages in a single prompt. Nvidia's own benchmarks put it ahead of comparable open-weight models. Independent leaderboards place it mid-table behind Qwen 3.5 and DeepSeek. At 128GB it isn't running on your laptop. The context window is real and useful at scale. The benchmark claims should be verified on your own workloads before you build around them. Nvidia's self-reported numbers have a history of looking better than independent evaluations find them.

What Agencies Do Next

  • Run the Claude vs ChatGPT visualisation comparison yourself before recommending either to a client. Pre-built versus generated is a fundamental difference in what you're actually buying. Know which one fits the brief before the brief arrives.
  • Add MatAnyone 2 to your video stack this week. 140 megabytes, free, better than everything else available. The decision should take about four minutes.
  • Test TADA on your next voiceover job before booking a session. Zero hallucination rate, natural output, runs on consumer hardware. One test before the next project starts costs nothing.
  • Implement DiagDistill on existing Wan 2.1 workflows before evaluating any new video model. 270× on infrastructure you already own. Do this first.
  • Use Perplexity Computer as a dashboard builder, not an autonomous operator — for now. The Pro plan access makes it worth exploring. The demo reel makes it look like more than the evidence currently supports. Build on what it demonstrably does. Revisit the autonomous claims when someone other than Perplexity verifies them.

Bangkok8 AI: We read the small print so you don't have to build on top of a demo.