This Week in AI: The AI That Acts While You Sleep, OpenAI's House of Cards, and the On-Device Trap

Three things happened this week that individually would each make a strong news story. Together they make an argument. A leaked codebase revealed that Anthropic has been quietly building AI that acts without being asked: proactive, persistent, and running in the background while you sleep. OpenAI raised the largest funding round in corporate history and buried the most interesting part of it in the final paragraphs. And Google released a genuinely impressive on-device AI model to universal enthusiasm, while nobody stopped to ask what happens when the model itself becomes the attack surface.
The direction of travel is identical across every major lab. The questions of who can afford to get there, and at what cost to users, are the ones not being asked loudly enough.
The AI That Acts While You Sleep
On March 31st, Anthropic accidentally shipped hundreds of thousands of lines of Claude Code's source code inside an npm package. A developer spotted it immediately. Within hours it had been mirrored, downloaded, and forked across dozens of GitHub repositories. Anthropic sent DMCA takedowns. By then, copies were everywhere.
The leak itself is not the story. What was inside it is.
Chyros and the post-prompting era
Buried in the source was a feature called Chyros, described in the code as an autonomous daemon mode. A daemon, for the non-technical, is a process that runs permanently in the background, waiting for conditions that trigger action. Chyros is not a chatbot feature. It is not a prompt-response loop. It is an always-on agent that operates whether you are at your computer or not.
Every few seconds, Chyros receives what the code calls a heartbeat. A prompt that asks, in effect: is there anything worth doing right now? If the answer is yes, it acts. It can fix errors in your code, respond to messages, update files, and run terminal commands. It does everything Claude Code can already do, except without you initiating any of it.
Beyond what Claude Code can already do, Chyros gets three capabilities that the standard tool does not. It can send push notifications directly to your phone or desktop when you are away from your terminal. It can deliver files it created without you asking for them. And it watches your GitHub repositories, reacting to pull requests and code changes on its own initiative. Regular Claude Code waits to be spoken to. Chyros taps you on the shoulder — and keeps daily logs of everything it noticed, everything it decided, and everything it did.
The practical implications are not subtle. Your website goes down at 2am. Chyros detects it, restarts the server, and sends you a notification. By the time you see it, the problem is resolved. A customer complaint arrives overnight. Chyros reads it, drafts and sends a reply, and logs what it did. You wake up to a resolved ticket. A typo has been live on your payment page for three days. Chyros spots it, fixes it, and flags the change.
This is not a feature. It is a different category of product. The distinction between a tool you use and infrastructure that runs on your behalf is the difference between a hammer and electricity. You do not prompt electricity.
What else the leak showed us
The three-layer memory architecture in the code tells us something important about how Anthropic thinks about context. At its core is a memory.md file: a lightweight index of pointers, not data. Claude Code never reads full conversation histories back into context. It searches for specific identifiers within them. The architecture is designed for scale and persistence, not individual sessions. This is not built for a chat interface. It is built for something that runs continuously.
The leak also confirmed more evidence of Capiara, the next-tier model above Opus that has appeared in leaked materials for the past two weeks, referred to interchangeably as Mythos in some internal references. And then there was Buddy: a Tamagotchi-style terminal pet with personality stats, almost certainly a planned April Fools release that got sidelined when the leak forced Anthropic's hand on timing.
The irony worth noting
The feature that exposed the entire leak was called undercover mode. This was a system prompt instructing Claude Code to behave like a human developer in public repositories, stripping out internal model names, unreleased version numbers, and any indication the code was AI-generated. The mode designed to hide Anthropic's internal work from public view was itself part of the source code that became public. You could not write a better metaphor for the week.
Where every lab is heading
OpenAI this week buried a super app announcement inside a fundraising press release. The direction of travel is the same as Chyros points to: AI that operates across all your tools without you switching between them. AI that understands intent and acts without being asked. Whether it is Anthropic's background daemon, OpenAI's unified agent platform, or the cron job architecture already running inside OpenClaw, every major player is building toward the same destination. AI falls into the background. It becomes infrastructure. The prompting era is ending.
OpenAI's House of Cards Gets Another Floor
OpenAI raised $122 billion this week at a valuation of $852 billion. It is the largest single fundraising round in corporate history. Microsoft participated. Revenue is running at $2 billion per month. By their own account, they are growing four times faster than Alphabet and Meta were at a comparable stage.
The number is real, probably. The questions it raises are absolutely real.
Last week's argument just got a price tag
The Wall Street Journal reported this week that Sora was losing approximately $1 million per day before it was shut down. We argued last week, without that number, that the decision not to sell Sora implied the asset had no recoverable market value, and that this had implications for how capitalised R&D is carried across the AI industry. The $1 million daily loss figure does not change the argument. It confirms it. A product losing $365 million a year, with no willing buyer, being carried on a balance sheet as a capitalised intangible, is precisely the scenario we described. The number just arrived a week late.
The announcement they buried
In the final paragraphs of OpenAI's fundraising blog post: "We're building a unified AI super app. Users do not want disconnected tools. They want a single system that can understand intent, take action, and operate across applications, data, and workflows. Our super app will bring together ChatGPT, Codex, browsing, and our broader agentic capabilities into one agent-first experience."
This is functionally what Anthropic already built. Claude chat, Claude Co-work, and Claude Code exist within a single application today. OpenAI is announcing as a future product what their primary competitor ships as a current one. The pattern of following rather than leading is becoming consistent.
The unit economics nobody wants to discuss
The pivot to agentic coding as OpenAI's growth engine deserves more scrutiny than it is receiving.
Anthropic is rate-limiting Claude Code for paying users. Google is managing capacity on their coding tools. Neither company has demonstrated a path to profit from agentic coding at scale, and both have significantly lower cost structures than OpenAI. In Google's case, an entire cloud infrastructure business helps absorb losses. The vibe coding wave that generated so much enthusiasm over the past six months has been a heavily subsidised experience. Users have been paying a fraction of the true inference cost. There is no evidence they will pay the economic cost when the subsidies reduce, and considerable evidence from comparable markets that they will not.
OpenAI is entering this fight late, at a valuation that requires extraordinary revenue growth to justify, against competitors who are better capitalised for a sustained infrastructure war and still not making money from it. The $2 billion monthly revenue figure is real. The path from there to a valuation implying multiples of that revenue, in a market where the core product is being commoditised from below by open-source alternatives every single week, is a great deal less clear than the fundraising announcement suggests.
The media company that contradicts the strategy
OpenAI acquired TBPN this week. The Technology Business Programming Network runs a daily live show covering AI and tech, with interviews of founders, CEOs, and industry figures. It is a good show. The acquisition is a puzzling one.
Two weeks ago, OpenAI's stated strategic direction was focus. No side quests. Sora cut. Adult mode cut. Concentrate on coding and enterprise. This week they bought a media company.
The editorial question is worth sitting with. Can a daily show credibly cover AI when its owner is the most powerful AI lab in the world? Can they interview Anthropic executives without it being awkward? Can they report critically on OpenAI product failures? The TBPN team are good at what they do. Whether they can keep doing it the same way is a different question.
Gemma 4 and the On-Device Trap
Google released Gemma 4 this week to near-universal enthusiasm, and the enthusiasm is warranted on the capability side. What is not being discussed is the risk side.
What Gemma 4 actually is
Four model sizes built for different environments. The E2B and E4B (roughly 2 and 4 billion effective parameters) are designed for phones and edge devices. The 26B MoE model uses a mixture-of-experts architecture where only 3.8 billion parameters are active at any time, making it far more efficient than the total parameter count suggests. The 31B Dense model maximises raw quality for higher-end hardware. All models are multimodal out of the box, handling text, image, and audio natively. Apache 2 licence. Over 140 languages. Context windows of 128K for the smaller models and 256K for the larger two.
The benchmark headline: Gemma 4 performs near Kimi K2.5 Thinking despite being 35 times smaller. It runs on a modern smartphone. It runs offline. It is genuinely impressive engineering.
The multi-model strategy
Google's app approach is worth noting separately from the model itself. Rather than locking users into a Gemma-only experience, the Google AI app supports Qwen and DeepSeek models alongside Gemma. This mirrors what Antigravity has been doing. Own the interface and the infrastructure. Let the model layer remain open. Value accrues to the platform, not the weights.
The question nobody is asking
The on-device AI story has been told almost entirely as a privacy win. Your data stays on your device. No cloud API calls. No data leaving your phone. The framing is uniformly positive.
Nobody is asking what happens when the model becomes the attack surface.
Smartphones are not neutral compute devices. They hold biometric authentication data. They have persistent access to banking applications, health records, location history, and personal communications. They are the most sensitive data repositories most people carry, and they are also constantly being fed documents, images, emails, and web content from the outside world.
Prompt injection (the technique by which malicious content embedded in data fed to an AI model causes it to take unintended actions) is not a theoretical concern. It is an active and growing attack vector. A compromised cloud API is a serious problem. A compromised on-device model with persistent access to everything on your phone, operating with the permissions of a trusted local application, is a different category of problem entirely.
The on-device framing sells security. The reality is that it moves the attack surface closer to the most sensitive data, removes the cloud-side monitoring that can detect anomalous behaviour, and places the security burden on a model and an operating system not designed with adversarial prompt scenarios in mind.
This is worth understanding before encouraging clients to run AI models locally on their devices. And before running one yourself.
The Open Source Multimodal Moment
No single blockbuster this week. Six releases that collectively cover every dimension of multimodal AI, most of them running on consumer hardware.
- Qwen 3.5 Omni (Alibaba): True omnimodal: text, image, audio, and video in one model. Benchmarks ahead of Gemini 3.1 Pro on audio-visual tasks. The standout feature is audiovisual vibe coding. Describe what you want to build to your camera, and the model generates a working application from what it sees and hears. Available now via Qwen Chat and HuggingFace.
- Qwen 3.6 Plus (Alibaba): One million token context window by default, no special settings required. Agentic coding focus with multimodal input. Open-source release confirmed as coming. Alibaba shipped two frontier-level models in the same week. That deserves more attention than it received.
- GLM 5V Turbo (ZAI): Vision coding model that takes sketches, images, and video as input and generates functional applications from them. Benchmarks ahead of Opus 4.6 on design-to-code tasks. Already compatible with Claude Code and OpenClaw via published integration instructions. If you are building design-to-code workflows, test this this week.
- Netflix Void: Netflix's first open-source model release. Video object deletion with physically realistic in-filling. Remove an object from a video and the scene reconstructs naturally around the gap. 22GB so needs higher-end hardware. The provenance alone makes it worth noting. A streaming company releasing open-source AI tooling is a signal about where the industry thinks the value actually sits.
- Omni Voice: 600-language text-to-speech with voice cloning, cross-language voice transfer, and emotion tag support. Under 3GB. Runs on Apple Silicon. One of the most capable open-source voice models released to date, and small enough for most consumer hardware. The 600-language coverage is genuinely differentiated.
- Trinity Large Thinking (RC): An American open-source model under Apache 2, benchmarking near Opus 4.6, GLM5, and Kimi K2.5 on most tasks. The open-source competitive landscape now includes credible American entrants alongside the Chinese labs. Worth watching.
Microsoft and Google: Useful Weeks, No Headlines
MAI Transcribe 1 (Microsoft)
Best-in-class word error rate across 25 languages. Outperforms Whisper, GPT Transcribe, Scribe V2, and Gemini 3.1 Flash by a meaningful margin in independent testing. Built specifically for transcription workloads. Handles noisy environments well. Available now via Microsoft MAI Foundry alongside MAI Voice 1 and MAI Image 2. If your workflows involve transcription at any scale, test this against your current solution before renewing anything.
Google Veo 3.1 Lite
A cost-effective video generation tier at $0.05 per second at 720p. Further price reductions across the Veo 3.1 Fast tier are confirmed for April 7th. The timing relative to Sora's shutdown was not accidental. Google explicitly positioned this as a commitment to video generation at the moment OpenAI was exiting the space. Video generation infrastructure is getting cheaper faster than most production budgets have accounted for. Reprice your video production assumptions accordingly.
Google AI Inbox
Smart email prioritisation with daily personalised briefings. Currently beta-only for Google AI Ultra subscribers at $250 per month. Worth knowing about. Not worth the price for most users yet. Watch for the broader rollout.
What Agencies Do Next
- Understand Chyros before your clients ask you about it. The post-prompting era is not a roadmap item. It is already in Anthropic's source code and running inside OpenClaw's cron job architecture. When clients ask about AI automation, the honest answer increasingly involves systems that act without being asked. Know the capability, the risk profile, and the workflow implications before that conversation arrives.
- Test GLM 5V Turbo in your design-to-code pipeline this week. Sketch or image input to functional application output, benchmarking ahead of Opus 4.6 on design tasks, compatible with Claude Code and OpenClaw via published instructions. This is a direct upgrade to an existing workflow with a measurable quality difference.
- Replace your transcription stack with MAI Transcribe 1 before your next project. Best-in-class accuracy, 25 languages, available via MAI Foundry API. If you are paying for Whisper or a commercial transcription service, run a comparison this week.
- Reprice your video production assumptions. Veo 3.1 Lite at $0.05 per second changes the cost floor for AI video generation. Combined with the Prism Audio open-source sync tool from last week, the infrastructure cost of AI video production is falling faster than most agencies have updated their pricing models to reflect.
- Brief your clients on on-device AI risk before they brief you. The Gemma 4 story will reach them as a pure capability win. The prompt injection risk on mobile devices is real, underreported, and the kind of observation an agency should be raising proactively. Being the person who asked the question first is worth considerably more than explaining the problem after it occurs.
Bangkok8 AI: We'll show you where the ceiling used to be — so you can stop building under it.
