This Week in AI: Google Wins the Mobile Wars, OpenAI Plays Spy Games, and Agents Take Over Your Desktop

If you blinked this week, you may have missed the most important partnership in AI history. Apple and Google announced a multi-year deal that puts Gemini inside Siri—meaning Google now controls AI on every major mobile platform on the planet. While that was happening, OpenAI rehired an alleged corporate spy on the same day he got fired for "unethical conduct." Anthropic told their $200/month paying customers they could only use Claude in one specific app—and the community revolted. Anthropic also launched an AI agent that can organize your desktop autonomously. Two more AI agents dropped that can watch you work, learn your tasks, and execute them without you. An open-source music generator rivaled Suno's quality. Meta released a 3D tool that works with phone footage. And both OpenAI and Google released translation tools in the same week—one quietly, one open-sourced. This wasn't just a busy week. This was the week Google locked down mobile AI for the next decade, and the week AI agents became real enough to clean your downloads folder while you grab coffee.
Google Wins the Mobile Wars
Apple Surrenders to Gemini
The biggest news of the week—possibly the year—dropped with surprisingly little fanfare. Apple and Google announced a multi-year collaboration under which the next generation of Apple Foundation models will be based on Google's Gemini models and cloud technology. Translation: when you ask Siri a complex question that requires cloud processing, you're getting Gemini. Not ChatGPT. Not Claude. Gemini.
Apple Intelligence still handles simple on-device tasks with its own smaller models. But the moment Siri needs to tap the cloud for reasoning, generation, or complex queries, it's routing through Google's infrastructure. This is a strategic surrender. Apple spent years positioning itself as the privacy-first alternative to Google. Now they're handing over the most valuable interface on the planet—voice assistants on a billion iPhones—to their biggest competitor in mobile.
For Google, this is total victory. They already own AI on Android through Gemini's deep integration into Chrome, Gmail, Drive, Photos, and every other Google service. Now they own it on iPhone too. It doesn't matter what phone you buy in 2026. You're getting Gemini. The mobile AI wars are over. Google won.
Personal Intelligence: Gemini Gets Creepy Good
Google also rolled out Personal Intelligence this week, a feature that connects Gemini to your Gmail, Google Photos, YouTube, Calendar, and Drive accounts. The demo that went viral: a Google VP asked Gemini what tires he needed for his car—without telling it what car he drives. Gemini searched his Google Photos, found images of his Honda Odyssey, identified the tire size from the photos, and recommended replacement tires.
He then asked for his license plate number. Gemini pulled it from a photo in his camera roll.
This is both impressive and unsettling. The use case is real—who actually remembers their license plate number when filling out forms?—but the implications are obvious. Gemini now has access to everything you've ever photographed, emailed, searched, or saved. It can cross-reference that data in ways no human assistant ever could. That tire example? It's a preview of what happens when an AI has total context on your life.
The catch: Personal Intelligence is only available on Google One AI Pro or Ultra plans (paid tiers), only in the US for now, and—most frustratingly for business users—not available on Google Workspace accounts yet. If you run your email and calendar through a custom domain on Workspace, you can't use this yet. Google says it's coming, but right now it's personal accounts only.
Still, the signal is clear. Google is building toward a future where Gemini is the operating system for your digital life. And with Apple now routing Siri queries through Gemini, that future just got a lot closer.
Veo 3.1 Upgrades: Google's Video Push
Google also upgraded Veo 3.1, their image-to-video model, with several production-ready features. The biggest: native 9:16 vertical video generation (perfect for TikTok, Instagram Reels, YouTube Shorts), 4K upscaling, and massively improved character and object consistency across scenes.
The new "ingredients to video" feature lets you feed Veo a single image and a short prompt—like "documentary style, raccoon runs a coffee shop, dialogue"—and it generates a multi-shot video with dialogue, camera movement, and narrative flow. The character consistency improvements mean you can now use the same character across multiple scenes without the model forgetting what they look like halfway through.
Veo 3.1 is available in YouTube Shorts, the YouTube Create app, the Gemini app, Google Vids, and via API. It's not as polished as Runway or Pika yet, but it's fast, integrated everywhere, and improving rapidly. And unlike most video models, it's already embedded in platforms with billions of users.
Google didn't just win mobile AI this week. They shipped AI across every surface they control—search, photos, email, video creation, voice assistants, and now Siri. While OpenAI was distracted by corporate espionage and Anthropic was burning goodwill with developers, Google executed a full-stack AI takeover.
The Week AI Went Full Corporate Drama
OpenAI's Alleged "Double Agent" Scandal
Here's a story that sounds like a rejected screenplay. Barrett Zoph worked at OpenAI. When Mira Murati—OpenAI's former CTO and briefly its CEO during the Sam Altman ousting drama—left to start her own company called Thinking Machines, she brought Zoph with her as CTO.
This week, Thinking Machines fired Zoph for "unethical conduct." On the same day, OpenAI rehired him.
According to a source who spoke to Wired, Zoph was allegedly passing confidential company information to competitors while working at Thinking Machines. OpenAI released an internal memo saying they "don't share Murati's concerns" about Zoph's conduct. Translation: OpenAI is fine with whatever he did.
The prevailing theory: Zoph was a double agent. He went to Thinking Machines to monitor what Murati was building and fed that intelligence back to OpenAI. Now he's back at OpenAI as if nothing happened.
This is unconfirmed. The source is anonymous. But it fits the pattern of behavior we've seen from OpenAI—aggressive talent poaching, paranoia about competitors, and a willingness to play hardball in ways that would make traditional tech companies uncomfortable. Whether Zoph was actually a spy or just got caught in the crossfire of two feuding executives, the optics are terrible. You don't get fired for unethical conduct and rehired by your old employer on the same day unless something very strange is happening behind the scenes.
Anthropic's Self-Inflicted PR Disaster
While OpenAI was dealing with espionage allegations, Anthropic was busy killing their own developer ecosystem. Here's what happened:
Anthropic offers a Claude Max plan for $100-200/month that includes API access and effectively unlimited usage of Claude Opus, their most powerful model. Developers were using the plan's API key in third-party coding tools like Open Code instead of Anthropic's own Claude Code IDE.
On January 9th, Anthropic quietly updated their abuse detection systems to ban accounts that used Claude API keys outside of Claude Code. No warning. No announcement. Just sudden account bans.
Developers lost their minds. Comments flooded in: "You killed an entire ecosystem." "This is a losing battle." "Opus 4.5 is dominating—it would be smarter to work with Open Code than against them."
The backlash was so severe that entire blog posts were written about how this was a catastrophic misstep. Anthropic was essentially telling customers paying $200/month: "You must use our IDE. If you use our API in a tool you prefer, you're banned."
And here's where it gets delicious: on the exact same day, OpenAI announced they were partnering with Open Code to allow users to bring their OpenAI API subscriptions into Open Code directly. While Anthropic was locking down their ecosystem, OpenAI was opening theirs.
The developer community responded exactly as you'd expect. They praised OpenAI for being open and collaborative. They roasted Anthropic for being controlling and shortsighted. Anthropic's attempt to force people into Claude Code backfired spectacularly.
Here's the lesson: when you're charging $200/month and your customers are power users who have strong opinions about their tools, telling them "my way or the highway" is a fast track to losing them. Especially when your biggest competitor is standing right there offering the highway.
Both stories—the OpenAI spy drama and the Anthropic lockdown—point to the same uncomfortable truth: these companies are more worried about each other than they are about serving users. OpenAI's hiring back alleged spies. Anthropic's banning paying customers for using third-party tools. Both are hemorrhaging trust faster than they're shipping features.
The Agent Takeover Begins
Claude Co-work: Your Desktop Butler
The most genuinely useful launch of the week came from Anthropic—ironically, the same company that just shot themselves in the foot with developers. Claude Co-work is essentially "Claude Code for people who don't code." You give it access to specific folders on your computer, and it can autonomously organize files, summarize documents, prep meeting agendas, and execute multi-step workflows.
One demo showed Claude Co-work scanning a folder of meeting transcripts, identifying action items, checking a calendar for what's urgent, and building an "at a glance" dashboard—all without further input. Another demo showed it reorganizing a chaotic desktop with 200+ files into clean, categorized folders in under a minute.
I tested it myself on my downloads folder, which had 270 files and zero organization. I gave Claude Co-work access and said "organize this." It scanned the contents, proposed a plan (new folders for 3D models, installers, fonts, archives, projects, videos), flagged duplicates for deletion, estimated I could free up 35GB by deleting old installer files, and then executed the entire reorganization autonomously. I didn't touch my keyboard. It just ran terminal commands in the background and cleaned everything up.
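Under the hood, an agent like this is proposing and then executing ordinary file moves. A minimal sketch of that plan-then-execute pattern in Python—the category names and extension map are my own illustration, not Claude Co-work's actual output or code:

```python
import shutil
from pathlib import Path

# Hypothetical category map -- a deterministic stand-in for the plan
# an agent might propose before touching anything.
CATEGORIES = {
    "installers": {".dmg", ".pkg", ".exe", ".msi"},
    "archives":   {".zip", ".tar", ".gz", ".7z"},
    "fonts":      {".ttf", ".otf", ".woff"},
    "3d_models":  {".obj", ".fbx", ".stl", ".glb"},
    "videos":     {".mp4", ".mov", ".mkv"},
}

def plan_moves(folder: Path) -> list[tuple[Path, Path]]:
    """Dry run: return (source, destination) pairs without touching the disk."""
    moves = []
    for item in folder.iterdir():
        if not item.is_file():
            continue
        for category, exts in CATEGORIES.items():
            if item.suffix.lower() in exts:
                moves.append((item, folder / category / item.name))
                break
    return moves

def apply_moves(moves: list[tuple[Path, Path]]) -> None:
    """Execute the plan: create category folders and move files into them."""
    for src, dst in moves:
        dst.parent.mkdir(exist_ok=True)
        shutil.move(str(src), str(dst))
```

The two-phase split (plan first, execute second) is the part worth copying: it's what lets an agent show you the proposed reorganization before it runs terminal commands on your behalf.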
The catch: it's Mac-only for now, and you need to be on a paid plan (originally the $100/month Max plan, now also the $20/month Pro plan). Windows support and free-tier access are coming eventually, but right now it's limited.
Still, this is the most practical AI agent I've used. It's not a demo. It's not vaporware. It's a real tool that does real work while you do something else. That's the bar AI agents need to clear—and Claude Co-work just cleared it.
ShowUI-Aloha: The Agent That Learns by Watching
If Claude Co-work impressed you, wait until you see ShowUI-Aloha. This is an AI agent that learns how to do tasks by watching a human do them once. You record yourself booking a flight, editing a spreadsheet, or transposing a matrix in Excel. ShowUI-Aloha watches, parses the actions (clicks, drags, typing, scrolling), converts them into executable code, and stores that workflow. Next time you need to do a similar task, it does it for you.
The demos are wild. After watching a human book one flight, it can autonomously search Google Flights and book tickets. After watching someone transpose a matrix in Excel once, it can do batch transpositions on its own. After seeing a background removal workflow in PowerPoint, it handles similar edits without further input.
The architecture is clever: it records your actions, parses them into code, sends that code through a "learner" component to build a task trajectory, and then executes that trajectory through an "actor" module. It outperforms other AI agents like UI-TARS and Claude's Computer Use on benchmark tasks—especially on PowerPoint and video editing workflows.
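That record → parse → learn → act loop can be sketched with simple data structures. This is my own illustrative skeleton, not ShowUI-Aloha's actual code—the `Action` fields, step format, and function names are all assumptions:

```python
from dataclasses import dataclass

@dataclass
class Action:
    """One low-level UI event captured during the human demonstration."""
    kind: str        # "click", "type", or "drag"
    target: str      # element the recorder resolved, e.g. "search_box"
    value: str = ""  # typed text or drag destination, if any

def parse_recording(events: list[Action]) -> list[str]:
    """'Parser': turn raw events into executable step descriptions."""
    steps = []
    for e in events:
        if e.kind == "click":
            steps.append(f"click({e.target!r})")
        elif e.kind == "type":
            steps.append(f"type({e.target!r}, {e.value!r})")
        elif e.kind == "drag":
            steps.append(f"drag({e.target!r}, {e.value!r})")
    return steps

def learn_trajectory(name: str, steps: list[str]) -> dict:
    """'Learner': store a named, replayable workflow."""
    return {"name": name, "steps": steps}

def act(trajectory: dict, executor) -> None:
    """'Actor': replay each stored step through a UI executor callback."""
    for step in trajectory["steps"]:
        executor(step)
```

The key idea is that the learned artifact is a stored trajectory, not a model weight update: the next similar task is handled by replaying (and lightly adapting) the stored steps.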
The best part: it's fully open-source. The GitHub repo includes installation instructions and everything you need to run it locally. You do need an API key from an LLM provider (OpenAI, Anthropic, etc.), but the agent itself is free.
ShowUI-Pi: Smooth Mouse Movements and CAPTCHA Destruction
Then there's ShowUI-Pi, an AI agent designed to solve the one problem every other agent struggles with: smooth, natural mouse movements. Existing agents can click buttons and type text just fine, but they fail at tasks that require dragging, resizing, or drawing.
ShowUI-Pi handles all of that. It can draw text in Microsoft Paint. It can drag video clips around in Premiere Pro. It can resize elements in PowerPoint. It can drag files onto folders. And—most impressively—it can solve slider CAPTCHAs and drag-and-drop CAPTCHAs that trip up every other agent.
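Human-like dragging mostly comes down to interpolating the cursor along an eased trajectory instead of jumping between two points. A minimal sketch of that idea—my own illustration using a minimum-jerk easing curve, not ShowUI-Pi's published method:

```python
def minimum_jerk(t: float) -> float:
    """Easing profile on [0, 1] that human reaching motions approximate."""
    return 10 * t**3 - 15 * t**4 + 6 * t**5

def drag_path(start, end, steps=50):
    """Generate intermediate (x, y) points for a smooth drag from start to end."""
    (x0, y0), (x1, y1) = start, end
    path = []
    for i in range(steps + 1):
        s = minimum_jerk(i / steps)  # slow start, fast middle, slow finish
        path.append((x0 + (x1 - x0) * s, y0 + (y1 - y0) * s))
    return path
```

Feed those points to an OS-level mouse API with small delays between them and you get the slow-in, slow-out motion of a human hand—plausibly why slider CAPTCHAs, which commonly flag perfectly linear constant-velocity drags, are where agents like this shine.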
In benchmarks, ShowUI-Pi destroyed competitors. It's especially good at PowerPoint tasks, Adobe Premiere workflows, and CAPTCHA solving. Other agents don't even come close.
The code is on GitHub, though documentation is sparse. It looks like the team intends to fully open-source it, but right now you're on your own for setup. Still, the capability is there: real-time, human-like control over graphical interfaces. The "AI that uses your computer better than you do" era just became real.
Three agents. One week. One organizes your desktop autonomously. One learns by watching you work. One solves CAPTCHAs and manipulates interfaces like a human. If you're still thinking of AI as a chatbot, you're already behind.
Production Tools You Can Use Today
While the tech giants fought over mobile dominance and corporate espionage, the open-source community kept shipping.
HeartMula: The Open-Source Suno Killer
HeartMula is an open-source music generator that creates full songs—vocals, instrumentals, melody, structure—from a text prompt and lyrics. And it's shockingly good. Not "good for open-source" good. Actually competitive with Suno, the current industry leader.
It supports five languages: English, Chinese, Japanese, Korean, and Spanish. The vocals are crisp. The instrumentals are clean. It maintains melody consistency across verses and choruses, which older open-source music models couldn't do. In benchmark tests, it achieved the lowest lyric error rate of any model—better than Suno v5.
The total download is 16GB for the 3 billion parameter model. A 7 billion parameter model is coming soon, which should improve quality even further. It runs locally or on consumer GPUs. For agencies running podcast production, explainer videos, or multilingual campaigns, this is the tool that just eliminated your $30/month Suno subscription.
VerseCrafter, ShapeR, and the 3D Production Flood
Three 3D tools dropped this week, and all of them are production-ready.
VerseCrafter takes a single image, generates a 3D point cloud, and lets you manipulate camera trajectories and object movements in Blender. Feed it a photo of a street scene, drag the camera along a path, move the people forward, and it renders a video with those exact movements. It's like having full 3D control over 2D images. The GitHub repo includes a Blender add-on for direct integration.
ShapeR (from Meta) takes a video or sequence of photos and generates metric-accurate 3D models of every object in the scene—separately. Unlike traditional 3D generators that create one giant mesh, ShapeR creates individual objects you can move, edit, and rearrange. It works with regular phone footage. No special equipment required. The code and training data are fully open-sourced.
RigMo automates the rigging process—adding a skeleton to a 3D model so it can move and animate. It works with humans, quadrupeds, and even non-organic objects like fans. It's not released yet (paper only), but code and data are "coming soon."
For 3D artists, these three tools just eliminated days of manual work. ShapeR handles scene reconstruction. VerseCrafter handles camera and motion control. RigMo handles animation setup. The entire 3D pipeline just got open-sourced.
Flux 2 Klein, VIBE, and the Image Editing Arms Race
Two image editors dropped this week, and both are blazing fast.
Flux 2 Klein is a combined image generator and editor (like Nano Banana) that generates and edits images in under a second. It's not quite as photorealistic as some competitors, but it follows prompts better and runs faster. Total size: 13GB. It's open-source and available on Hugging Face.
VIBE generates 2K resolution images in 4 seconds on an H100 GPU. It combines Nvidia's Sana 1.5 (a tiny 1.6 billion parameter image model) with Qwen 3VL (a 2 billion parameter encoder). The result is a lightweight, efficient image editor that preserves original image details better than most competitors. It runs on consumer GPUs in just a few seconds. There's a free Hugging Face demo to try it online.
Both tools are production-ready. Both are open-source. Both run on consumer hardware. The gap between "state-of-the-art closed model" and "open-source equivalent" is now measured in weeks, not years.
Pocket TTS and NovaSR: Audio Tools That Run Anywhere
Pocket TTS is a 100 million parameter text-to-speech model that runs in real-time on just a CPU. No GPU required. On a MacBook Air M4, it runs at 6x real-time speed. It supports voice cloning, handles infinitely long text input (you could feed it an entire audiobook), and delivers the first audio chunk in just 200 milliseconds. The downside: English only for now. But because it's open-source, multilingual fine-tunes are inevitable.
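A 200-millisecond first chunk is the payoff of streaming synthesis: render the text sentence by sentence and hand audio to the player as each piece finishes, rather than synthesizing the whole input first. A rough sketch of that pattern—with a stubbed `synthesize` callback, since Pocket TTS's actual API isn't shown here:

```python
import re
from typing import Callable, Iterator

def stream_tts(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield audio chunk by chunk so playback can start after the first sentence."""
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for sentence in sentences:
        if sentence:
            # The first chunk is ready long before the rest of the text
            # is synthesized -- that gap is the perceived latency win.
            yield synthesize(sentence)
```

This is also why "infinitely long input" is cheap for a streaming design: memory usage is bounded by one sentence at a time, not the length of the audiobook.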
NovaSR is a 50 kilobyte audio enhancement model that upscales audio quality at 3,500x real-time speed. It's designed to clean up low-quality recordings, improve clarity, and upsample audio. The examples are impressive—dialogue that was muffled becomes crisp and clear. Music becomes richer. And it's fast enough to run in real-time during live recordings or streaming.
For podcasters, video editors, and content creators, these two tools just made professional-grade audio production accessible on any device. You don't need a GPU cluster. You don't need cloud credits. You need a laptop and 20 minutes to set it up.
The Translation Wars Heat Up
Both OpenAI and Google released translation tools this week. One did it quietly. One open-sourced theirs.
ChatGPT's Stealth Translation Tool
OpenAI quietly rolled out a translation feature at chatgpt.com/translate that supports 50+ languages. It's a direct Google Translate competitor, and it's available on the free tier. The weird part: OpenAI usually hypes every feature launch. This one was silent. No blog post. No announcement. Just a URL that started working.
The interface is clean: select input language (or auto-detect), select output language, paste text, get translation. It works fine. But the stealth launch raises questions. Is this a test? Is OpenAI trying to avoid drawing regulatory scrutiny in translation-heavy markets? Or are they just distracted by all the corporate drama?
Google's Open-Source Counter-Move
Google released Translate Gemma, an open-source translation model supporting 55 languages across three model sizes: 4 billion parameters (optimized for mobile), 12 billion (for laptops), and 27 billion (for H100 GPUs or TPUs). It's fully open-source. You can download it on Kaggle or Hugging Face and run it locally or on your preferred cloud service.
This is classic Google strategy: when a competitor releases a closed tool, release an open-source equivalent that developers can build on. OpenAI's translation tool works. Google's translation model can be customized, fine-tuned, and integrated into any workflow.
DocuSign Joins the Fight
Even DocuSign got in on the action, adding AI-powered contract analysis and legalese translation to their platform. Instead of copying a contract into ChatGPT to analyze it for gotchas, you can now do that directly inside DocuSign. It's a small feature, but it's a signal: every software company is adding AI features to avoid being disrupted by chatbots.
The translation wars aren't about translation. They're about embedding AI into every surface where text exists. Google and OpenAI both know the company that controls language processing controls the interface layer for the entire internet. Neither can afford to cede that ground.
What Agencies Do Next
The tactical takeaway: the production stack just got cheaper and more powerful in a single week.
Use Claude Co-work for desktop organization, meeting prep, and file management workflows. Use HeartMula for music production if Suno's pricing doesn't fit your budget. Use ShapeR and VerseCrafter for 3D asset creation from phone footage or existing images. Use Pocket TTS for voice production on budget hardware—it runs real-time on a CPU. Use VIBE or Flux 2 Klein for fast image editing workflows. And if you're on Google Workspace, start preparing for Personal Intelligence to roll out—because when it does, Gemini will have access to your entire work history.
The strategic takeaway: Google just locked down mobile AI for the next decade. If your agency workflows depend on iOS users, you're now building on Gemini whether you like it or not. Apple surrendered. The mobile wars are over. The winning move: design platform-agnostic systems that work regardless of which tech giant controls the OS.
OpenAI's playing corporate espionage. Anthropic's burning goodwill with developers. Google's executing a full-stack takeover while everyone else is distracted. The only question is whether you're building on bedrock or quicksand.
Bangkok8 AI: We'll show you the partnerships that matter—before they reshape your entire stack.