Radar Trends to Watch: December 2025
November ended with Thanksgiving (in the US), turkey, and a train of model announcements. The announcements were exciting: Google’s Gemini 3 puts it in the lead among large language models, at least for the time being. Nano Banana Pro is a spectacularly good text-to-image model. OpenAI has released its heavy hitters, GPT-5.1-Codex-Max and GPT-5.1 Pro. And the Allen Institute released Olmo 3, its latest model and the leading open source model from the US.
Since Trends avoids deal-making (should we?), we’ve also avoided the angst around an AI bubble and its implosion. Right now, it’s safe to say that the bubble is formed of money that hasn’t yet been invested, let alone spent. If it is a bubble, it’s in the future. Do promises and wishes make a bubble? Does a bubble made of promises and wishes pop with a bang or a pffft?
AI
- Now that Google and OpenAI have laid down their cards, Anthropic has released its latest heavyweight model: Opus 4.5. They’ve also dropped the price significantly.
- The Allen Institute has launched its latest open source model, Olmo 3. The institute has opened up the whole development process to allow other teams to understand its work.
- Not to be outdone, Google has introduced Nano Banana Pro (aka Gemini 3 Pro Image), its state-of-the-art image generation model. Nano Banana’s biggest feature is the ability to edit images, changing the appearance of items without redrawing them from scratch. And according to Simon Willison, it watermarks the parts of an image it generates with SynthID.
- OpenAI has released two more components of GPT-5.1: GPT-5.1-Codex-Max (API) and GPT-5.1 Pro (ChatGPT). The release rounds out the GPT-5.1 family with the company’s most powerful models for agentic coding and heavyweight reasoning.
- A group of quantum physicists claims to have reduced the size of the DeepSeek model by half and removed its Chinese censorship. The model can now tell you what happened in Tiananmen Square, explain what Winnie-the-Pooh looks like, and answer other formerly forbidden questions.
- The release train for Gemini 3 has begun, and the commentariat quickly crowned it king of the LLMs. It includes the ability to spin up a web interface so users can give it more information about their questions, and to generate diagrams along with text output.
- As part of the Gemini 3 release, Google has also announced a new agentic IDE called Antigravity.
- Google has released a new weather forecasting model, WeatherNext 2, that can forecast at temporal resolutions as fine as one hour. The data is available through Earth Engine and BigQuery for those who would like to do their own forecasting; there’s also an early access program on Vertex AI.
- Grok 4.1 has been released, with reports that it is currently the best model at generating prose, including creative writing. Be that as it may, we don’t see why anyone would use an AI that has been trained to reflect Elon Musk’s thoughts and values. If AI has taught us one thing, it’s that we need to think for ourselves.
- AI demands the creation of new data centers and new energy sources. States want to ensure that those power plants are built, and built in ways that don’t pass costs on to consumers.
- Grokipedia uses questionable sources. Is anyone surprised? How else would you train an AI on the latest conspiracy theories?
- AMD GPUs are competitive, but they’re hampered by the scarcity of libraries for low-level operations. To solve this problem, Chris Ré and others have announced HipKittens, a library of low-level programming primitives for AMD GPUs.
- OpenAI has released GPT-5.1. The two new models are Instant, which is tuned to be more conversational and “human,” and Thinking, a reasoning model that now adapts the time it takes to “think” to the difficulty of the questions.
- Large language models, including GPT-5 and the Chinese models, show bias against users who write in a German dialect rather than standard German, and the bias grew as model size increased. The results presumably apply to dialects of other languages, including English.
- Ethan Mollick’s post on evaluating (ultimately, interviewing) your AI models is a must-read.
- Yann LeCun is leaving Meta to launch a new startup that will develop his ideas about building AI.
- Harbor is a new tool that simplifies benchmarking frameworks and models. It’s from the developers of the Terminal-Bench benchmark. And it brings us a step closer to a world where people build their own specialized AI rather than rely on large providers.
- Music rights holders are beginning to make deals with Udio (and presumably other companies) that train their models on existing music. Unfortunately, this doesn’t solve the bigger problem: Music is a “collectively produced shared cultural good, sustained by human labor. Copyright isn’t suited to protecting this kind of shared value,” as professors Oliver Bown and Kathy Bowrey have argued.
- Moonshot AI has finally released Kimi K2 Thinking, the first open weights model to have benchmark results competitive with—or exceeding—the best closed weights models. It’s designed to be used as an agent, calling external tools as needed to solve problems.
- Tongyi DeepResearch is a new fully open source agent for doing research. Its results are comparable to OpenAI’s Deep Research, Claude Sonnet 4, and similar models. Tongyi is part of Alibaba; it’s yet another important model to come out of China.
- Data centers in space? It’s an interesting and challenging idea. Cooling is a much bigger problem than you’d expect. They would require massive arrays of solar cells for power. But some people think it might happen.
- MiniMax M2 is a new open weights model that focuses on building agents. Its performance is similar to Claude Sonnet’s, but at a much lower price. It also embeds its thought processes between <think> and </think> tags, an important step toward interpretability. (A sketch of separating those traces from the visible answer follows this list.)
- DeepSeek has introduced a new model for OCR with some very interesting properties: it stores and retrieves “memories” as compressed images rather than text tokens, which also makes the model significantly more efficient.
- Agent Lightning is a framework for training agents with reinforcement learning that requires few or no changes to the agent’s own code.
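
Since reasoning traces inside <think> tags are becoming a common convention, here’s a minimal sketch of separating them from the visible answer. It assumes only the tag format described above; the function is our illustration, not part of any vendor SDK.

```python
import re

# Matches reasoning spans in output from models (like MiniMax M2) that
# wrap their thought process in <think>...</think> tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(response: str) -> tuple[str, list[str]]:
    """Return (visible_answer, list_of_thought_traces)."""
    thoughts = THINK_RE.findall(response)
    visible = THINK_RE.sub("", response).strip()
    return visible, thoughts

raw = "<think>The user wants a sum; add the numbers.</think>2 + 2 = 4"
answer, thoughts = split_thinking(raw)
print(answer)    # 2 + 2 = 4
print(thoughts)  # ['The user wants a sum; add the numbers.']
```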
Programming
- The Zig programming language has published a book. Online, of course.
- Google is weakening its controversial new rules about developer verification. The company plans to create a separate class for applications with limited distribution, and develop a flow that will allow the installation of unverified apps.
- Google’s LiteRT is a library for running AI models in browsers and on small devices. LiteRT supports Android, iOS, embedded Linux, and microcontrollers. Supported languages include Java, Kotlin, Swift, Embedded C, and C++.
- Does AI-assisted coding mean the end of new languages? Simon Willison thinks that LLMs can encourage the development of new programming languages. Design your language and ship it with a Claude Skills-style document; that should be enough for an LLM to learn how to use it.
- Deepnote, a successor to the Jupyter Notebook, is a next-generation notebook for data analytics that’s built for teams. There’s now a shared workspace; different blocks can use different languages; and AI integration is on the road map. It’s now open source.
- The idea of assigning colors (red, blue) to tools may be helpful in limiting the risk of prompt injection when building agents: which tools can return something damaging? This sounds like a step toward applying the principle of “least privilege” to AI design. (A minimal sketch of one way to read the idea follows this list.)
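
Here’s a minimal sketch of one way to read the tool-coloring idea: tools that can ingest untrusted content are “red,” tools that can take consequential actions are “blue,” and anything a red tool touches is tainted and never passed to a blue tool. The names and the taint rule are our illustration, assumed for the example rather than taken from the original proposal.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Value:
    data: str
    tainted: bool = False  # True if this value ever passed through a red tool

@dataclass
class Tool:
    name: str
    color: str  # "red" = can return untrusted content; "blue" = privileged action
    fn: Callable[[str], str]

def call_tool(tool: Tool, arg: Value) -> Value:
    # The whole policy in one line: tainted data never reaches a blue tool.
    if tool.color == "blue" and arg.tainted:
        raise PermissionError(f"tainted data blocked from privileged tool {tool.name!r}")
    result = tool.fn(arg.data)
    # Anything a red tool returns is tainted from here on.
    return Value(result, tainted=arg.tainted or tool.color == "red")

fetch = Tool("fetch_url", "red", lambda url: f"<contents of {url}>")
send = Tool("send_email", "blue", lambda body: f"sent: {body}")

page = call_tool(fetch, Value("https://example.com"))
try:
    call_tool(send, page)
except PermissionError as err:
    print(err)  # least privilege in action
```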
Security
- We’re making the same mistake with AI security as we made with cloud security (and security in general): treating security as an afterthought.
- Anthropic claims to have disrupted a Chinese cyberespionage group that was using Claude to generate attacks against other systems. Anthropic claims that the attack was 90% automated, though that claim is controversial.
- Don’t become a victim. Data collected for online age verification makes your site a target for attackers. That data is valuable, and they know it.
- A research collaboration uses data poisoning and AI to disrupt deepfake images: users run Silverer on their images before posting them. The tool makes invisible changes to the original image that confuse AIs attempting to generate new images from it, leading to unusable distortions. (A toy sketch of the general idea follows this list.)
- Is it a surprise that AI is being used to generate fake receipts and expense reports? After all, it’s used to fake just about everything else. It was inevitable that enterprise applications of AI fakery would appear.
- HydraPWK2 is a Linux distribution designed for penetration testing. It’s based on Debian and is supposedly easier to use than Kali Linux.
- How secure is your trusted execution environment (TEE)? All of the major hardware vendors are vulnerable to a number of physical attacks against “secure enclaves.” And their terms of service often exclude physical attacks.
- Atroposia is a new malware-as-a-service package that includes a local vulnerability scanner. Once an attacker has broken into a site, they can find other ways to remain there.
- A new kind of phishing attack (CoPhishing) uses Microsoft Copilot Studio agents to steal credentials by abusing the Sign In topic. Microsoft has promised an update that will defend against this attack.
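
For the curious, here’s a toy sketch of the general idea behind perturbation tools like Silverer: add a structured, low-amplitude pattern that a human viewer won’t notice but that shifts what a generative model sees. To be clear, this is not Silverer’s actual algorithm; the pattern and amplitude below are arbitrary choices that only illustrate the shape of the technique.

```python
# Toy data-poisoning sketch (not Silverer's algorithm): overlay a
# high-frequency sinusoidal pattern too faint for a viewer to notice.
# Requires: pip install pillow numpy
import numpy as np
from PIL import Image

def perturb(path_in: str, path_out: str, amplitude: float = 3.0) -> None:
    img = np.asarray(Image.open(path_in).convert("RGB"), dtype=np.float32)
    h, w, _ = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Structured (not random) noise, roughly +/-3 out of 255.
    pattern = amplitude * np.sin(xx * 0.8) * np.cos(yy * 0.8)
    out = np.clip(img + pattern[..., None], 0, 255).astype(np.uint8)
    Image.fromarray(out).save(path_out)

perturb("photo.jpg", "photo_protected.png")
```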
Operations
- Here’s how to install Open Notebook, an open source equivalent to NotebookLM, on your own hardware. It uses Docker and Ollama to run the notebook and the model locally, so data never leaves your system. (A sketch of querying the local model follows this list.)
- Open source isn’t “free as in beer.” Nor is it “free as in freedom.” It’s “free as in puppies.” For better or for worse, that just about says it.
- Need a framework for building proxies? Cloudflare’s next-generation Oxy framework might fit the bill. (Whatever you think of their recent misadventure.)
- MIT Media Lab’s Project NANDA intends to build infrastructure for a decentralized network of AI agents. They describe it as a global decentralized registry (not unlike DNS) that can be used to discover and authenticate agents using MCP and A2A. Isn’t this what we wanted from the internet in the first place?
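
The point of the Open Notebook setup above is that everything, notebook and model, answers on localhost. Here’s a sketch of what that looks like once Ollama is running: any local tool can query it over its REST API on port 11434. The model name is an assumption; pull one first with `ollama pull llama3.2`.

```python
import json
import urllib.request

# Query a locally running Ollama server; no data leaves your machine.
def ask_local_model(prompt: str, model: str = "llama3.2") -> str:
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask_local_model("Summarize my notes in one sentence."))
```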
Web
- The spread of misinformation can be reduced, though not stopped, by adding a small amount of friction to the sharing process.
- Netflix has released an excellent guide to using generative AI to create content.
- Luke Wroblewski suggests a new model for designing AI chat sessions. A simple chat isn’t as simple as it seems; particularly with reasoning models, it can become cluttered to the point of uselessness. This new design addresses those problems.
Things
- You need the Slightly Annoying Rubik’s Cube Automatic Solving Machine aka S.A.R.C.A.S.M. And you can make your own.
- Tracking butterfly migration: Can you make a sensor with associated telemetry small enough to fit on a butterfly tag? Yes, you can!