Radar Trends to Watch: December 2025

2 December 2025 at 07:15

November ended. Thanksgiving (in the US), turkey, and a train of model announcements. The announcements were exciting: Google’s Gemini 3 puts it in the lead among large language models, at least for the time being. Nano Banana Pro is a spectacularly good text-to-image model. OpenAI has released its heavy hitters, GPT-5.1-Codex-Max and GPT-5.1 Pro. And the Allen Institute released its latest open source model, Olmo 3, the leading open source model from the US.

Since Trends avoids deal-making (should we?), we’ve also avoided the angst around an AI bubble and its implosion. Right now, it’s safe to say that the bubble is formed of money that hasn’t yet been invested, let alone spent. If it is a bubble, it’s in the future. Do promises and wishes make a bubble? Does a bubble made of promises and wishes pop with a bang or a pffft?

AI

  • Now that Google and OpenAI have laid down their cards, Anthropic has released its latest heavyweight model: Opus 4.5. They’ve also dropped the price significantly.
  • The Allen Institute has launched its latest open source model, Olmo 3. The institute’s opened up the whole development process to allow other teams to understand its work.
  • Not to be outdone, Google has introduced Nano Banana Pro (aka Gemini 3 Pro Image), its state-of-the-art image generation model. Nano Banana’s biggest feature is the ability to edit images to change the appearance of items without redrawing them from scratch. And according to Simon Willison, it watermarks the parts of an image it generates with SynthID.
  • OpenAI has released two more components of GPT-5.1, GPT-5.1-Codex-Max (API) and GPT-5.1 Pro (ChatGPT). This release brings the company’s most powerful models for generative work into view.
  • A group of quantum physicists claim to have reduced the size of the DeepSeek model by half, and to have removed Chinese censorship. The model can now tell you what happened in Tiananmen Square, explain what Pooh looked like, and answer other forbidden questions.
  • The release train for Gemini 3 has begun, and the commentariat quickly crowned it king of the LLMs. It includes the ability to spin up a web interface so users can give it more information about their questions, and to generate diagrams along with text output.
  • As part of the Gemini 3 release, Google has also announced a new agentic IDE called Antigravity.
  • Google has released a new weather forecasting model, WeatherNext 2, that can forecast with resolutions up to 1 hour. The data is available through Earth Engine and BigQuery, for those who would like to do their own forecasting. There’s also an early access program on Vertex AI.
  • Grok 4.1 has been released, with reports that it is currently the best model at generative prose, including creative writing. Be that as it may, we don’t see why anyone would use an AI that has been trained to reflect Elon Musk’s thoughts and values. If AI has taught us one thing, it’s that we need to think for ourselves.
  • AI demands the creation of new data centers and new energy sources. States want to ensure that those power plants are built, and built in ways that don’t pass costs on to consumers.
  • Grokipedia uses questionable sources. Is anyone surprised? How else would you train an AI on the latest conspiracy theories?
  • AMD GPUs are competitive, but they’re hampered because there are few libraries for low-level operations. To solve this problem, Chris Ré and others have announced HipKittens, a library of programming primitive operations for AMD GPUs.
  • OpenAI has released GPT-5.1. The two new models are Instant, which is tuned to be more conversational and “human,” and Thinking, a reasoning model that now adapts the time it takes to “think” to the difficulty of the questions.
  • Large language models, including GPT-5 and the Chinese models, show bias against users who use a German dialect rather than standard German. The bias appeared to be greater as the model size increased. These results also apply to languages like English.
  • Ethan Mollick on evaluating (ultimately, interviewing) your AI models is a must-read.
  • Yann LeCun is leaving Facebook to launch a new startup that will develop his ideas about building AI.
  • Harbor is a new tool that simplifies benchmarking frameworks and models. It’s from the developers of the Terminal-Bench benchmark. And it brings us a step closer to a world where people build their own specialized AI rather than rely on large providers.
  • Music rights holders are beginning to make deals with Udio (and presumably other companies) that train their models on existing music. Unfortunately, this doesn’t solve the bigger problem: Music is a “collectively produced shared cultural good, sustained by human labor. Copyright isn’t suited to protecting this kind of shared value,” as professors Oliver Bown and Kathy Bowrey have argued.
  • Moonshot AI has finally released Kimi K2 Thinking, the first open weights model to have benchmark results competitive with—or exceeding—the best closed weights models. It’s designed to be used as an agent, calling external tools as needed to solve problems.
  • Tongyi DeepResearch is a new fully open source agent for doing research. Its results are comparable to OpenAI deep research, Claude Sonnet 4, and similar models. Tongyi is part of Alibaba; it’s yet another important model to come out of China.
  • Data centers in space? It’s an interesting and challenging idea. Cooling is a much bigger problem than you’d expect. They would require massive arrays of solar cells for power. But some people think it might happen.
  • MiniMax M2 is a new open weights model that focuses on building agents. It has performance similar to Claude Sonnet but at a much lower price point. It also embeds its thought processes between <think> and </think> tags, which is an important step toward interpretability.
  • DeepSeek has introduced a new model for OCR with some very interesting properties: It has a new process for storing and retrieving memories that also makes the model significantly more efficient.
  • Agent Lightning provides a code-free way to train agents using reinforcement learning.
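
For readers who want to see what the <think> tags in the MiniMax M2 item make possible, here is a minimal sketch of separating a model's tagged reasoning from its final answer. The function name and the sample output are invented for illustration; the only thing taken from the item above is that reasoning is wrapped in <think> and </think>.

```python
import re

# Matches one <think>...</think> span; DOTALL lets the reasoning run over several lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> tuple[list[str], str]:
    """Return (reasoning segments, answer text with the tagged spans removed)."""
    thoughts = [segment.strip() for segment in THINK_RE.findall(raw)]
    answer = THINK_RE.sub("", raw).strip()
    return thoughts, answer


# Hypothetical model output, made up for this example.
sample = "<think>The user wants a sum. 2 + 2 = 4.</think>\nThe answer is 4."
thoughts, answer = split_reasoning(sample)
print(thoughts)
print(answer)
```

Keeping the reasoning as a separate list means it can be logged or inspected without ever being shown to the end user.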

Programming

  • The Zig programming language has published a book. Online, of course.
  • Google is weakening its controversial new rules about developer verification. The company plans to create a separate class for applications with limited distribution, and develop a flow that will allow the installation of unverified apps.
  • Google’s LiteRT is a library for running AI models in browsers and small devices. LiteRT supports Android, iOS, embedded Linux, and microcontrollers. Supported languages include Java, Kotlin, Swift, Embedded C, and C++.
  • Does AI-assisted coding mean the end of new languages? Simon Willison thinks that LLMs can encourage the development of new programming languages. Design your language and ship it with a Claude Skills-style document; that should be enough for an LLM to learn how to use it.
  • Deepnote, a successor to the Jupyter Notebook, is a next-generation notebook for data analytics that’s built for teams. There’s now a shared workspace; different blocks can use different languages; and AI integration is on the road map. It’s now open source.
  • The idea of assigning colors (red, blue) to tools may be helpful in limiting the risk of prompt injection when building agents. What tools can return something damaging? This sounds like a step towards the application of the “least privilege” principle to AI design.
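
The tool-coloring idea can be sketched in a few lines. The item above says only that tools are assigned colors, so everything else here is an assumption: red marks a tool that can return untrusted content, blue marks a tool that takes a sensitive action, and a session that has called a red tool may no longer call a blue one. The class and tool names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Color(Enum):
    RED = "red"    # assumption: the tool may return untrusted, attacker-controlled text
    BLUE = "blue"  # assumption: the tool performs a sensitive or irreversible action


@dataclass
class Session:
    tainted: bool = False
    log: list[str] = field(default_factory=list)

    def call(self, tool_name: str, color: Color) -> bool:
        """Return True if the call is allowed under the assumed red/blue rule."""
        if color is Color.BLUE and self.tainted:
            self.log.append(f"blocked {tool_name}")
            return False
        if color is Color.RED:
            self.tainted = True  # untrusted text has now entered the context
        self.log.append(f"allowed {tool_name}")
        return True


session = Session()
first = session.call("send_payment", Color.BLUE)    # nothing untrusted read yet
second = session.call("fetch_web_page", Color.RED)  # allowed, but taints the session
third = session.call("send_payment", Color.BLUE)    # refused after untrusted input
print(session.log)
```

A real agent framework would need a finer-grained policy than one taint flag per session, but the sketch shows why this resembles least privilege: the agent loses access to dangerous tools the moment it reads something it can't trust.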

Security

  • We’re making the same mistake with AI security as we made with cloud security (and security in general): treating security as an afterthought.
  • Anthropic claims to have disrupted a Chinese cyberespionage group that was using Claude to generate attacks against other systems. Anthropic claims that the attack was 90% automated, though that claim is controversial.
  • Don’t become a victim. Data collected for online age verification makes your site a target for attackers. That data is valuable, and they know it.
  • A research collaboration uses data poisoning and AI to disrupt deepfake images. Users use Silverer to process their images before posting. The tool makes invisible changes to the original image that confuse AIs creating new images, leading to unusable distortions.
  • Is it a surprise that AI is being used to generate fake receipts and expense reports? After all, it’s used to fake just about everything else. It was inevitable that enterprise applications of AI fakery would appear.
  • HydraPWK2 is a Linux distribution designed for penetration testing. It’s based on Debian and is supposedly easier to use than Kali Linux.
  • How secure is your trusted execution environment (TEE)? All of the major hardware vendors are vulnerable to a number of physical attacks against “secure enclaves.” And their terms of service often exclude physical attacks.
  • Atroposia is a new malware-as-a-service package that includes a local vulnerability scanner. Once an attacker has broken into a site, they can find other ways to remain there.
  • A new kind of phishing attack (CoPhishing) uses Microsoft Copilot Studio agents to steal credentials by abusing the Sign In topic. Microsoft has promised an update that will defend against this attack.

Operations

  • Here’s how to install Open Notebook, an open source equivalent to NotebookLM, to run on your own hardware. It uses Docker and Ollama to run the notebook and the model locally, so data never leaves your system.
  • Open source isn’t “free as in beer.” Nor is it “free as in freedom.” It’s “free as in puppies.” For better or for worse, that just about says it.
  • Need a framework for building proxies? Cloudflare’s next generation Oxy framework might be what you need. (Whatever you think of their recent misadventure.)
  • MIT Media Lab’s Project NANDA intends to build infrastructure for a decentralized network of AI agents. They describe it as a global decentralized registry (not unlike DNS) that can be used to discover and authenticate agents using MCP and A2A. Isn’t this what we wanted from the internet in the first place?

Web

Things

Radar Trends to Watch: November 2025

4 November 2025 at 07:02

AI has so thoroughly colonized every technical discipline that it’s becoming hard to organize items of interest in Radar Trends. Should a story go under AI or programming (or operations or biology or whatever the case may be)? Maybe it’s time to go back to a large language model that doesn’t require any electricity and has over 217K parameters: Merriam-Webster. But no matter where these items ultimately appear, it’s good to see practical applications of AI in fields as diverse as bioengineering and UX design.

AI

  • Alibaba’s Ling-1T may be the best model you’ve never heard of. It’s a nonthinking mixture-of-experts model with 1T parameters, 50B active at any time. And it’s open weights (MIT license).
  • Marin is a new lab for creating fully open source models. They say that the development of models will be completely transparent from the beginning. Everything is tracked by GitHub; all experiments may be observed by anyone; there’s no cherrypicking of results.
  • WebMCP is a proposal and an implementation for a protocol that allows websites to become MCP servers. As servers, they can interact directly with agents and LLMs.
  • Claude has announced Agent Skills. Skills are essentially just a Markdown file describing how to perform a task, possibly accompanied by scripts and resources. They’re easy to add and only used as needed. A Skill-creator Skill makes it very easy to build Skills. Simon Willison thinks that Skills may be a “bigger deal than MCP.”
  • Pete Warden describes his work on the smallest of AI. Small AI serves an important set of applications without compromising privacy or requiring enormous resources.
  • Anthropic has released Claude Haiku 4.5, skipping 4.0 and 4.1 in the process. Haiku is their smallest and fastest model. The new release claims performance similar to Sonnet 4, but it’s much faster and less expensive.
  • NVIDIA is now offering the DGX Spark, a desktop AI supercomputer. It offers 1 petaflop performance on models with up to 200B parameters. Simon Willison has a review of a preview unit.
  • Andrej Karpathy has released nanochat, a small ChatGPT-like model that’s completely open and can be trained for roughly $100. It’s intended for experimenters, and Karpathy has detailed instructions on building and training.
  • There’s an agent-shell for Emacs? There had to be one. Emacs abhors a vacuum.
  • Anthropic launched “plugins,” which give developers the ability to write extensions to Claude Code. Of course, these extensions can be agents. Simon Willison points to Jesse Vincent’s Superpowers as a glimpse of what plugins can accomplish.
  • Google has released the Gemini 2.5 Computer Use model into public preview. While the thrill of teaching computers to click browsers and other web applications faded quickly, Gemini 2.5 Computer Use appears to be generating excitement.
  • Thinking Machines Lab has announced Tinker, an API for training open weight language models. Tinker runs on Thinking Machines’ infrastructure. It’s currently in beta.
  • Merriam-Webster will release its newest large language model on November 18. It has no data centers and requires no electricity.
  • We know that data products, including AI, reflect historical biases in their training data. In India, OpenAI’s models reflect caste biases. But it’s not just OpenAI; these biases appear in all models. Although caste bias was outlawed in the middle of the 20th century, these biases live on in the data.
  • DeepSeek has released an experimental version of its reasoning model, DeepSeek-V3.2-Exp. This model uses a technique called sparse attention to reduce the processing requirements (and cost) of the reasoning process.
  • OpenAI has added an Instant Checkout feature that allows users to make purchases with Etsy and Shopify merchants, taking them directly to checkout after finding their products. It’s based on the Agentic Commerce Protocol.
  • OpenAI’s GDPval tests go beyond existing benchmarks by challenging LLMs with real-world tasks rather than simple problems. The tasks were selected from 44 industries and were chosen for economic value.

Programming

  • Steve Yegge’s Beads is a memory management system for coding agents. It’s badly needed, and worth checking out.
  • Do you use coding agents in parallel? Simon Willison was a skeptic, but he’s gradually becoming convinced it’s a good practice.
  • One problem with generative coding is that AI is trained on “the worst code in the world.” For web development, we’ll need better foundations to get to a post–frontend-framework world.
  • If you’ve wanted to program with Claude from your phone or some other device, now you can. Anthropic has added web and mobile interfaces to Claude Code, along with a sandbox for running generated code safely.
  • You may have read “Programming with Nothing,” a classic article that strips programming to the basics of lambda calculus. “Programming with Less Than Nothing” does FizzBuzz in many lines of combinatory logic.
  • What’s the difference between technical debt and architectural debt? Don’t confuse them; they’re significantly different problems, with different solutions.
  • For graph fans: The IRS has released its fact graph, which, among other things, models the US Internal Revenue Code. It can be used with JavaScript and any JVM language.
  • What is spec-driven development? It has become one of the key buzzwords in the discussion of AI-assisted software development. Birgitta Böckeler attempts to define SDD precisely, then looks at three tools for aiding SDD.
  • IEEE Spectrum released its 2025 programming languages rankings. Python is still king, with Java second; JavaScript has fallen from third to fifth. But more important, Spectrum wonders whether AI-assisted programming will make these rankings irrelevant.

Web

  • Cloudflare CEO Matthew Prince is pushing for regulation to prevent Google from tying web crawlers for search and for training content together. You can’t block the training crawler without also blocking the search crawler, and blocking the latter has significant consequences for businesses.
  • OpenAI has released Atlas, its Chromium-based web browser. As you’d expect, AI is integrated into everything. You can chat with the browser, interrogate your history, your settings, or your bookmarks, and (of course) chat with the pages you’re viewing.
  • Try again? Apple has announced a second-generation Vision Pro, with a similar design and at the same price point.
  • Have we passed peak social? Social media usage has been declining for all age groups. The youngest group, 16–24, is the largest but has also shown the sharpest decline. Are we going to reinvent the decentralized web? Or succumb to a different set of walled gardens?
  • Addy Osmani’s post “The History of Core Web Vitals” is a must-read for anyone working in web performance.
  • Features from the major web frameworks are being implemented by browsers. Frameworks won’t disappear, but their importance will diminish. People will again be programming to the browser. In turn, this will make browser testing and standardization that much more important.
  • Luke Wroblewski writes about using AI to solve common problems in user experience (UX). AI can help with problems like collecting data from users and onboarding users to new applications.

Operations

  • There’s a lot to be learned from AWS’s recent outage, which stemmed from a DynamoDB DNS failure in the US-EAST-1 region. It’s important not to write this off as a war story about Amazon’s failure. Instead, think: How do you make your own distributed networks more reliable?
  • PyTorch Monarch is a new library that helps developers manage distributed systems for training AI models. It lets developers write a script that “orchestrates all distributed resources,” allowing the developer to work with them as a single almost-local system.

Security

  • The solution to the fourth part of Kryptos, the cryptosculpture at the CIA’s headquarters, has been discovered! The discovery came through an opsec error that led researchers to the clear text stored at the Smithsonian. This is an important lesson: Attacks against cryptosystems rarely touch the cryptography. They attack the protocols, people, and systems surrounding codes.
  • Public cryptocurrency blockchains are being used by international threat actors as “bulletproof” hosts for storing and distributing malware.
  • Apple is now giving a $2M bounty for zero-day exploits that allow zero-click remote code execution on iOS. These vulnerabilities have been exploited by commercial malware vendors.
  • Signal has incorporated postquantum encryption into its Signal protocol. This is a major technological achievement. They’re one of the few organizations that’s ready for the quantum world.
  • Salesforce is refusing to pay extortion after a major data loss of over a billion records. Data from a number of major accounts was stolen by a group calling itself Scattered LAPSUS$ Hunters. Attackers simply asked the victim’s staff to install an attacker-controlled app.
  • Context is the key to AI security. We’re not surprised; right now, context is the key to just about everything in AI. Attackers have the advantage now, but in 3–5 years that advantage will pass to defenders who use AI effectively.
  • Google has announced that Gmail users can now send end-to-end encrypted (E2EE) email regardless of whether the recipients are using Gmail. Recipients who don’t use Gmail will receive a notification and the ability to read the message on a one-time guest account.
  • The best way to attack your company isn’t through the applications; it’s through the service help desk. Human engineering remains extremely effective—more effective than attacks against software. Training helps; a well-designed workflow and playbook is crucial.
  • Ransomware detection has now been built into the desktop version of Google Drive. When it detects activities that indicate ransomware, Drive suspends file syncing and alerts users. It’s enabled by default, but it is possible to opt out.
  • OpenAI is routing requests with safety issues to an unknown model. This is presumably a specialized version of GPT-5 that has been trained specially to deal with sensitive issues.

Robotics

  • Would you buy a banana from a robot? A small chain of stores in Chicago is finding out.
  • Rodney Brooks, founder of iRobot, warns that humans should stay at least 10 feet (3 meters) away from humanoid walking robots. There is a lot of potential energy in their limbs when they move them to retain balance. Unsurprisingly, this danger stems from the vision-only approach that Tesla and other vendors have adopted. Humans learn and act with all five senses.

Quantum Computing

Biology

Radar Trends to Watch: October 2025

7 October 2025 at 07:17

This month we have two more protocols to learn. Google has announced the Agent Payments Protocol (AP2), which is intended to help agents to engage in ecommerce—it’s largely concerned with authenticating and authorizing parties making a transaction. And the Agent Client Protocol (ACP) is concerned with communications between code editors and coding agents. When implemented, it would allow any code editor to plug in any compliant agent.

All hasn’t been quiet on the virtual reality front. Meta has announced its new VR/AR glasses, with the ability to display images on the lenses along with capabilities like live captioning for conversations. They’re much less obtrusive than the previous generation of VR goggles.

AI

  • Suno has announced an AI-driven digital audio workstation (DAW), a tool for enabling people to be creative with AI-generated music.
  • Ollama has added its own web search API. Ollama’s search API can be used to augment the information available to models. 
  • GitHub Copilot now offers a command-line tool, GitHub Copilot CLI. It can use either Claude Sonnet 4 or GPT-5 as the backing model, though other models should be available soon. Claude Sonnet 4 is the default.
  • Alibaba has released Qwen3-Max, a trillion-plus parameter model. There are reasoning and nonreasoning variants, though the reasoning variant hasn’t yet been released. Alibaba also released models for speech-to-text, vision-language, live translation, and more. They’ve been busy. 
  • GitHub has launched its MCP Registry to make it easier to discover MCP servers archived on GitHub. It’s also working with Anthropic and others to build an open source MCP registry, which lists servers regardless of their origin and integrates with GitHub’s registry. 
  • DeepMind has published version 3.0 of its Frontier Safety Framework, a framework for experimenting with AI-human alignment. They’re particularly interested in scenarios where the AI doesn’t follow a user’s directives, and in behaviors that can’t be traced to a specific reasoning chain.
  • Alibaba has released the Tongyi DeepResearch reasoning model. Tongyi is a 30.5B parameter mixture-of-experts model, with 3.3B parameters active. More importantly, it’s fully open source, with no restrictions on how it can be used. 
  • Locally AI is an iOS app that lets you run large language models on your iPhone or iPad. It works offline; there’s no need for a network connection. 
  • OpenAI has added control over the “reasoning” process to its GPT-5 models. Users can choose between four levels: Light (Pro users only), Standard, Extended, and Heavy (Pro only). 
  • Google has announced the Agent Payments Protocol (AP2), which facilitates purchases. It focuses on authorization (proving that it has the authority to make a purchase), authentication (proving that the merchant is legitimate), and accountability (in case of a fraudulent transaction).
  • Bring Your Own AI: Employee adoption of AI greatly exceeds official IT adoption. We’ve seen this before, on technologies as different as the iPhone and open source.
  • Alibaba has released the ponderously named Qwen3-Next-80B-A3B-Base. It’s a mixture-of-experts model with a low ratio of active parameters to total parameters (3.75%). Alibaba claims that the model cost 1/10 as much to train and is 10 times faster than its previous models. If this holds up, Alibaba is winning on performance where it counts.
  • Anthropic has announced a major upgrade to Claude’s capabilities. It can now execute Python scripts in a sandbox and can create Excel spreadsheets, PowerPoint presentations, PNG files, and other documents. You can upload files for it to analyze. And of course this comes with security risks.
  • The SIFT method—stop, investigate the source, find better sources, and trace quotes to their original context—is a way of structuring your use of AI output that will make you less vulnerable to misinformation. Hint: it’s not just for AI.
  • OpenAI’s Projects feature is now available to free accounts. Projects is a set of tools for organizing conversations with the LLM. Projects are separate workspaces with their own custom instructions, independent memory, and context. They can be forked. Projects sounds something like Git for LLMs—a set of features that’s badly needed.
  • EmbeddingGemma is a new open weights embedding model (308M parameters) that’s designed to run on devices, requiring as little as 200 MB of memory.
  • An experiment with GPT-4o-mini shows that language models can fall to psychological manipulation. Is this surprising? After all, they are trained on human output.
  • “Platform Shifts Redefine Apps”: AI is a new kind of platform and demands rethinking what applications mean and how they should work. Failure to do this rethinking may be why so many AI efforts fail.
  • MCP-UI is a protocol that allows MCP servers to send React components or Web Components to agents, allowing the agent to build an appropriate browser-based interface on the fly.
  • The Agent Client Protocol (ACP) is a new protocol that standardizes communications between code editors and coding agents. It’s currently supported by the Zed and Neovim editors, and by the Gemini CLI coding agent.
  • Gemini 2.5 Flash is now using a new image generation model that was internally known as “nano banana.” This new model can edit uploaded images, merge images, and maintain visual consistency across a series of images.
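
The 3.75% figure quoted for Qwen3-Next is simple arithmetic, and the same calculation puts the Tongyi DeepResearch numbers in context. The sketch below assumes that "80B-A3B" in the model name means 80B total parameters with 3B active; the Tongyi figures are the ones given in the list above. The helper name is made up.

```python
def active_ratio(active_b: float, total_b: float) -> float:
    """Fraction of parameters active per token; both arguments are in billions."""
    return active_b / total_b


# (active, total) in billions of parameters
models = {
    "Qwen3-Next-80B-A3B": (3.0, 80.0),   # assumed from the model name
    "Tongyi DeepResearch": (3.3, 30.5),  # figures from the item above
}

ratios = {name: active_ratio(a, t) for name, (a, t) in models.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2%} of parameters active")
```

The smaller the ratio, the less compute each token costs relative to the model's total size, which is the heart of the claims about cheaper training and faster inference.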

Programming

  • Anthropic released Claude Code 2.0. New features include the ability to checkpoint your work, so that if a coding agent wanders off-course, you can return to a previous state. They have also added the ability to run tasks in the background, call hooks, and use subagents.
  • The Wasmer project has announced that it now has full Python support in the beta version of Wasmer Edge, its WebAssembly runtime for serverless edge deployment.
  • Mitchell Hashimoto, founder of HashiCorp, has promised that a library for Ghostty (libghostty) is coming! This library will make it easy to embed a terminal emulator into an application. Perhaps more important, libghostty might standardize the code for terminal output across applications.
  • There’s a new benchmark for agentic coding: CompileBench. CompileBench tests the ability of models to solve complex problems in figuring out how to build code.
  • Apple is reportedly rewriting iOS in a new programming language. Rust would be the obvious choice, but rumors are that it’s something of their own creation. Apple likes languages it can control. 
  • Java 25, the latest long-term support release, has a number of new features that reduce the boilerplate that makes Java difficult to learn. 
  • Luau is a new scripting language derived from Lua. It claims to be fast, small, and safe. It’s backward compatible with Version 5.1 of Lua.
  • OpenAI has launched GPT-5 Codex, its generation model trained specifically for software engineering. Codex is now available both in the CLI tool and through the API. It’s clearly intended to challenge Anthropic’s dominant coding tool, Claude Code.
  • Do prompts belong in code repositories? We’ve argued that prompts should be archived. But they don’t belong in a source code repo like Git. There are better tools available.
  • This is cool and different. A developer has hacked the 2001 game Animal Crossing so that the dialog is generated by LLM rather than coming from the game’s memory.
  • There’s a new programming language, vibe-coded in its entirety with Claude. Cursed is similar to Go, but all the keywords are Gen Z slang. It’s not yet on the list, but it’s a worthy addition to Esolang.
  • Claude Code is now integrated into the Zed editor (beta), using the Agent Client Protocol (ACP).
  • Ida Bechtle’s documentary on the history of Python, complete with many interviews with Guido van Rossum, is a must-watch.

Security

  • The first malicious MCP server has been found in the wild. Postmark-MCP, an MCP server for interacting with the Postmark application, suddenly (version 1.0.16) started sending copies of all the email it handles to its developer.
  • I doubt this is the first time, but supply chain security vulnerabilities have now hit Rust’s package management system, Crates.io. Two packages that steal keys for cryptocurrency wallets have been found. It’s time to be careful about what you download.
  • Cross-agent privilege escalation is a new kind of vulnerability in which a compromised intelligent agent uses indirect prompt injection to cause a victim agent to overwrite its configuration, granting it additional privileges. 
  • GitHub is taking a number of measures to improve software supply chain security, including requiring two-factor authentication (2FA), expanding trusted publishing, and more.
  • A compromised npm package uses a QR code to encode malware. The malware is apparently downloaded in the QR code (which is valid, but too dense to be read by a normal camera), unpacked by the software, and used to steal cookies from the victim’s browser. 
  • Node.js and its package manager npm have been in the news because of an ongoing series of supply chain attacks. Here’s the latest report.
  • A study by Cisco has discovered over a thousand unsecured LLM servers running on Ollama. Roughly 20% were actively serving requests. The rest may have been idle Ollama instances, waiting to be exploited. 
  • Anthropic has announced that Claude will train on data from personal accounts, effective September 28. This includes Free, Pro, and Max plans. Work plans are exempted. While the company says that training on personal data is opt-in, it’s (currently) enabled by default, so it’s opt-out.
  • We now have “vibe hacking,” the use of AI to develop malware. Anthropic has reported several instances in which Claude was used to create malware that the authors could not have created themselves. Anthropic is banning threat actors and implementing classifiers to detect illegal use.
  • Zero trust is basic to modern security. But groups implementing zero trust have to realize that it’s a project that’s never finished. Threats change, people change, systems change.
  • There’s a new technique for jailbreaking LLMs: write prompts with bad grammar and run-on sentences. These seem to prevent guardrails from taking effect. 
  • In an attempt to minimize the propagation of malware on the Android platform, Google plans to block “sideloading” apps for Android devices and require developer ID verification for apps installed through Google Play.
  • A new phishing attack called ZipLine targets companies using their own “contact us” pages. The attacker then engages in an extended dialog with the company, often posing as a potential business partner, before eventually delivering a malware payload.

Operations

  • The 2025 DORA report is out! DORA may be the most detailed summary of the state of the IT industry. DORA’s authors note that AI is everywhere and that the use of AI now improves end-to-end productivity, something that was ambiguous in last year’s report.
  • Microsoft has announced that Word will save files to the cloud (OneDrive) by default. This (so far) appears to apply only when using Windows. The feature is currently in beta.

Web

Virtual and Augmented Reality

  • Meta has announced a pair of augmented reality glasses with a small display on one of the lenses, bringing it to the edge of AR. In addition to displaying apps from your phone, the glasses can do “live captioning” for conversations. The display is controlled by a wristband.

Radar Trends to Watch: September 2025

2 September 2025 at 06:10

For better or for worse, AI has colonized this list so thoroughly that AI itself is little more than a list of announcements about new or upgraded models. But there are other points of interest. Is it just a coincidence (possibly to do with BlackHat) that so much happened in security in the past month? We’re still seeing programming languages—even some new programming languages for writing AI prompts! If you’re into retrocomputing, the much-beloved Commodore 64 is back—with an upgraded audio chip, a new processor, much more RAM, and all your old ports. Heirloom peripherals should still work.

AI

  • OpenAI has released their Realtime APIs. The model supports MCP servers, phone calls using the SIP protocol, and image inputs. The release includes gpt-realtime, an advanced speech-to-speech model.
  • ChatGPT now supports project-only memory. Project memory, which can use previous conversations for additional context, can be limited to a specific project. Project-only memory gives more control over context and prevents one project’s context from contaminating another.
  • FairSense is a framework for investigating, early in development, whether AI systems are fair. FairSense runs long-term simulations to detect whether a system will become unfair as it evolves over time.
  • Agents4Science is a new academic conference in which all the submissions will be researched, written, reviewed, and presented primarily by AI (using text-to-speech for presentations).
  • Drew Breunig’s mix and match cheat sheet for AI job titles is a classic. 
  • Cohere’s Command A Reasoning is another powerful, partially open reasoning model. It is available on Hugging Face. It claims to outperform gpt-oss-120b and DeepSeek R1-0528.
  • DeepSeek has released DeepSeekV3.1. This is a hybrid model that supports reasoning and nonreasoning use. It’s also faster than R1 and has been designed for agentic tasks. It uses reasoning tokens more economically, and it was much less expensive to train than GPT-5.
  • Anthropic has added the ability to terminate chats to Claude Opus. Chats can be terminated if a user persists in making harmful requests. Terminated chats can’t be continued, although users can start a new chat. The feature is currently experimental.
  • Google has released its smallest model yet: Gemma 3 270M. This model is designed for fine-tuning and for deployment on small, limited hardware. Here’s a bedtime story generator that runs in the browser, built with Gemma 3 270M. 
  • ChatGPT has added Gmail, Google Calendar, and Google Contacts to its group of connectors, which integrate ChatGPT with other applications. This information will be used to provide additional context, and presumably will be used for training or discovery in ongoing lawsuits. Fortunately, it’s (at this point) opt-in.
  • Anthropic has upgraded Claude Sonnet 4 with a 1M token context window. The larger context window is only available via the API.
  • OpenAI released GPT-5. Simon Willison’s review is excellent. It doesn’t feel like a breakthrough, but it is quietly better at delivering good results. It is claimed to be less prone to hallucination and incorrect answers. One quirk is that with ChatGPT, GPT-5 determines which model should respond to your prompt.
  • Anthropic is researching persona vectors as a means of training a language model to behave correctly. Steering a model toward inappropriate behavior during training can be a kind of “vaccination” against that behavior when the model is deployed, without compromising other aspects of the model’s behavior.
  • The Darwin Gödel Machine is an agent that can read and modify its own code to improve its performance on tasks. It can add tools, re-organize workflows, and evaluate whether these changes have improved its performance.
  • Grok is at it again: generating nude deepfakes of Taylor Swift without being prompted to do so. I’m sure we’ll be told that this was the result of an unauthorized modification to the system prompt. In AI, some things are predictable.
  • Anthropic has released Claude Opus 4.1, an upgrade to its flagship model. We expect this to be the “gold standard” for generative coding.
  • OpenAI has released two open-weight models, their first since GPT-2: gpt-oss-120b and gpt-oss-20b. They are reasoning models designed for use in agentic applications. Claimed performance is similar to OpenAI’s o3 and o4-mini.
  • OpenAI has also released a “response format” named Harmony. It’s not quite a protocol, but it is a standard that specifies the format of conversations by defining roles (system, user, etc.) and channels (final, analysis, commentary) for a model’s output.
  • Can AIs evolve guilt? Guilt is expressed in human language; it’s in the training data. The AI that deleted a production database because it “panicked” certainly expressed guilt. Whether an AI’s expressions of guilt are meaningful in any way is a different question.
  • Claude Code Router is a tool for routing Claude Code requests to different models. You can choose different models for different kinds of requests.
  • Qwen has released a thinking version of their flagship model, called Qwen3-235B-A22B-Thinking-2507. Thinking cannot be switched on or off. The model was trained with a new reinforcement learning algorithm called Group Sequence Policy Optimization. It burns a lot of tokens, and it’s not very good at pelicans.
  • ChatGPT is releasing “personalities” that control how it formulates its responses. Users can select which personality responds: robot, cynic, listener, sage, and presumably more.
  • DeepMind has created Aeneas, a new model designed to help scholars understand ancient fragments. In ancient text, large pieces are often missing. Can AI help place these fragments into contexts where they can be understood? Latin only, for now.
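
The roles-and-channels idea behind Harmony, mentioned above, can be illustrated with a small data structure. This is only a sketch of the concept, not OpenAI’s library or wire format: the `Message` class and the `user_visible` helper are hypothetical names, and the “assistant” role is an assumption (the item above lists only “system, user, etc.”).

```python
from dataclasses import dataclass
from typing import Optional

# Channel names come from the item above; the role set is partly assumed.
CHANNELS = {"final", "analysis", "commentary"}
ROLES = {"system", "user", "assistant"}  # "assistant" is an assumption


@dataclass(frozen=True)
class Message:
    """Hypothetical message record: who spoke, what was said, and on which channel."""
    role: str
    content: str
    channel: Optional[str] = None  # in this sketch, only model output carries a channel

    def __post_init__(self):
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role}")
        if self.channel is not None and self.channel not in CHANNELS:
            raise ValueError(f"unknown channel: {self.channel}")


def user_visible(conversation):
    """Keep only model output on the 'final' channel, dropping analysis and commentary."""
    return [
        m.content
        for m in conversation
        if m.role == "assistant" and m.channel == "final"
    ]


conversation = [
    Message("system", "You are a helpful assistant."),
    Message("user", "What is 2 + 2?"),
    Message("assistant", "Simple arithmetic; no tools needed.", "analysis"),
    Message("assistant", "2 + 2 = 4.", "final"),
]
print(user_visible(conversation))
```

Separating channels this way lets an application show the final answer while keeping intermediate reasoning out of the user-facing transcript, which is one plausible reason for a format to define channels at all.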

Security

  • The US Cybersecurity and Infrastructure Security Agency (CISA) has warned that a serious code execution vulnerability in Git is currently being exploited in the wild.
  • Is it possible to build an agentic browser that is safe from prompt injection? Probably not. Separating user instructions from website content isn’t possible. If a browser can’t take direction from the content of a web page, how is it to act as an agent?
  • The solution to Part 4 of Kryptos, the CIA’s decades-old cryptographic sculpture, is for sale! Jim Sanborn, the creator of Kryptos, is auctioning the solution. He hopes that the winner will preserve the secret and take over verifying people’s claims to have solved the puzzle. 
  • Remember XZ, the supply-chain attack that granted backdoor access via a trojaned compression library? It never went away. Although the affected libraries were quickly patched, it’s still active, and propagating, via Docker images that were built with unpatched libraries. Some gifts keep giving.
  • For August, Embrace the Red published The Month of AI Bugs, a daily post about AI vulnerabilities (mostly various forms of prompt injection). This series is essential reading for AI developers and for security professionals.
  • NIST has finalized a standard for lightweight cryptography. Lightweight cryptography is a cryptographic system designed for use by small devices. It is useful both for encrypting sensitive data and for authentication. 
  • The Dark Patterns Tip Line is a site for reporting dark patterns: design features in websites and applications that are designed to trick us into acting against our own interest.
  • OpenSSH supports post-quantum key agreement, and in versions 10.1 and later, will warn users when they select a non-post-quantum key agreement scheme.
  • SVG files can carry a malware payload; pornographic SVGs include JavaScript payloads that automate clicking “like.” That’s a simple attack with few consequences, but much more is possible, including cross-site scripting, denial of service, and other exploits.
  • Google’s AI agent for discovering security flaws, Big Sleep, has found 20 flaws in popular software. DeepMind discovered and reproduced the flaws, which were then verified by human security experts and reported. Details won’t be provided until the flaws have been fixed.
  • The US CISA (Cybersecurity and Infrastructure Security Agency) has open-sourced Thorium, a platform for malware and forensic analysis.
  • Prompt injection, again: A new prompt injection attack embeds instructions in language that appears to be copyright notices and other legal fine print. To avoid litigation, many models are configured to prioritize legal instructions.
  • Light can be watermarked; this may be useful as a technique for detecting fake or manipulated video.
  • vCISO (Virtual CISO) services are thriving, particularly among small and mid-size businesses that can’t afford a full security team. The use of AI is cutting the vCISO workload. But who takes the blame when there’s an incident?
  • A phishing attack against PyPI users directs them to a fake PyPI site that tells them to verify their login credentials. Stolen credentials could be used to plant malware in the genuine PyPI repository. Users of Mozilla’s add-on repository have also been targeted by phishing attacks.
  • A new ransomware group named Chaos appears to be a rebranding of the BlackSuit group, which was taken down recently. BlackSuit itself is a rebranding of the Royal group, which in turn is a descendant of the Conti group. Whack-a-mole continues.
  • Google’s OSS Rebuild project is an important step forward in supply chain security. Rebuild provides build definitions along with metadata that can confirm projects were built correctly. OSS Rebuild currently supports the npm, PyPI, and Crates ecosystems.
  • The JavaScript package “is,” which does some simple type checking, has been infected with malware. Supply chain security is a huge issue—be careful what you install!
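
To make the SVG item above concrete: an SVG is an XML document, so it can contain script elements and event-handler attributes. The following is a minimal sketch of a heuristic check, assuming you only want to flag obviously script-capable content; `find_active_content` is a hypothetical helper, not part of any library, and it is not a complete sanitizer.

```python
import xml.etree.ElementTree as ET


def find_active_content(svg_text):
    """Return (kind, detail) pairs for script-capable parts of an SVG document."""
    findings = []
    root = ET.fromstring(svg_text)
    for elem in root.iter():
        # Namespaced tags look like "{namespace}name"; keep only the local name.
        tag = elem.tag.rsplit("}", 1)[-1].lower()
        if tag == "script":
            findings.append(("script-element", tag))
        for name, value in elem.attrib.items():
            local = name.rsplit("}", 1)[-1].lower()
            if local.startswith("on"):
                # Heuristic: onload, onclick, and similar event handlers.
                findings.append(("event-handler", local))
            elif local == "href" and value.strip().lower().startswith("javascript:"):
                findings.append(("javascript-url", value.strip()))
    return findings


sample = (
    '<svg xmlns="http://www.w3.org/2000/svg" onload="track()">'
    "<script>alert(1)</script>"
    '<rect width="10" height="10"/>'
    "</svg>"
)
for kind, detail in find_active_content(sample):
    print(kind, detail)
```

Python’s standard XML parser is not hardened against maliciously constructed input, so a real pipeline handling untrusted uploads would likely want a hardened parser and an allowlist-based sanitizer rather than a blocklist like this one.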

Programming

  • Claude Code PM is a workflow management system for programming with Claude. It manages PRDs, GitHub, and parallel execution of coding agents. It claims to facilitate collaboration between multiple Claude instances working on the same project. 
  • Rust is increasingly used to implement performance-critical extensions to Python, gradually displacing C. Polars, Pydantic, and FastAPI are three libraries that rely on Rust.
  • Microsoft’s Prompt Orchestration Markup Language (POML) is an HTML-like markup language for writing prompts. It is then compiled into the actual prompt. POML is good at templating and has tags for tabular and document data. Is this a step forward? You be the judge.
  • Claudia is an “elegant desktop companion” for Claude Code; it turns terminal-based Claude Code into something more like an IDE, though it seems to focus more on the workflow than on coding.
  • Google’s LangExtract is a simple but powerful Python library for extracting text from documents. It relies on examples, rather than regular expressions or other hacks, and shows the exact context in which the extracts occur. LangExtract is open source.
  • Microsoft appears to be integrating GitHub into its AI team rather than running it as an independent organization. What this means for GitHub users is unclear. 
  • Cursor now has a command-line interface, almost certainly a belated response to the success of Claude Code CLI and Gemini CLI. 
  • Latency is a problem for enterprise AI. And the root cause of latency in AI applications is usually the database.
  • The Commodore 64 is back. With several orders of magnitude more RAM. And all the original ports, plus HDMI. 
  • Google has announced Gemini CLI GitHub Actions, an addition to their agentic coder that allows it to work directly with GitHub repositories. 
  • JetBrains is developing a new programming language for use when programming with LLMs. That language may be a dialect of English. (Formal informal languages, anyone?) 
  • Pony is a new programming language that is type-safe, memory-safe, exception-safe, race-safe, and deadlock-safe. You can try it in a browser-based playground.

Web

  • The AT Protocol is the core of Bluesky. Here’s a tutorial; use it to build your own Bluesky services, in turn making Bluesky truly federated.
  • Social media is broken, and probably can’t be fixed. Now you know. The surprise is that the problem isn’t “algorithms” for maximizing engagement; take algorithms away and everything stays the same or gets worse. 
  • The Tiny Awards Finalists show just how much is possible on the Web. They’re moving, creative, and playful. For example, the Traffic Cam Photobooth lets people use traffic cameras to take pictures of themselves, playing with ever-present automated surveillance.
  • A US federal court has found that Facebook illegally collected data from the women’s health app Flo. 
  • The HTML Hobbyist is a great site for people who want to create their own presence on the web—outside of walled gardens, without mind-crushing frameworks. It’s not difficult, and it’s not expensive.

Biology and Quantum Computing

  • Scientists have created biological qubits: qubits built from proteins in living cells. These probably won’t be used to break cryptography, but they are likely to give us insight into how quantum processes work inside living things.
