Radar Trends to Watch: January 2026

6 January 2026 at 06:48

Happy New Year! December was a short month, in part because of O’Reilly’s holiday break. But everybody relaxes at the end of the year—either that, or they’re on a sprint to finish something important. We’ll see. OpenAI’s end-of-year sprint was obvious: getting GPT-5.2 out. And getting Disney to invest in them, and license their characters for AI-generated images. They’ve said that they will have guardrails to prevent Mickey from doing anything inappropriate. We’ll see how long that works.

To get a good start on 2026, read Andrej Karpathy’s “2025 LLM Year in Review” for an insightful summary of where we’ve been. Then follow up with the Resonant Computing Manifesto, which represents a lot of serious thought about what software should be.

AI

  • Google has released yet another new model: FunctionGemma, a version of Gemma 3 270M that has been specifically adapted for function calling. It is designed to be fine-tuned easily and is targeted at small, “edge” devices. FunctionGemma is available on Hugging Face. (A function calling sketch follows this list.)
  • Anthropic is opening its Agent Skills (aka Claude Skills) spec and has added an administrative interface to give IT admins control over which tools are used and how. OpenAI has already quietly adopted them. Are skills about to become a de facto standard?
  • Google has released Gemini 3 Flash, the final model in its Gemini 3 series. Flash combines reasoning ability with speed and economy.
  • Margaret Mitchell writes about the difference between generative and predictive AI. Predictive AI is more likely to solve problems people are facing—and do so without straining resources.
  • NVIDIA has released Nano, the first of its Nemotron 3 models. The larger models in the family, Super and Ultra, are yet to come. Nano is a 30B parameter mixture-of-experts model. All of the Nemotron 3 models are fully open source (most training data, training recipes, pre- and posttraining software, in addition to weights).
  • GPT-5.2 was released. GPT-5.2 targets “professional knowledge workers”: It was designed for tasks like working on spreadsheets, writing documents, and the like. There are three versions: Thinking (a long-horizon reasoning model), Pro, and Instant (tuned for fast results).
  • Disney is investing in OpenAI. One consequence of this deal is that Disney is licensing its characters to OpenAI so that they can be used in the Sora video generator.
  • We understand what cloud native means. What does AI native mean, and what does MCP (and agents) have to do with it?
  • Researchers at the University of Waterloo have discovered a method for pretraining LLMs that is both more accurate than current techniques and 50% more efficient.
  • Mistral has released Devstral 2, their LLM for coding, along with Vibe, a command line interface for Devstral. Devstral comes in two sizes (123B and 24B) and is arguably open source.
  • Anthropic has donated MCP to the Agentic AI Foundation (AAIF), a new open source foundation spun out by the Linux Foundation. OpenAI has contributed AGENTS.md to the AAIF; Block has contributed its agentic platform goose.
  • Google Research has proposed a new Titans architecture for language models, along with the MIRAS framework. Together, they’re intended to allow models to work more efficiently with memory. Is this the next step beyond transformers?
  • Zebra-Llama is a new family of small hybrid models that achieve high efficiency by combining existing pretrained models. Zebra-Llama combines state space models (SSMs) with multihead latent attention (MLA) to achieve near-transformer accuracy with only 7B to 11B pretraining tokens and an 8B parameter teacher.
  • Now there’s Hugging Face Skills! They’ve used it to give Claude the ability to fine-tune an open source LLM. Hugging Face skills interoperate with Codex, Claude Code, Gemini CLI, and Cursor.
  • A research project at OpenAI has developed a model that will tell you when it has failed to follow instructions. This is called (perhaps inappropriately) “confession.” It may be a way for a model to tell when it has made up an answer.
  • Mistral 3 is here. Mistral 3 includes a Large model, plus three smaller models: Ministral 14B, 8B, and 3B. They all have open weights. Performance is comparable to similarly sized models. All of the models are vision-capable.
  • Wikipedia has developed an excellent guide to detecting AI-generated writing.
  • Claude 4.5 has a soul. . .or at least a “soul document” that was used in training to define its personality. Is this similar to the script in a Golem’s mouth?
  • DeepSeek has released V3.2, which incorporates the company’s sparse attention mechanism (DSA), scalable reinforcement learning, and a task synthesis pipeline. Like its predecessors, it’s an open weights model. There’s also a “Speciale” version, only available via API, that’s been tuned for extended reasoning sessions.
  • Black Forest Labs has released FLUX.2, a vision model that’s almost as good as Google’s Nano Banana but is open weight.
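
A minimal sketch of what function calling with a small model like FunctionGemma might look like through Hugging Face transformers. The model ID is a guess at the repository name, and whether FunctionGemma’s chat template accepts tool schemas this way is an assumption; check the model card before running.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "google/functiongemma-270m"  # guessed id; verify on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

# A tool schema in the JSON-schema style that transformers chat templates accept.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Boston?"}]
inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(inputs, max_new_tokens=64)
# The model should emit a structured call such as get_weather(city="Boston").
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```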

Programming

  • It’s Christmas, so of course Matz maintained the tradition of releasing another major version of Ruby—this year, 4.0.
  • The Tor Project is switching to Rust. The rewrite is named Arti and is ready for use.
  • A cognitive architect is a software developer who doesn’t write functions but decomposes larger problems into pieces. Isn’t this one of the things regular architects—and programmers—already do? We’ve been hearing this message from many different sources: It’s all about higher-order thinking.
  • Perhaps this isn’t news, but Rust in the Linux kernel is no longer considered experimental. It’s here to stay. Not all news is surprising.
  • Is PARK the LAMP stack for AI? PARK is PyTorch, AI, Ray, and Kubernetes. Those tools are shaping up to be the foundation of open source AI development. (Ray is a framework for distributing machine learning workloads.)
  • Bryan Cantrill, one of the founders of Oxide Computer, has published a document about how AI is used at Oxide. It’s well worth reading.
  • Go, Rust, and Zig are three relatively new general-purpose languages. Here’s an excellent comparison of the three.
  • Stack Overflow has released a new conversational search tool, AI Assist. It searches Stack Overflow and Stack Exchange and provides chat-like answers.
  • DocumentDB is an open source (MIT license) document store that combines the capabilities of MongoDB and PostgreSQL. It should be particularly useful for building AI applications, supporting session history, conversational history, and semantic caching. (A sketch follows this list.)
  • “User experience is your moat… Your moat isn’t your model; it’s whether your users feel at home.” From Christina Wodtke’s Eleganthack. Well worth reading.
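
Since DocumentDB offers a MongoDB-style document API, the session history use case above should look like ordinary MongoDB client code. A minimal sketch with pymongo, assuming a Mongo-compatible endpoint on localhost; the connection string and field names are illustrative.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

# Assumed: a Mongo-compatible DocumentDB endpoint running locally.
client = MongoClient("mongodb://localhost:27017")
history = client["app"]["conversations"]

history.insert_one({
    "session": "abc123",
    "role": "user",
    "content": "Summarize last month's Radar Trends.",
    "ts": datetime.now(timezone.utc),
})

# Replay a session in order: the basis for conversational memory.
for turn in history.find({"session": "abc123"}).sort("ts"):
    print(turn["role"], ":", turn["content"])
```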

Security

  • SantaStealer is a new malware-as-a-service operation. It appears to be a rebranding of an older malware service, BluelineStealer, and targets data held in the browser, services like Telegram, Discord, and Steam, and cryptocurrency wallets. Just in time for the holidays.
  • Another list from MITRE: The top 25 software weaknesses for 2025. These are the most dangerous items added to the CVE database in 2025, based on severity and frequency. The top items on the list are familiar: cross-site scripting, SQL injection, and cross-site request forgery. Are you vulnerable? (SQL injection in miniature appears after this list.)
  • What is the normalization of deviance in AI? It’s the false sense of safety that comes from ignoring issues like prompt injection because nothing bad has happened yet while simultaneously building agents that perform actions with real-world consequences.
  • Trains were canceled after an AI-edited image of a bridge collapse was posted on social media.
  • Virtual kidnapping is a thing. Nobody is kidnapped, but doctored images from social media are used to “prove” that a person is in captivity.
  • There’s an easy way to jailbreak LLMs. Write poetry. Writing a prompt as poetry seems to evade the defenses of most language models.
  • GreyNoise IP Check is a free tool that checks whether your IP address has appeared in a botnet.
  • Attackers are using LLMs to generate new malware. There are several LLMs offering vibe coding services for assisted malware generation. Researchers from Palo Alto Networks report on their capabilities.
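
The perennial list-toppers are easy to reproduce. Here’s SQL injection in miniature, a generic illustration rather than anything from MITRE’s writeup: the first query splices user input into the SQL string; the second binds it as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "' OR '1'='1"  # a classic injection payload

# Vulnerable: string interpolation lets the payload rewrite the query.
rows = conn.execute(
    f"SELECT role FROM users WHERE name = '{user_input}'"
).fetchall()
print(rows)  # [('admin',)] -- the WHERE clause was bypassed

# Safe: the driver passes the payload as an opaque string value.
rows = conn.execute(
    "SELECT role FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # [] -- no user is literally named that
```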

Web

  • Google is adding a “user alignment critic” to Chrome. The critic monitors all actions taken by Gemini to ensure that they’re not triggered by indirect prompt injection. The alignment critic also limits Gemini to sites that are relevant to solving the user’s request.
  • Google is doing smart glasses again. The company’s showing off prototypes of Android XR. It’s Gemini-backed, of course; there will be monocular and binocular versions; and it can work with prescription lenses.
  • Another browser? Lightpanda is a web browser designed for machines—crawlers, agents, and other automated browsing applications—that’s built for speed and efficiency.
  • Yet another browser? Nook is open source, privacy protecting, and fast. And it’s for humans.
  • A VT100 terminal emulator in the browser? That’s what you wanted, right? ghostty-web has xterm.js API compatibility, and is built (of course) with Wasm.
  • The Brave browser is testing an AI-assisted mode using its privacy-preserving AI assistant, Leo. Leo can be used to perform agentic multistep tasks. It’s disabled by default.

Hardware

  • Arduino enthusiasts should familiarize themselves with the differences between the licenses for Arduino’s and Adafruit’s products. Adafruit’s licensing is clearly open source; now that Arduino is owned by Qualcomm, its licensing is confusing to say the least.
  • We’ve done a lot to democratize programming in the last decade. Is it now time to democratize designing microchips? Siddharth Garg and others at NYU think so.

Radar Trends to Watch: December 2025

2 December 2025 at 07:15

November ended with Thanksgiving (in the US), turkey, and a train of model announcements. The announcements were exciting: Google’s Gemini 3 puts it in the lead among large language models, at least for the time being. Nano Banana Pro is a spectacularly good text-to-image model. OpenAI has released its heavy hitters, GPT-5.1-Codex-Max and GPT-5.1 Pro. And the Allen Institute released its latest open source model, Olmo 3, the leading open source model from the US.

Since Trends avoids deal-making (should we?), we’ve also avoided the angst around an AI bubble and its implosion. Right now, it’s safe to say that the bubble is formed of money that hasn’t yet been invested, let alone spent. If it is a bubble, it’s in the future. Do promises and wishes make a bubble? Does a bubble made of promises and wishes pop with a bang or a pffft?

AI

  • Now that Google and OpenAI have laid down their cards, Anthropic has released its latest heavyweight model: Opus 4.5. They’ve also dropped the price significantly.
  • The Allen Institute has launched its latest open source model, Olmo 3. The institute’s opened up the whole development process to allow other teams to understand its work.
  • Not to be outdone, Google has introduced Nano Banana Pro (aka Gemini 3 Pro Image), its state-of-the-art image generation model. Nano Banana’s biggest feature is the ability to edit images to change the appearance of items without redrawing them from scratch. And according to Simon Willison, it watermarks the parts of an image it generates with SynthID.
  • OpenAI has released two more components of GPT-5.1, GPT-5.1-Codex-Max (API) and GPT-5.1 Pro (ChatGPT). This release brings the company’s most powerful models for generative work into view.
  • A group of quantum physicists claim to have reduced the size of the DeepSeek model by half, and to have removed Chinese censorship. The model can now tell you what happened in Tiananmen Square, explain what Pooh looked like, and answer other forbidden questions.
  • The release train for Gemini 3 has begun, and the commentariat quickly crowned it king of the LLMs. It includes the ability to spin up a web interface so users can give it more information about their questions, and to generate diagrams along with text output.
  • As part of the Gemini 3 release, Google has also announced a new agentic IDE called Antigravity.
  • Google has released a new weather forecasting model, WeatherNext 2, that can forecast at temporal resolutions as fine as 1 hour. The data is available through Earth Engine and BigQuery, for those who would like to do their own forecasting. There’s also an early access program on Vertex AI.
  • Grok 4.1 has been released, with reports that it is currently the best model at generative prose, including creative writing. Be that as it may, we don’t see why anyone would use an AI that has been trained to reflect Elon Musk’s thoughts and values. If AI has taught us one thing, it’s that we need to think for ourselves.
  • AI demands the creation of new data centers and new energy sources. States want to ensure that those power plants are built, and built in ways that don’t pass costs on to consumers.
  • Grokipedia uses questionable sources. Is anyone surprised? How else would you train an AI on the latest conspiracy theories?
  • AMD GPUs are competitive, but they’re hampered because there are few libraries for low-level operations. To solve this problem, Chris Ré and others have announced HipKittens, a library of low-level programming primitives for AMD GPUs.
  • OpenAI has released GPT-5.1. The two new models are Instant, which is tuned to be more conversational and “human,” and Thinking, a reasoning model that now adapts the time it takes to “think” to the difficulty of the questions.
  • Large language models, including GPT-5 and the Chinese models, show bias against users who use a German dialect rather than standard German. The bias appeared to be greater as the model size increased. These results presumably apply to other languages, like English, as well.
  • Ethan Mollick on evaluating (ultimately, interviewing) your AI models is a must-read.
  • Yann LeCun is leaving Facebook to launch a new startup that will develop his ideas about building AI.
  • Harbor is a new tool that simplifies benchmarking frameworks and models. It’s from the developers of the Terminal-Bench benchmark. And it brings us a step closer to a world where people build their own specialized AI rather than rely on large providers.
  • Music rights holders are beginning to make deals with Udio (and presumably other companies) that train their models on existing music. Unfortunately, this doesn’t solve the bigger problem: Music is a “collectively produced shared cultural good, sustained by human labor. Copyright isn’t suited to protecting this kind of shared value,” as professors Oliver Bown and Kathy Bowrey have argued.
  • Moonshot AI has finally released Kimi K2 Thinking, the first open weights model to have benchmark results competitive with—or exceeding—the best closed weights models. It’s designed to be used as an agent, calling external tools as needed to solve problems.
  • Tongyi DeepResearch is a new fully open source agent for doing research. Its results are comparable to OpenAI deep research, Claude Sonnet 4, and similar models. Tongyi is part of Alibaba; it’s yet another important model to come out of China.
  • Data centers in space? It’s an interesting and challenging idea. Cooling is a much bigger problem than you’d expect. They would require massive arrays of solar cells for power. But some people think it might happen.
  • MiniMax M2 is a new open weights model that focuses on building agents. It has performance similar to Claude Sonnet but at a much lower price point. It also embeds its thought processes between <think> and </think> tags, which is an important step toward interpretability. (A parsing sketch follows this list.)
  • DeepSeek has introduced a new model for OCR with some very interesting properties: It has a new process for storing and retrieving memories that also makes the model significantly more efficient.
  • Agent Lightning provides a code-free way to train agents using reinforcement learning.
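
Separating a reasoning trace from the final answer is mechanical once a model commits to a tag convention. A minimal sketch, assuming only the <think>...</think> convention described above:

```python
import re

def split_thinking(output: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a completion with tagged thoughts."""
    thoughts = re.findall(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    answer = re.sub(r"<think>.*?</think>", "", output, flags=re.DOTALL).strip()
    return "\n".join(t.strip() for t in thoughts), answer

raw = "<think>The user wants a sum; 2 + 2 = 4.</think>The answer is 4."
reasoning, answer = split_thinking(raw)
print(reasoning)  # The user wants a sum; 2 + 2 = 4.
print(answer)     # The answer is 4.
```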

Programming

  • The Zig programming language has published a book. Online, of course.
  • Google is weakening its controversial new rules about developer verification. The company plans to create a separate class for applications with limited distribution, and develop a flow that will allow the installation of unverified apps.
  • Google’s LiteRT is a library for running AI models in browsers and small devices. LiteRT supports Android, iOS, embedded Linux, and microcontrollers. Supported languages include Java, Kotlin, Swift, Embedded C, and C++.
  • Does AI-assisted coding mean the end of new languages? Simon Willison thinks that LLMs can encourage the development of new programming languages. Design your language and ship it with a Claude Skills-style document; that should be enough for an LLM to learn how to use it.
  • Deepnote, a successor to the Jupyter Notebook, is a next-generation notebook for data analytics that’s built for teams. There’s now a shared workspace; different blocks can use different languages; and AI integration is on the road map. It’s now open source.
  • The idea of assigning colors (red, blue) to tools may be helpful in limiting the risk of prompt injection when building agents. Which tools can return something damaging? This sounds like a step toward applying the principle of least privilege to AI design. (A toy sketch follows.)
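
A toy sketch of the coloring idea, with assumptions of mine layered on the post’s premise: blue tools are read-only, red tools have side effects, and a plan that feeds untrusted input to a red tool is refused.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    color: str  # "red" = can act on the outside world, "blue" = read-only
    fn: Callable[[str], str]

def run_plan(tools: list[Tool], tainted: bool) -> None:
    """Run tools in order, refusing red tools once input is untrusted."""
    for tool in tools:
        if tool.color == "red" and tainted:
            raise PermissionError(
                f"{tool.name} is red: refusing to run it on untrusted input"
            )
        print(tool.fn("example input"))

read_page = Tool("read_page", "blue", lambda s: f"page text for {s}")
send_email = Tool("send_email", "red", lambda s: f"sent: {s}")

run_plan([read_page], tainted=True)  # fine: blue tools only
try:
    run_plan([read_page, send_email], tainted=True)
except PermissionError as err:
    print(err)  # the red tool is blocked
```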

Security

  • We’re making the same mistake with AI security as we made with cloud security (and security in general): treating security as an afterthought.
  • Anthropic claims to have disrupted a Chinese cyberespionage group that was using Claude to generate attacks against other systems. Anthropic claims that the attack was 90% automated, though that claim is controversial.
  • Don’t become a victim. Data collected for online age verification makes your site a target for attackers. That data is valuable, and they know it.
  • A research collaboration uses data poisoning to disrupt deepfake images. Users run its tool, Silverer, over their images before posting; it makes invisible changes to the original that confuse image-generating AIs, leading to unusable distortions.
  • Is it a surprise that AI is being used to generate fake receipts and expense reports? After all, it’s used to fake just about everything else. It was inevitable that enterprise applications of AI fakery would appear.
  • HydraPWK2 is a Linux distribution designed for penetration testing. It’s based on Debian and is supposedly easier to use than Kali Linux.
  • How secure is your trusted execution environment (TEE)? All of the major hardware vendors are vulnerable to a number of physical attacks against “secure enclaves.” And their terms of service often exclude physical attacks.
  • Atroposia is a new malware-as-a-service package that includes a local vulnerability scanner. Once an attacker has broken into a site, they can find other ways to remain there.
  • A new kind of phishing attack (CoPhishing) uses Microsoft Copilot Studio agents to steal credentials by abusing the Sign In topic. Microsoft has promised an update that will defend against this attack.

Operations

  • Here’s how to install Open Notebook, an open source equivalent to NotebookLM, to run on your own hardware. It uses Docker and Ollama to run the notebook and the model locally, so data never leaves your system.
  • Open source isn’t “free as in beer.” Nor is it “free as in freedom.” It’s “free as in puppies.” For better or for worse, that just about says it.
  • Need a framework for building proxies? Cloudflare’s next generation Oxy framework might be what you need. (Whatever you think of their recent misadventure.)
  • MIT Media Lab’s Project NANDA intends to build infrastructure for a decentralized network of AI agents. They describe it as a global decentralized registry (not unlike DNS) that can be used to discover and authenticate agents using MCP and A2A. Isn’t this what we wanted from the internet in the first place?

Radar Trends to Watch: November 2025

4 November 2025 at 07:02

AI has so thoroughly colonized every technical discipline that it’s becoming hard to organize items of interest in Radar Trends. Should a story go under AI or programming (or operations or biology or whatever the case may be)? Maybe it’s time to go back to a large language model that doesn’t require any electricity and has over 217K parameters: Merriam-Webster. But no matter where these items ultimately appear, it’s good to see practical applications of AI in fields as diverse as bioengineering and UX design.

AI

  • Alibaba’s Ling-1T may be the best model you’ve never heard of. It’s a nonthinking mixture-of-experts model with 1T parameters, 50B active at any time. And it’s open weights (MIT license).
  • Marin is a new lab for creating fully open source models. They say that the development of models will be completely transparent from the beginning. Everything is tracked by GitHub; all experiments may be observed by anyone; there’s no cherrypicking of results.
  • WebMCP is a proposal and an implementation for a protocol that allows websites to become MCP servers. As servers, they can interact directly with agents and LLMs.
  • Claude has announced Agent Skills. Skills are essentially just a Markdown file describing how to perform a task, possibly accompanied by scripts and resources. They’re easy to add and only used as needed. A Skill-creator Skill makes it very easy to build Skills. Simon Willison thinks that Skills may be a “bigger deal than MCP.” (A loading sketch follows this list.)
  • Pete Warden describes his work on the smallest of AI. Small AI serves an important set of applications without compromising privacy or requiring enormous resources.
  • Anthropic has released Claude Haiku 4.5, skipping 4.0 and 4.1 in the process. Haiku is their smallest and fastest model. The new release claims performance similar to Sonnet 4, but it’s much faster and less expensive.
  • NVIDIA is now offering the DGX Spark, a desktop AI supercomputer. It offers 1 petaflop performance on models with up to 200B parameters. Simon Willison has a review of a preview unit.
  • Andrej Karpathy has released nanochat, a small ChatGPT-like model that’s completely open and can be trained for roughly $100. It’s intended for experimenters, and Karpathy has detailed instructions on building and training.
  • There’s an agent-shell for Emacs? There had to be one. Emacs abhors a vacuum.
  • Anthropic launched “plugins,” which give developers the ability to write extensions to Claude Code. Of course, these extensions can be agents. Simon Willison points to Jesse Vincent’s Superpowers as a glimpse of what plugins can accomplish.
  • Google has released the Gemini 2.5 Computer Use model into public preview. While the thrill of teaching computers to click through browsers and other web applications faded quickly, Gemini 2.5 Computer Use appears to be generating excitement.
  • Thinking Machines Labs has announced Tinker, an API for training open weight language models. Tinker runs on Thinking Machines’ infrastructure. It’s currently in beta.
  • Merriam-Webster will release its newest large language model on November 18. It has no data centers and requires no electricity.
  • We know that data products, including AI, reflect the historical biases in their training data. In India, OpenAI reflects caste biases. But it’s not just OpenAI; these biases appear in all models. Although caste bias was outlawed in the middle of the 20th century, these biases live on in the data.
  • DeepSeek has released an experimental version of its reasoning model, DeepSeek-V3.2-Exp. This model uses a technique called sparse attention to reduce the processing requirements (and cost) of the reasoning process.
  • OpenAI has added an Instant Checkout feature that allows users to make purchases with Etsy and Shopify merchants, taking them directly to checkout after finding their products. It’s based on the Agentic Commerce Protocol.
  • OpenAI’s GDPval tests go beyond existing benchmarks by challenging LLMs with real-world tasks rather than simple problems. The tasks were selected from 44 industries and were chosen for economic value.
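
A minimal sketch of the Skills mechanic described above: each skill is a folder holding a SKILL.md, indexed by a one-line description so the full instructions enter the context only when needed. The file layout and loading logic here are illustrative, not Anthropic’s actual spec.

```python
from pathlib import Path

def load_skill_index(root: str) -> dict[str, Path]:
    """Map each skill's one-line description to its SKILL.md path."""
    index = {}
    for md in Path(root).glob("*/SKILL.md"):
        description = md.read_text().splitlines()[0].lstrip("# ").strip()
        index[description] = md
    return index

def expand_skill(index: dict[str, Path], description: str) -> str:
    """Pull a skill's full instructions into context only when needed."""
    return index[description].read_text()

# e.g. skills/pdf-forms/SKILL.md, skills/spreadsheets/SKILL.md (hypothetical)
index = load_skill_index("skills")
print(list(index))  # the short descriptions an agent can scan cheaply
```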

Programming

  • Steve Yegge’s Beads is a memory management system for coding agents. It’s badly needed, and worth checking out.
  • Do you use coding agents in parallel? Simon Willison was a skeptic, but he’s gradually becoming convinced it’s a good practice.
  • One problem with generative coding is that AI is trained on “the worst code in the world.” For web development, we’ll need better foundations to get to a post–frontend-framework world.
  • If you’ve wanted to program with Claude from your phone or some other device, now you can. Anthropic has added web and mobile interfaces to Claude Code, along with a sandbox for running generated code safely.
  • You may have read “Programming with Nothing,” a classic article that strips programming to the basics of lambda calculus. “Programming with Less Than Nothing” does FizzBuzz in many lines of combinatory logic. (A Church numeral sketch follows this list.)
  • What’s the difference between technical debt and architectural debt? Don’t confuse them; they’re significantly different problems, with different solutions.
  • For graph fans: The IRS has released its fact graph, which, among other things, models the US Internal Revenue Code. It can be used with JavaScript and any JVM language.
  • What is spec-driven development? It has become one of the key buzzwords in the discussion of AI-assisted software development. Birgitta Böckeler attempts to define SDD precisely, then looks at three tools for aiding SDD.
  • IEEE Spectrum released its 2025 programming languages rankings. Python is still king, with Java second; JavaScript has fallen from third to fifth. But more important, Spectrum wonders whether AI-assisted programming will make these rankings irrelevant.
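
For a taste of what “programming with nothing” means, here are Church numerals in Python: numbers encoded as pure functions, with addition falling out of function composition. (The linked articles go much further with far less.)

```python
# Church numerals: a number n is "apply f n times."
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):  # escape hatch so we can print a result
    return n(lambda k: k + 1)(0)

one = succ(zero)
two = succ(one)
print(to_int(add(two)(two)))  # 4
```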

Web

  • Cloudflare CEO Matthew Prince is pushing for regulation to prevent Google from tying web crawlers for search and for training content together. You can’t block the training crawler without also blocking the search crawler, and blocking the latter has significant consequences for businesses.
  • OpenAI has released Atlas, its Chromium-based web browser. As you’d expect, AI is integrated into everything. You can chat with the browser, interrogate your history, your settings, or your bookmarks, and (of course) chat with the pages you’re viewing.
  • Try again? Apple has announced a second-generation Vision Pro, with a similar design and at the same price point.
  • Have we passed peak social? Social media usage has been declining for all age groups. The youngest group, 16–24, is the largest but has also shown the sharpest decline. Are we going to reinvent the decentralized web? Or succumb to a different set of walled gardens?
  • Addy Osmani’s post “The History of Core Web Vitals” is a must-read for anyone working in web performance.
  • Features from the major web frameworks are being implemented by browsers. Frameworks won’t disappear, but their importance will diminish. People will again be programming to the browser. In turn, this will make browser testing and standardization that much more important.
  • Luke Wroblewski writes about using AI to solve common problems in user experience (UX). AI can help with problems like collecting data from users and onboarding users to new applications.

Operations

  • There’s a lot to be learned from AWS’s recent outage, which stemmed from a DynamoDB DNS failure in the US-EAST-1 region. It’s important not to write this off as a war story about Amazon’s failure. Instead, think: How do you make your own distributed networks more reliable?
  • PyTorch Monarch is a new library that helps developers manage distributed systems for training AI models. It lets developers write a script that “orchestrates all distributed resources,” allowing the developer to work with them as a single almost-local system.

Security

  • The solution to the fourth part of Kryptos, the cryptosculpture at the CIA’s headquarters, has been discovered! The discovery came through an opsec error that led researchers to the clear text stored at the Smithsonian. This is an important lesson: Attacks against cryptosystems rarely touch the cryptography. They attack the protocols, people, and systems surrounding codes.
  • Public cryptocurrency blockchains are being used by international threat actors as “bulletproof” hosts for storing and distributing malware.
  • Apple is now giving a $2M bounty for zero-day exploits that allow zero-click remote code execution on iOS. These vulnerabilities have been exploited by commercial malware vendors.
  • Signal has incorporated postquantum encryption into its Signal protocol. This is a major technological achievement. They’re one of the few organizations that’s ready for the quantum world.
  • Salesforce is refusing to pay extortion after a major data loss of over a billion records. Data from a number of major accounts was stolen by a group calling itself Scattered LAPSUS$ Hunters. Attackers simply asked the victim’s staff to install an attacker-controlled app.
  • Context is the key to AI security. We’re not surprised; right now, context is the key to just about everything in AI. Attackers have the advantage now, but in 3–5 years that advantage will pass to defenders who use AI effectively.
  • Google has announced that Gmail users can now send end-to-end encrypted (E2EE) messages regardless of whether the recipient is using Gmail. Recipients who don’t use Gmail will receive a notification and the ability to read the message on a one-time guest account.
  • The best way to attack your company isn’t through the applications; it’s through the service help desk. Social engineering remains extremely effective—more effective than attacks against software. Training helps; a well-designed workflow and playbook is crucial.
  • Ransomware detection has now been built into the desktop version of Google Drive. When it detects activities that indicate ransomware, Drive suspends file syncing and alerts users. It’s enabled by default, but it is possible to opt out.
  • OpenAI is routing requests with safety issues to an unknown model. This is presumably a specialized version of GPT-5 that has been trained specially to deal with sensitive issues.

Robotics

  • Would you buy a banana from a robot? A small chain of stores in Chicago is finding out.
  • Rodney Brooks, founder of iRobot, warns that humans should stay at least 10 feet (3 meters) away from humanoid walking robots. There is a lot of potential energy in their limbs when they move them to retain balance. Unsurprisingly, this danger stems from the vision-only approach that Tesla and other vendors have adopted. Humans learn and act with all five senses.

Radar Trends to Watch: October 2025

7 October 2025 at 07:17

This month we have two more protocols to learn. Google has announced the Agent Payments Protocol (AP2), which is intended to help agents engage in ecommerce—it’s largely concerned with authenticating and authorizing parties making a transaction. And the Agent Client Protocol (ACP) is concerned with communications between code editors and coding agents. When implemented, it would allow any code editor to plug in any compliant agent.

All hasn’t been quiet on the virtual reality front. Meta has announced its new VR/AR glasses, with the ability to display images on the lenses along with capabilities like live captioning for conversations. They’re much less obtrusive than the previous generation of VR goggles.

AI

  • Suno has announced an AI-driven digital audio workstation (DAW), a tool for enabling people to be creative with AI-generated music.
  • Ollama has added its own web search API, which can be used to augment the information available to models.
  • GitHub Copilot now offers a command-line tool, Copilot CLI. It can use either Claude Sonnet 4 or GPT-5 as the backing model, though other models should be available soon. Claude Sonnet 4 is the default.
  • Alibaba has released Qwen3-Max, a trillion-plus parameter model. There are reasoning and nonreasoning variants, though the reasoning variant hasn’t yet been released. Alibaba also released models for speech-to-text, vision-language, live translation, and more. They’ve been busy. 
  • GitHub has launched its MCP Registry to make it easier to discover MCP servers archived on GitHub. It’s also working with Anthropic and others to build an open source MCP registry, which lists servers regardless of their origin and integrates with GitHub’s registry. 
  • DeepMind has published version 3.0 of its Frontier Safety Framework, a framework for experimenting with AI-human alignment. They’re particularly interested in scenarios where the AI doesn’t follow a user’s directives, and in behaviors that can’t be traced to a specific reasoning chain.
  • Alibaba has released the Tongyi DeepResearch reasoning model. Tongyi is a 30.5B parameter mixture-of-experts model, with 3.3B parameters active. More importantly, it’s fully open source, with no restrictions on how it can be used. 
  • Locally AI is an iOS app that lets you run large language models on your iPhone or iPad. It works offline; there’s no need for a network connection. 
  • OpenAI has added control over the “reasoning” process to its GPT-5 models. Users can choose between four levels: Light (Pro users only), Standard, Extended, and Heavy (Pro only). 
  • Google has announced the Agent Payments Protocol (AP2), which facilitates purchases. It focuses on authorization (proving that it has the authority to make a purchase), authentication (proving that the merchant is legitimate), and accountability (in case of a fraudulent transaction).
  • Bring Your Own AI: Employee adoption of AI greatly exceeds official IT adoption. We’ve seen this before, on technologies as different as the iPhone and open source.
  • Alibaba has released the ponderously named Qwen3-Next-80B-A3B-Base. It’s a mixture-of-experts model with a high ratio of active parameters to total parameters (3.75%). Alibaba claims that the model cost 1/10 as much to train and is 10 times faster than its previous models. If this holds up, Alibaba is winning on performance where it counts.
  • Anthropic has announced a major upgrade to Claude’s capabilities. It can now execute Python scripts in a sandbox and can create Excel spreadsheets, PowerPoint presentations, PNG files, and other documents. You can upload files for it to analyze. And of course this comes with security risks.
  • The SIFT method—stop, investigate the source, find better sources, and trace quotes to their original context—is a way of structuring your use of AI output that will make you less vulnerable to misinformation. Hint: it’s not just for AI.
  • OpenAI’s Projects feature is now available to free accounts. Projects is a set of tools for organizing conversations with the LLM. Projects are separate workspaces with their own custom instructions, independent memory, and context. They can be forked. Projects sounds something like Git for LLMs—a set of features that’s badly needed.
  • EmbeddingGemma is a new open weights embedding model (308M parameters) that’s designed to run on devices, requiring as little as 200 MB of memory. (A usage sketch follows this list.)
  • An experiment with GPT-4o-mini shows that language models can fall prey to psychological manipulation. Is this surprising? After all, they are trained on human output.
  • “Platform Shifts Redefine Apps”: AI is a new kind of platform and demands rethinking what applications mean and how they should work. Failure to do this rethinking may be why so many AI efforts fail.
  • MCP-UI is a protocol that allows MCP servers to send React components or Web Components to agents, allowing the agent to build an appropriate browser-based interface on the fly.
  • The Agent Client Protocol (ACP) is a new protocol that standardizes communications between code editors and coding agents. It’s currently supported by the Zed and Neovim editors, and by the Gemini CLI coding agent.
  • Gemini 2.5 Flash is now using a new image generation model that was internally known as “nano banana.” This new model can edit uploaded images, merge images, and maintain visual consistency across a series of images.
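
A minimal usage sketch for an on-device embedding model via sentence-transformers. The model ID is my guess at the EmbeddingGemma repository name; verify it on Hugging Face before running.

```python
from sentence_transformers import SentenceTransformer

# "google/embeddinggemma-300m" is a guessed repository id; verify it first.
model = SentenceTransformer("google/embeddinggemma-300m")

docs = ["Radar Trends is a monthly roundup.", "FLUX.2 is an image model."]
embeddings = model.encode(docs, normalize_embeddings=True)

# With normalized vectors, cosine similarity is just a dot product.
print(embeddings.shape)
print(float(embeddings[0] @ embeddings[1]))
```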

Programming

  • Anthropic released Claude Code 2.0. New features include the ability to checkpoint your work, so that if a coding agent wanders off-course, you can return to a previous state. They have also added the ability to run tasks in the background, call hooks, and use subagents.
  • The Wasmer project has announced that it now has full Python support in the beta version of Wasmer Edge, its WebAssembly runtime for serverless edge deployment.
  • Mitchell Hashimoto, founder of HashiCorp, has promised that a library for Ghostty (libghostty) is coming! This library will make it easy to embed a terminal emulator into an application. Perhaps more important, libghostty might standardize the code for terminal output across applications.
  • There’s a new benchmark for agentic coding: CompileBench. CompileBench tests the ability of models to solve complex problems in figuring out how to build code.
  • Apple is reportedly rewriting iOS in a new programming language. Rust would be the obvious choice, but rumors are that it’s something of their own creation. Apple likes languages it can control. 
  • Java 25, the latest long-term support release, has a number of new features that reduce the boilerplate that makes Java difficult to learn. 
  • Luau is a new scripting language derived from Lua. It claims to be fast, small, and safe. It’s backward compatible with Version 5.1 of Lua.
  • OpenAI has launched GPT-5 Codex, its code generation model trained specifically for software engineering. Codex is now available both in the CLI tool and through the API. It’s clearly intended to challenge Anthropic’s dominant coding tool, Claude Code.
  • Do prompts belong in code repositories? We’ve argued that prompts should be archived. But they don’t belong in a source code repo like Git. There are better tools available.
  • This is cool and different. A developer has hacked the 2001 game Animal Crossing so that the dialog is generated by LLM rather than coming from the game’s memory.
  • There’s a new programming language, vibe-coded in its entirety with Claude. Cursed is similar to C, but all the keywords are Gen Z slang. It’s not yet on the list, but it’s a worthy addition to Esolang.
  • Claude Code is now integrated into the Zed editor (beta), using the Agent Client Protocol (ACP).
  • Ida Bechtle’s documentary on the history of Python, complete with many interviews with Guido van Rossum, is a must-watch.

Security

  • The first malicious MCP server has been found in the wild. Postmark-MCP, an MCP server for interacting with the Postmark application, suddenly (version 1.0.16) started sending copies of all the email it handles to its developer.
  • I doubt this is the first time, but supply chain security vulnerabilities have now hit Rust’s package management system, Crates.io. Two packages that steal keys for cryptocurrency wallets have been found. It’s time to be careful about what you download.
  • Cross-agent privilege escalation is a new kind of vulnerability in which a compromised intelligent agent uses indirect prompt injection to cause a victim agent to overwrite its configuration, granting it additional privileges. 
  • GitHub is taking a number of measures to improve software supply chain security, including requiring two-factor authentication (2FA), expanding trusted publishing, and more.
  • A compromised npm package uses a QR code to encode malware. The malware is apparently downloaded in the QR code (which is valid, but too dense to be read by a normal camera), unpacked by the software, and used to steal cookies from the victim’s browser. 
  • Node.js and its package manager npm have been in the news because of an ongoing series of supply chain attacks. Here’s the latest report.
  • A study by Cisco has discovered over a thousand unsecured LLM servers running on Ollama. Roughly 20% were actively serving requests. The rest may have been idle Ollama instances, waiting to be exploited. 
  • Anthropic has announced that Claude will train on data from personal accounts, effective September 28. This includes Free, Pro, and Max plans. Work plans are exempted. While the company says that training on personal data is opt-in, it’s (currently) enabled by default, so it’s opt-out.
  • We now have “vibe hacking,” the use of AI to develop malware. Anthropic has reported several instances in which Claude was used to create malware that the authors could not have created themselves. Anthropic is banning threat actors and implementing classifiers to detect illegal use.
  • Zero trust is basic to modern security. But groups implementing zero trust have to realize that it’s a project that’s never finished. Threats change, people change, systems change.
  • There’s a new technique for jailbreaking LLMs: write prompts with bad grammar and run-on sentences. These seem to prevent guardrails from taking effect. 
  • In an attempt to minimize the propagation of malware on the Android platform, Google plans to block “sideloading” apps for Android devices and require developer ID verification for apps installed through Google Play.
  • A new phishing attack called ZipLine targets companies using their own “contact us” pages. The attacker then engages in an extended dialog with the company, often posing as a potential business partner, before eventually delivering a malware payload.

Operations

  • The 2025 DORA report is out! DORA may be the most detailed summary of the state of the IT industry. DORA’s authors note that AI is everywhere and that the use of AI now improves end-to-end productivity, something that was ambiguous in last year’s report.
  • Microsoft has announced that Word will save files to the cloud (OneDrive) by default. This (so far) appears to apply only when using Windows. The feature is currently in beta.

Virtual and Augmented Reality

  • Meta has announced a pair of augmented reality glasses with a small display on one of the lenses, bringing it to the edge of AR. In addition to displaying apps from your phone, the glasses can do “live captioning” for conversations. The display is controlled by a wristband.
