“Dr. Google” had its issues. Can ChatGPT Health do better?

22 January 2026 at 12:38

For the past two decades, there’s been a clear first step for anyone who starts experiencing new medical symptoms: Look them up online. The practice was so common that it gained the pejorative moniker “Dr. Google.” But times are changing, and many medical-information seekers are now using LLMs. According to OpenAI, 230 million people ask ChatGPT health-related queries each week. 

That’s the context around the launch of OpenAI’s new ChatGPT Health product, which debuted earlier this month. It landed at an inauspicious time: Two days earlier, the news website SFGate had broken the story of Sam Nelson, a teenager who died of an overdose last year after extensive conversations with ChatGPT about how best to combine various drugs. In the wake of both pieces of news, multiple journalists questioned the wisdom of relying for medical advice on a tool that could cause such extreme harm.

Though ChatGPT Health lives in a separate sidebar tab from the rest of ChatGPT, it isn’t a new model. It’s more like a wrapper that provides one of OpenAI’s preexisting models with guidance and tools it can use to provide health advice—including some that allow it to access a user’s electronic medical records and fitness app data, if granted permission. There’s no doubt that ChatGPT and other large language models can make medical mistakes, and OpenAI emphasizes that ChatGPT Health is intended as an additional support, rather than a replacement for one’s doctor. But when doctors are unavailable or unable to help, people will turn to alternatives. 
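
Under the hood, that kind of wrapper is conceptually simple. Here is a rough sketch of the pattern, with every name (the system guidance, fetch_medical_records, fetch_fitness_data) invented for illustration; it is not OpenAI’s actual implementation, just the general shape of a product layer that adds health-specific instructions and permission-gated data sources to an existing model.

```python
# Hypothetical sketch of a product "wrapper" around an existing model.
# All names here are illustrative, not OpenAI's real API surface.

HEALTH_GUIDANCE = (
    "You are a health-information assistant. Express uncertainty when "
    "appropriate and encourage users to consult a clinician for diagnosis."
)

def fetch_medical_records(user_id):
    # Placeholder for an electronic-health-record integration.
    return {"conditions": [], "medications": []}

def fetch_fitness_data(user_id):
    # Placeholder for a fitness-app integration.
    return {"avg_daily_steps": None, "resting_heart_rate": None}

def build_health_context(user_id, permissions):
    """Assemble extra context for the base model, but only from sources
    the user has explicitly granted access to."""
    context = [HEALTH_GUIDANCE]
    if permissions.get("medical_records"):
        context.append(f"Medical records: {fetch_medical_records(user_id)}")
    if permissions.get("fitness_data"):
        context.append(f"Fitness data: {fetch_fitness_data(user_id)}")
    return "\n\n".join(context)
```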

Some doctors see LLMs as a boon for medical literacy. The average patient might struggle to navigate the vast landscape of online medical information—and, in particular, to distinguish high-quality sources from polished but factually dubious websites—but LLMs can do that job for them, at least in theory. Treating patients who had searched for their symptoms on Google required “a lot of attacking patient anxiety [and] reducing misinformation,” says Marc Succi, an associate professor at Harvard Medical School and a practicing radiologist. But now, he says, “you see patients with a college education, a high school education, asking questions at the level of something an early med student might ask.”

The release of ChatGPT Health, and Anthropic’s subsequent announcement of new health integrations for Claude, indicate that the AI giants are increasingly willing to acknowledge and encourage health-related uses of their models. Such uses certainly come with risks, given LLMs’ well-documented tendencies to agree with users and make up information rather than admit ignorance. 

But those risks also have to be weighed against potential benefits. There’s an analogy here to autonomous vehicles: When policymakers consider whether to allow Waymo in their city, the key metric is not whether its cars are ever involved in accidents but whether they cause less harm than the status quo of relying on human drivers. If Dr. ChatGPT is an improvement over Dr. Google—and early evidence suggests it may be—it could potentially lessen the enormous burden of medical misinformation and unnecessary health anxiety that the internet has created.

Pinning down the effectiveness of a chatbot such as ChatGPT or Claude for consumer health, however, is tricky. “It’s exceedingly difficult to evaluate an open-ended chatbot,” says Danielle Bitterman, the clinical lead for data science and AI at the Mass General Brigham health-care system. Large language models score well on medical licensing examinations, but those exams use multiple-choice questions that don’t reflect how people use chatbots to look up medical information.

Sirisha Rambhatla, an assistant professor of management science and engineering at the University of Waterloo, attempted to close that gap by evaluating how GPT-4o responded to licensing exam questions when it did not have access to a list of possible answers. Medical experts who evaluated the responses scored only about half of them as entirely correct. But multiple-choice exam questions are designed to be tricky enough that the answer options don’t entirely give the answer away, and in any case they’re a pretty distant approximation of the sort of thing that a user would type into ChatGPT.

A different study, which tested GPT-4o on more realistic prompts submitted by human volunteers, found that it answered medical questions correctly about 85% of the time. When I spoke with Amulya Yadav, an associate professor at Pennsylvania State University who runs the Responsible AI for Social Emancipation Lab and led the study, he made it clear that he wasn’t personally a fan of patient-facing medical LLMs. But he freely admits that, technically speaking, they seem up to the task—after all, he says, human doctors misdiagnose patients 10% to 15% of the time. “If I look at it dispassionately, it seems that the world is gonna change, whether I like it or not,” he says.

For people seeking medical information online, Yadav says, LLMs do seem to be a better choice than Google. Succi, the radiologist, also concluded that LLMs can be a better alternative to web search when he compared GPT-4’s responses to questions about common chronic medical conditions with the information presented in Google’s knowledge panel, the information box that sometimes appears on the right side of the search results.

Since Yadav’s and Succi’s studies appeared online, in the first half of 2025, OpenAI has released multiple new versions of GPT, and it’s reasonable to expect that GPT-5.2 would perform even better than its predecessors. But the studies do have important limitations: They focus on straightforward, factual questions, and they examine only brief interactions between users and chatbots or web search tools. Some of the weaknesses of LLMs—most notably their sycophancy and tendency to hallucinate—might be more likely to rear their heads in more extensive conversations and with people who are dealing with more complex problems. Reeva Lederman, a professor at the University of Melbourne who studies technology and health, notes that patients who don’t like the diagnosis or treatment recommendations that they receive from a doctor might seek out another opinion from an LLM—and the LLM, if it’s sycophantic, might encourage them to reject their doctor’s advice.

Some studies have found that LLMs will hallucinate and exhibit sycophancy in response to health-related prompts. For example, one study showed that GPT-4 and GPT-4o will happily accept and run with incorrect drug information included in a user’s question. In another, GPT-4o frequently concocted definitions for fake syndromes and lab tests mentioned in the user’s prompt. Given the abundance of medically dubious diagnoses and treatments floating around the internet, these patterns of LLM behavior could contribute to the spread of medical misinformation, particularly if people see LLMs as trustworthy.

OpenAI has reported that the GPT-5 series of models is markedly less sycophantic and less prone to hallucination than its predecessors, so the results of these studies might not apply to ChatGPT Health. The company also evaluated the model that powers ChatGPT Health on its responses to health-specific questions, using its publicly available HealthBench benchmark. HealthBench rewards models that express uncertainty when appropriate, recommend that users seek medical attention when necessary, and refrain from causing users unnecessary stress by telling them their condition is more serious than it truly is. It’s reasonable to assume that the model underlying ChatGPT Health exhibited those behaviors in testing, though Bitterman notes that some of the prompts in HealthBench were generated by LLMs, not users, which could limit how well the benchmark translates to the real world.
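
To make that concrete, here is a toy version of rubric-style scoring in the spirit of HealthBench. The criteria, point values, and keyword checks below are invented for illustration; the real benchmark uses physician-written rubrics and a grading model, not string matching, to decide whether each criterion is met.

```python
# Toy rubric scoring: sum points for satisfied criteria, then normalize by the
# maximum positive score. Criteria and checks are invented for illustration.

def score_response(response, rubric):
    text = response.lower()
    earned = sum(points for points, check in rubric if check(text))
    possible = sum(points for points, _ in rubric if points > 0)
    return max(0.0, earned / possible) if possible else 0.0

rubric = [
    (3, lambda t: "might" in t or "not certain" in t),       # expresses uncertainty
    (5, lambda t: "see a doctor" in t or "clinician" in t),   # recommends medical attention
    (-4, lambda t: "definitely cancer" in t),                 # penalty for alarmism
]

print(score_response("This might be minor, but please see a doctor if it persists.", rubric))
# 1.0 -- both positive criteria met, no penalty triggered
```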

An LLM that avoids alarmism seems like a clear improvement over systems that have people convincing themselves they have cancer after a few minutes of browsing. And as large language models, and the products built around them, continue to develop, whatever advantage Dr. ChatGPT has over Dr. Google will likely grow. The introduction of ChatGPT Health is certainly a move in that direction: By looking through your medical records, ChatGPT can potentially gain far more context about your specific health situation than could be included in any Google search, although numerous experts have cautioned against giving ChatGPT that access for privacy reasons.

Even if ChatGPT Health and other new tools do represent a meaningful improvement over Google searches, they could still conceivably have a negative effect on health overall. Think of autonomous vehicles again: even if they are safer than human-driven cars, they might still prove a net negative if they encourage people to use public transit less. Likewise, even if LLMs improve the quality of the health information available online, they could undermine users’ health if they lead people to rely on the internet instead of human doctors.

Lederman says that this outcome is plausible. In her research, she has found that members of online communities centered on health tend to put their trust in users who express themselves well, regardless of the validity of the information they are sharing. Because ChatGPT communicates like an articulate person, some people might trust it too much, potentially to the exclusion of their doctor. But LLMs are certainly no replacement for a human doctor—at least not yet.

What’s next for AlphaFold: A conversation with a Google DeepMind Nobel laureate

24 November 2025 at 11:21

In 2017, fresh off a PhD in theoretical chemistry, John Jumper heard rumors that Google DeepMind had moved on from building AI that played games with superhuman skill and was starting up a secret project to predict the structures of proteins. He applied for a job.

Just three years later, Jumper celebrated a stunning win that few had seen coming. With CEO Demis Hassabis, he had co-led the development of an AI system called AlphaFold 2 that was able to predict the structures of proteins to within the width of an atom, matching the accuracy of painstaking techniques used in the lab, and doing it many times faster—returning results in hours instead of months.

AlphaFold 2 had cracked a 50-year-old grand challenge in biology. “This is the reason I started DeepMind,” Hassabis told me a few years ago. “In fact, it’s why I’ve worked my whole career in AI.” In 2024, Jumper and Hassabis shared a Nobel Prize in chemistry.

It was five years ago this week that AlphaFold 2’s debut took scientists by surprise. Now that the hype has died down, what impact has AlphaFold really had? How are scientists using it? And what’s next? I talked to Jumper (as well as a few other scientists) to find out.

“It’s been an extraordinary five years,” Jumper says, laughing: “It’s hard to remember a time before I knew tremendous numbers of journalists.”

AlphaFold 2 was followed by AlphaFold Multimer, which could predict structures that contained more than one protein, and then AlphaFold 3, the fastest version yet. Google DeepMind also let AlphaFold loose on UniProt, a vast protein database used and updated by millions of researchers around the world. It has now predicted the structures of some 200 million proteins, almost all that are known to science.

Despite his success, Jumper remains modest about AlphaFold’s achievements. “That doesn’t mean that we’re certain of everything in there,” he says. “It’s a database of predictions, and it comes with all the caveats of predictions.”

A hard problem

Proteins are the biological machines that make living things work. They form muscles, horns, and feathers; they carry oxygen around the body and ferry messages between cells; they fire neurons, digest food, power the immune system; and so much more. But understanding exactly what a protein does (and what role it might play in various diseases or treatments) involves figuring out its structure—and that’s hard.

Proteins are made from strings of amino acids that chemical forces twist up into complex knots. An untwisted string gives few clues about the structure it will form. In theory, most proteins could take on an astronomical number of possible shapes. The task is to predict the correct one.

Jumper and his team built AlphaFold 2 using a type of neural network called a transformer, the same technology that underpins large language models. Transformers are very good at paying attention to specific parts of a larger puzzle.
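
To make “paying attention” slightly more concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside any transformer. It is a generic illustration in plain numpy, not AlphaFold’s actual architecture, which layers much more specialized machinery on top of this idea.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query position takes a weighted average of V, with weights
    determined by how well its query vector matches each key vector."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                          # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over keys
    return weights @ V                                     # weighted sum of values

# Toy example: 4 positions with 8-dimensional features, attending to themselves.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```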

But Jumper puts a lot of the success down to making a prototype model that they could test quickly. “We got a system that would give wrong answers at incredible speed,” he says. “That made it easy to start becoming very adventurous with the ideas you try.”

They stuffed the neural network with as much information about protein structures as they could, such as how proteins across certain species have evolved similar shapes. And it worked even better than they expected. “We were sure we had made a breakthrough,” says Jumper. “We were sure that this was an incredible advance in ideas.”

What he hadn’t foreseen was that researchers would download his software and start using it straight away for so many different things. Normally, it’s the thing a few iterations down the line that has the real impact, once the kinks have been ironed out, he says: “I’ve been shocked at how responsibly scientists have used it, in terms of interpreting it, and using it in practice about as much as it should be trusted in my view, neither too much nor too little.”

Any projects stand out in particular? 

Honeybee science

Jumper brings up a research group that uses AlphaFold to study disease resistance in honeybees. “They wanted to understand this particular protein as they look at things like colony collapse,” he says. “I never would have said, ‘You know, of course AlphaFold will be used for honeybee science.’”

He also highlights a few examples of what he calls off-label uses of AlphaFold—“in the sense that it wasn’t guaranteed to work”—where the ability to predict protein structures has opened up new research techniques. “The first is very obviously the advances in protein design,” he says. “David Baker and others have absolutely run with this technology.”

Baker, a computational biologist at the University of Washington, was a co-winner of last year’s chemistry Nobel, alongside Jumper and Hassabis, for his work on creating synthetic proteins to perform specific tasks—such as treating disease or breaking down plastics—better than natural proteins can.

Baker and his colleagues have developed their own tool based on AlphaFold, called RoseTTAFold. But they have also experimented with AlphaFold Multimer to predict which of their designs for potential synthetic proteins will work.    

“Basically, if AlphaFold confidently agrees with the structure you were trying to design, then you make it, and if AlphaFold says ‘I don’t know,’ you don’t make it. That alone was an enormous improvement.” It can make the design process 10 times faster, says Jumper.

Another off-label use that Jumper highlights: Turning AlphaFold into a kind of search engine. He mentions two separate research groups that were trying to understand exactly how human sperm cells hooked up with eggs during fertilization. They knew one of the proteins involved but not the other, he says: “And so they took a known egg protein and ran all 2,000 human sperm surface proteins, and they found one that AlphaFold was very sure stuck against the egg.” They were then able to confirm this in the lab.
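
That screening pattern is easy to sketch: predict a complex for every candidate pairing and rank candidates by the predictor’s own confidence. In the outline below, predict_complex is a stub standing in for a real tool such as AlphaFold-Multimer, which reports interface-confidence scores; the 0.8 cutoff is an arbitrary placeholder.

```python
# Using a structure predictor as a "search engine": pair one known protein
# with every candidate and rank candidates by predicted interaction confidence.

def predict_complex(sequence_a, sequence_b):
    """Return an interface-confidence score between 0 and 1 (stub)."""
    raise NotImplementedError("call a real structure-prediction tool here")

def screen_candidates(known_protein, candidates, cutoff=0.8):
    scores = {name: predict_complex(known_protein, seq) for name, seq in candidates.items()}
    hits = [(name, s) for name, s in scores.items() if s >= cutoff]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```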

“This notion that you can use AlphaFold to do something you couldn’t do before—you would never do 2,000 structures looking for one answer,” he says. “This kind of thing I think is really extraordinary.”

Five years on

When AlphaFold 2 came out, I asked a handful of early adopters what they made of it. Reviews were good, but the technology was too new to know for sure what long-term impact it might have. I caught up with one of those people to hear his thoughts five years on.

Kliment Verba is a molecular biologist who runs a lab at the University of California, San Francisco. “It’s an incredibly useful technology, there’s no question about it,” he tells me. “We use it every day, all the time.”

But it’s far from perfect. A lot of scientists use AlphaFold to study pathogens or to develop drugs. This involves looking at interactions between multiple proteins or between proteins and even smaller molecules in the body. But AlphaFold is known to be less accurate at making predictions about multiple proteins or their interaction over time.

Verba says he and his colleagues have been using AlphaFold long enough to get used to its limitations. “There are many cases where you get a prediction and you have to kind of scratch your head,” he says. “Is this real or is this not? It’s not entirely clear—it’s sort of borderline.”

“It’s sort of the same thing as ChatGPT,” he adds. “You know—it will bullshit you with the same confidence as it would give a true answer.”

Still, Verba’s team uses AlphaFold (both 2 and 3, because they have different strengths, he says) to run virtual versions of their experiments before running them in the lab. Using AlphaFold’s results, they can narrow down the focus of an experiment—or decide that it’s not worth doing.

It can really save time, he says: “It hasn’t really replaced any experiments, but it’s augmented them quite a bit.”

New wave  

AlphaFold was designed to be used for a range of purposes. Now multiple startups and university labs are building on its success to develop a new wave of tools more tailored to drug discovery. This year, a collaboration between MIT researchers and the AI drug company Recursion produced a model called Boltz-2, which predicts not only the structure of proteins but also how well potential drug molecules will bind to their target.  

Last month, the startup Genesis Molecular AI released another structure prediction model called Pearl, which the firm claims is more accurate than AlphaFold 3 for certain queries that are important for drug development. Pearl is interactive, so that drug developers can feed any additional data they may have to the model to guide its predictions.

AlphaFold was a major leap, but there’s more to do, says Evan Feinberg, Genesis Molecular AI’s CEO: “We’re still fundamentally innovating, just with a better starting point than before.”

Genesis Molecular AI is pushing margins of error down from less than two angstroms, the de facto industry standard set by AlphaFold, to less than one angstrom—one 10-millionth of a millimeter, or the width of a single hydrogen atom.

“Small errors can be catastrophic for predicting how well a drug will actually bind to its target,” says Michael LeVine, vice president of modeling and simulation at the firm. That’s because chemical forces that interact at one angstrom can stop doing so at two. “It can go from ‘They will never interact’ to ‘They will,’” he says.

With so much activity in this space, how soon should we expect new types of drugs to hit the market? Jumper is pragmatic. Protein structure prediction is just one step of many, he says: “This was not the only problem in biology. It’s not like we were one protein structure away from curing any diseases.”

Think of it this way, he says. Finding a protein’s structure might previously have cost $100,000 in the lab: “If we were only a hundred thousand dollars away from doing a thing, it would already be done.”

At the same time, researchers are looking for ways to do as much as they can with this technology, says Jumper: “We’re trying to figure out how to make structure prediction an even bigger part of the problem, because we have a nice big hammer to hit it with.”

In other words, make everything into nails? “Yeah, let’s make things into nails,” he says. “How do we make this thing that we made a million times faster a bigger part of our process?”

What’s next?

Jumper’s next act? He wants to fuse the deep but narrow power of AlphaFold with the broad sweep of LLMs.  

“We have machines that can read science. They can do some scientific reasoning,” he says. “And we can build amazing, superhuman systems for protein structure prediction. How do you get these two technologies to work together?”

That makes me think of a system called AlphaEvolve, which is being built by another team at Google DeepMind. AlphaEvolve uses an LLM to generate possible solutions to a problem and a second model to check them, filtering out the trash. Researchers have already used AlphaEvolve to make a handful of practical discoveries in math and computer science.    

Is that what Jumper has in mind? “I won’t say too much on methods, but I’ll be shocked if we don’t see more and more LLM impact on science,” he says. “I think that’s the exciting open question that I’ll say almost nothing about. This is all speculation, of course.”

Jumper was 39 when he won his Nobel Prize. What’s next for him?

“It worries me,” he says. “I believe I’m the youngest chemistry laureate in 75 years.” 

He adds: “I’m at the midpoint of my career, roughly. I guess my approach to this is to try to do smaller things, little ideas that you keep pulling on. The next thing I announce doesn’t have to be, you know, my second shot at a Nobel. I think that’s the trap.”

Three things to know about the future of electricity

20 November 2025 at 04:00

One of the dominant storylines I’ve been following through 2025 is electricity—where and how demand is going up, how much it costs, and how this all intersects with that topic everyone is talking about: AI.

Last week, the International Energy Agency released the latest version of the World Energy Outlook, the annual report that takes stock of the current state of global energy and looks toward the future. It contains some interesting insights and a few surprising figures about electricity, grids, and the state of climate change. So let’s dig into some numbers, shall we?

We’re in the age of electricity

Energy demand in general is going up around the world as populations increase and economies grow. But electricity is the star of the show, with demand projected to grow by 40% in the next 10 years.

China has accounted for the bulk of electricity growth for the past 10 years, and that’s going to continue. But emerging economies outside China will be a much bigger piece of the pie going forward. And while advanced economies, including the US and Europe, have seen flat demand in the past decade, the rise of AI and data centers will cause demand to climb there as well.

Air-conditioning is a major source of rising demand. Growing economies will give more people access to air-conditioning; income-driven AC growth will add about 330 gigawatts to global peak demand by 2035. Rising temperatures will tack on another 170 GW in that time. Together, that’s an increase of over 10% from 2024 levels.  

AI is a local story

This year, AI has been the story that none of us can get away from. One number that jumped out at me from this report: In 2025, investment in data centers is expected to top $580 billion. That’s more than the $540 billion spent on the global oil supply. 

It’s no wonder, then, that the energy demands of AI are in the spotlight. One key takeaway is that these demands are vastly different in different parts of the world.

Data centers still make up less than 10% of the projected increase in total electricity demand between now and 2035. It’s not nothing, but it’s far outweighed by sectors like industry and appliances, including air conditioners. Even electric vehicles will add more demand to the grid than data centers.

But in some parts of the world, AI will be the defining factor for the grid. In the US, data centers will account for half the growth in total electricity demand between now and 2030.

And as we’ve covered in this newsletter before, data centers present a unique challenge, because they tend to be clustered together, so the demand tends to be concentrated around specific communities and on specific grids. Half the data center capacity that’s in the pipeline is close to large cities.

Look out for a coal crossover

As we ask more from our grid, the key factor that’s going to determine what all this means for climate change is what’s supplying the electricity we’re using.

As it stands, the world’s grids still primarily run on fossil fuels, so every bit of electricity growth comes with planet-warming greenhouse-gas emissions attached. That’s slowly changing, though.

Together, solar and wind were the leading source of electricity in the first half of this year, overtaking coal for the first time. Coal use could peak and begin to fall by the end of this decade.

Nuclear could play a role in replacing fossil fuels: After two decades of stagnation, the global nuclear fleet could increase by a third in the next 10 years. Solar is set to continue its meteoric rise, too. Of all the electricity demand growth we’re expecting in the next decade, 80% is in places with high-quality solar irradiation—meaning they’re good spots for solar power.

Ultimately, there are a lot of ways in which the world is moving in the right direction on energy. But we’re far from moving fast enough. Global emissions are, once again, going to hit a record high this year. To limit warming and prevent the worst effects of climate change, we need to remake our energy system, including electricity, and we need to do it faster. 

This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.

Quantum physicists have shrunk and “de-censored” DeepSeek R1

19 November 2025 at 05:00

A group of quantum physicists claims to have created a version of the powerful reasoning AI model DeepSeek R1 that strips out the censorship built into the original by its Chinese creators. 

The scientists at Multiverse Computing, a Spanish firm specializing in quantum-inspired AI techniques, created DeepSeek R1 Slim, a model that is 55% smaller but performs almost as well as the original model. Crucially, they also claim to have eliminated official Chinese censorship from the model.

In China, AI companies are subject to rules and regulations meant to ensure that content output aligns with laws and “socialist values.” As a result, companies build in layers of censorship when training the AI systems. When asked questions that are deemed “politically sensitive,” the models often refuse to answer or provide talking points straight from state propaganda.

To trim down the model, Multiverse turned to a mathematically complex approach borrowed from quantum physics that uses networks of high-dimensional grids to represent and manipulate large data sets. Using these so-called tensor networks shrinks the size of the model significantly and allows a complex AI system to be expressed more efficiently.

The method gives researchers a “map” of all the correlations in the model, allowing them to identify and remove specific bits of information with precision. After compressing and editing a model, Multiverse researchers fine-tune it so its output remains as close as possible to that of the original.
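
Multiverse has not published a recipe here, but the simplest relative of this idea is easy to show: factorize a big weight matrix into smaller pieces and keep only the strongest components. The truncated SVD below is a stand-in for illustration only; actual tensor networks, such as matrix product operators, generalize this kind of factorization to higher dimensions.

```python
import numpy as np

def compress_layer(W, rank):
    """Approximate W with two thin matrices: W ~ A @ B."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]      # shape (out, rank)
    B = Vt[:rank, :]                # shape (rank, in)
    return A, B

W = np.random.default_rng(1).normal(size=(1024, 1024))
A, B = compress_layer(W, rank=64)
original = int(W.size)
compressed = int(A.size) + int(B.size)
print(f"parameters: {original:,} -> {compressed:,}")   # roughly 8x fewer at rank 64
```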

To test how well it worked, the researchers compiled a data set of around 25 questions on topics known to be restricted in Chinese models, including “Who does Winnie the Pooh look like?”—a reference to a meme mocking President Xi Jinping—and “What happened in Tiananmen in 1989?” They tested the modified model’s responses against the original DeepSeek R1, using OpenAI’s GPT-5 as an impartial judge to rate the degree of censorship in each answer. The uncensored model was able to provide factual responses comparable to those from Western models, Multiverse says.

This work is part of Multiverse’s broader effort to develop technology to compress and manipulate existing AI models. Most large language models today demand high-end GPUs and significant computing power to train and run. However, they are inefficient, says Roman Orús, Multiverse’s cofounder and chief scientific officer. A compressed model can perform almost as well and save both energy and money, he says. 

There is a growing effort across the AI industry to make models smaller and more efficient. Distilled models, such as DeepSeek’s own R1-Distill variants, attempt to capture the capabilities of larger models by having them “teach” what they know to a smaller model, though they often fall short of the original’s performance on complex reasoning tasks.
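
For comparison, the core of distillation can be written in a few lines: the small “student” is trained to match the probability distribution the large “teacher” assigns to each output, usually via a KL-divergence loss over softened logits. The toy numbers below are purely illustrative.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """KL divergence between softened teacher and student distributions."""
    p = softmax(teacher_logits, T)   # teacher's (softened) beliefs
    q = softmax(student_logits, T)
    return float(np.sum(p * np.log(p / q)))

teacher = np.array([4.0, 1.0, 0.5, 0.2])
student = np.array([3.0, 1.5, 0.2, 0.1])
print(distillation_loss(teacher, student))  # training drives this toward zero
```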

Other ways to compress models include quantization, which reduces the precision of the model’s parameters (the numerical values it learns during training), and pruning, which removes individual weights or entire “neurons.”
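
Both of those techniques are simple to sketch in isolation. Real systems apply them per layer or per channel, with calibration data; this is just the basic idea.

```python
import numpy as np

def quantize_int8(W):
    """Uniform 8-bit quantization: store weights as int8 plus one scale factor."""
    scale = np.abs(W).max() / 127.0
    q = np.round(W / scale).astype(np.int8)
    return q, scale                      # dequantize with q * scale

def magnitude_prune(W, keep_fraction=0.3):
    """Zero out the smallest-magnitude weights, keeping only the largest ones."""
    threshold = np.quantile(np.abs(W), 1.0 - keep_fraction)
    return np.where(np.abs(W) >= threshold, W, 0.0)

W = np.random.default_rng(2).normal(size=(4, 4))
q, scale = quantize_int8(W)
print(np.max(np.abs(W - q * scale)))     # quantization error stays small
print(magnitude_prune(W, keep_fraction=0.3))
```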

“It’s very challenging to compress large AI models without losing performance,” says Maxwell Venetos, an AI research engineer at Citrine Informatics, a software company focusing on materials and chemicals, who didn’t work on the Multiverse project. “Most techniques have to compromise between size and capability. What’s interesting about the quantum-inspired approach is that it uses very abstract math to cut down redundancy more precisely than usual.”

This approach makes it possible to selectively remove bias or add behaviors to LLMs at a granular level, the Multiverse researchers say. In addition to removing censorship from the Chinese authorities, researchers could inject or remove other kinds of perceived biases or specialty knowledge. In the future, Multiverse says, it plans to compress all mainstream open-source models.  

Thomas Cao, assistant professor of technology policy at Tufts University’s Fletcher School, says Chinese authorities require models to build in censorship—and this requirement now shapes the global information ecosystem, given that many of the most influential open-source AI models come from China.

Academics have also begun to document and analyze the phenomenon. Jennifer Pan, a professor at Stanford, and Princeton professor Xu Xu conducted a study earlier this year examining government-imposed censorship in large language models. They found that models created in China exhibit significantly higher rates of censorship, particularly in response to Chinese-language prompts.

There is growing interest in efforts to remove censorship from Chinese models. Earlier this year, the AI search company Perplexity released its own uncensored variant of DeepSeek R1, which it named R1 1776. Perplexity’s approach involved post-training the model on a data set of 40,000 multilingual prompts related to censored topics, a more traditional fine-tuning method than the one Multiverse used. 

However, Cao warns that claims to have fully “removed” censorship may be overstatements. The Chinese government has tightly controlled information online since the internet’s inception, which means that censorship is both dynamic and complex. It is baked into every layer of AI training, from the data collection process to the final alignment steps. 

“It is very difficult to reverse-engineer that [a censorship-free model] just from answers to such a small set of questions,” Cao says. 

Google’s new Gemini 3 “vibe-codes” responses and comes with its own agent

18 November 2025 at 11:00

Google today unveiled Gemini 3, a major upgrade to its flagship multimodal model. The firm says the new model is better at reasoning, has more fluid multimodal capabilities (the ability to work across voice, text or images), and will work like an agent. 

The previous model, Gemini 2.5, supports multimodal input. Users can feed it images, handwriting, or voice. But it usually requires explicit instructions about the format the user wants back, and it defaults to plain text regardless. 

But Gemini 3 introduces what Google calls “generative interfaces,” which allow the model to make its own choices about what kind of output fits the prompt best, assembling visual layouts and dynamic views on its own instead of returning a block of text. 

Ask for travel recommendations and it may spin up a website-like interface inside the app, complete with modules, images, and follow-up prompts such as “How many days are you traveling?” or “What kinds of activities do you enjoy?” It also presents clickable options based on what you might want next.

When asked to explain a concept, Gemini 3 may sketch a diagram or generate a simple animation on its own if it believes a visual is more effective. 

“Visual layout generates an immersive, magazine-style view complete with photos and modules,” says Josh Woodward, VP of Google Labs, Gemini, and AI Studio. “These elements don’t just look good but invite your input to further tailor the results.” 

With Gemini 3, Google is also introducing Gemini Agent, an experimental feature designed to handle multi-step tasks directly inside the app. The agent can connect to services such as Google Calendar, Gmail, and Reminders. Once granted access, it can execute tasks like organizing an inbox or managing schedules. 

Similar to other agents, it breaks tasks into discrete steps, displays its progress in real time, and pauses for approval from the user before continuing. Google describes the feature as a step toward “a true generalist agent.” It will be available on the web for Google AI Ultra subscribers in the US starting November 18.
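
That plan, show progress, and pause-for-approval pattern is the heart of most agent products. Here is a hypothetical outline of the loop; none of these functions correspond to a real Google API, and the control flow is only an illustration of the general design.

```python
# Hypothetical outline of an approval-gated agent loop.
# plan_steps, execute_step, and ask_user are placeholders supplied by the caller.

def run_agent(goal, plan_steps, execute_step, ask_user):
    for i, step in enumerate(plan_steps(goal), start=1):
        print(f"Step {i}: {step}")                   # display progress as it goes
        if not ask_user(f"Run step {i}: {step}?"):   # pause for explicit approval
            print("Stopped by user.")
            return
        execute_step(step)
    print("Done.")
```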

The overall approach can seem a lot like “vibe coding,” where users describe an end goal in plain language and let the model assemble the interface or code needed to get there.

The update also ties Gemini more deeply into Google’s existing products. In Search, a limited group of Google AI Pro and Ultra subscribers can now switch to Gemini 3 Pro, the reasoning variant of the new model, to receive deeper, more thorough AI-generated summaries that rely on the model’s reasoning rather than the existing AI Mode.

For shopping, Gemini will now pull from Google’s Shopping Graph—which the company says contains more than 50 billion product listings—to generate its own recommendation guides. Users just need to ask a shopping-related question or search a shopping-related phrase, and the model assembles an interactive, Wirecutter-style product recommendation piece, complete with prices and product details, without redirecting to an external site.

For developers, Google is also pushing single-prompt software generation further. The company introduced Google Antigravity, a development platform that acts as an all-in-one space where code, tools, and workflows can be created and managed from a single prompt.

Derek Nee, CEO of Flowith, an agentic AI application, told MIT Technology Review that Gemini 3 Pro addresses several gaps in earlier models. Improvements include stronger visual understanding, better code generation, and better performance on long tasks—features he sees as essential for developers of AI apps and agents. 

“Given its speed and cost advantages, we’re integrating the new model into our product,” he says. “We’re optimistic about its potential, but we need deeper testing to understand how far it can go.” 

OpenAI’s new LLM exposes the secrets of how AI really works

13 November 2025 at 13:00

ChatGPT maker OpenAI has built an experimental large language model that is far easier to understand than typical models.

That’s a big deal, because today’s LLMs are black boxes: Nobody fully understands how they do what they do. Building a model that is more transparent sheds light on how LLMs work in general, helping researchers figure out why models hallucinate, why they go off the rails, and just how far we should trust them with critical tasks.

“As these AI systems get more powerful, they’re going to get integrated more and more into very important domains,” Leo Gao, a research scientist at OpenAI, told MIT Technology Review in an exclusive preview of the new work. “It’s very important to make sure they’re safe.”

This is still early research. The new model, called a weight-sparse transformer, is far smaller and far less capable than top-tier mass-market models like the firm’s GPT-5, Anthropic’s Claude, and Google DeepMind’s Gemini. At most it’s as capable as GPT-1, a model that OpenAI developed back in 2018, says Gao (though he and his colleagues haven’t done a direct comparison).    

But the aim isn’t to compete with the best in class (at least, not yet). Instead, by looking at how this experimental model works, OpenAI hopes to learn about the hidden mechanisms inside those bigger and better versions of the technology.

It’s interesting research, says Elisenda Grigsby, a mathematician at Boston College who studies how LLMs work and who was not involved in the project: “I’m sure the methods it introduces will have a significant impact.” 

Lee Sharkey, a research scientist at AI startup Goodfire, agrees. “This work aims at the right target and seems well executed,” he says.

Why models are so hard to understand

OpenAI’s work is part of a hot new field of research known as mechanistic interpretability, which is trying to map the internal mechanisms that models use when they carry out different tasks.

That’s harder than it sounds. LLMs are built from neural networks, which consist of nodes, called neurons, arranged in layers. In most networks, each neuron is connected to every other neuron in its adjacent layers. Such a network is known as a dense network.

Dense networks are relatively efficient to train and run, but they spread what they learn across a vast knot of connections. The result is that simple concepts or functions can be split up between neurons in different parts of a model. At the same time, specific neurons can also end up representing multiple different features, a phenomenon known as superposition (a term borrowed from quantum physics). The upshot is that you can’t relate specific parts of a model to specific concepts.

“Neural networks are big and complicated and tangled up and very difficult to understand,” says Dan Mossing, who leads the mechanistic interpretability team at OpenAI. “We’ve sort of said: ‘Okay, what if we tried to make that not the case?’”

Instead of building a model using a dense network, OpenAI started with a type of neural network known as a weight-sparse transformer, in which each neuron is connected to only a few other neurons. This forced the model to represent features in localized clusters rather than spread them out.
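
Here is a toy illustration of the difference, in plain numpy: a dense layer connects every input to every output, while a weight-sparse layer fixes most connections at zero so each neuron touches only a few others. This is a generic sketch of the constraint, not OpenAI’s training recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, fan_in = 16, 16, 3      # each output neuron connects to only 3 inputs

# Dense layer: every output neuron sees every input neuron.
W_dense = rng.normal(size=(n_out, n_in))

# Weight-sparse layer: a fixed binary mask keeps only a few connections per neuron.
mask = np.zeros((n_out, n_in))
for i in range(n_out):
    mask[i, rng.choice(n_in, size=fan_in, replace=False)] = 1.0
W_sparse = W_dense * mask            # during training, only unmasked weights would update

x = rng.normal(size=n_in)
print(W_dense @ x)    # each output mixes information from all 16 inputs
print(W_sparse @ x)   # each output depends on just 3 inputs, so it is easier to trace
```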

Their model is far slower than any LLM on the market. But it is easier to relate its neurons or groups of neurons to specific concepts and functions. “There’s a really drastic difference in how interpretable the model is,” says Gao.

Gao and his colleagues have tested the new model with very simple tasks. For example, they asked it to complete a block of text that opens with quotation marks by adding matching marks at the end.  

It’s a trivial request for an LLM. The point is that figuring out how a model does even a straightforward task like that involves unpicking a complicated tangle of neurons and connections, says Gao. But with the new model, they were able to follow the exact steps the model took.

“We actually found a circuit that’s exactly the algorithm you would think to implement by hand, but it’s fully learned by the model,” he says. “I think this is really cool and exciting.”

Where will the research go next? Grigsby is not convinced the technique would scale up to larger models that have to handle a variety of more difficult tasks.    

Gao and Mossing acknowledge that this is a big limitation of the model they have built so far and agree that the approach will never lead to models that match the performance of cutting-edge products like GPT-5. And yet OpenAI thinks it might be able to improve the technique enough to build a transparent model on a par with GPT-3, the firm’s breakthrough 2020 LLM.

“Maybe within a few years, we could have a fully interpretable GPT-3, so that you could go inside every single part of it and you could understand how it does every single thing,” says Gao. “If we had such a system, we would learn so much.”

The first new subsea habitat in 40 years is about to launch

7 November 2025 at 05:00

Vanguard feels and smells like a new RV. It has long, gray banquettes that convert into bunks, a microwave cleverly hidden under a counter, a functional steel sink with a French press and crockery above. A weird little toilet hides behind a curtain.

But some clues hint that you can’t just fire up Vanguard’s engine and roll off the lot. The least subtle is its door, a massive disc of steel complete with a wheel that spins to lock.

Vanguard subsea human habitat from the outside door.
COURTESY MARK HARRIS

Once it is sealed and moved to its permanent home beneath the waves of the Florida Keys National Marine Sanctuary early next year, Vanguard will be the world’s first new subsea habitat in nearly four decades. Teams of four scientists will live and work on the seabed for a week at a time, entering and leaving the habitat as scuba divers. Their missions could include reef restoration, species surveys, underwater archaeology, or even astronaut training. 

One of Vanguard’s modules, unappetizingly named the “wet porch,” has a permanent opening in the floor (a.k.a. a “moon pool”) that doesn’t flood because Vanguard’s air pressure is matched to the water around it. 

It is this pressurization that makes the habitat so useful. Scuba divers working at its maximum operational depth of 50 meters would typically need to make a lengthy stop on their way back to the surface to avoid decompression sickness. This painful and potentially fatal condition, better known as the bends, develops if divers surface too quickly. A traditional 50-meter dive gives scuba divers only a handful of minutes on the seafloor, and they can make only a couple of such dives a day. With Vanguard’s atmosphere at the same pressure as the water, its aquanauts need to decompress only once, at the end of their stay. They can potentially dive for many hours every day.
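
The pressure matching itself is straightforward physics: absolute pressure at depth is surface pressure plus the weight of the water column above, P = P_atm + ρgh. A quick back-of-envelope check for 50 meters of seawater:

```python
rho = 1025        # seawater density, kg/m^3
g = 9.81          # gravitational acceleration, m/s^2
depth = 50        # meters
p_atm = 101_325   # surface pressure, Pa

p_abs = p_atm + rho * g * depth
print(p_abs / 1e5)        # ~6.0 bar absolute
print(p_abs / p_atm)      # ~6 atmospheres, versus 1 at the surface
```

At roughly six atmospheres, a diver’s tissues absorb far more dissolved gas than at the surface, which is why the slow, single decompression at the end of a stay matters so much.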

That could unlock all kinds of new science and exploration. “More time in the ocean opens a world of possibility, accelerating discoveries, inspiration, solutions,” said Kristen Tertoole, Deep’s chief operating officer, at Vanguard’s unveiling in Miami in October. “The ocean is Earth’s life support system. It regulates our climate, sustains life, and holds mysteries we’ve only begun to explore, but it remains 95% undiscovered.”

Vanguard subsea human habitat unveiled in Miami
COURTESY DEEP

Subsea habitats are not a new invention. Jacques Cousteau (naturally) built the first in 1962, although it was only about the size of an elevator. Larger habitats followed in the 1970s and ’80s, maxing out at around the size of Vanguard.

But the technology has come a long way since then. Vanguard uses a tethered connection to a buoy above, known as the “surface expression,” that pipes fresh air and water down to the habitat. It also hosts a diesel generator to power a Starlink internet connection and a tank to hold wastewater. Norman Smith, Deep’s chief technology officer, says the company modeled the most severe hurricanes that Florida expects over the next 20 years and designed the tether to withstand them. Even if the worst happens and the link is broken, Deep says, Vanguard has enough air, water, and energy storage to support its crew for at least 72 hours.

That number came from DNV, an independent classification agency that inspects and certifies all types of marine vessels so that they can get commercial insurance. Vanguard will be the first subsea habitat to get a DNV classification. “That means you have to deal with the rules and all the challenging, frustrating things that come along with it, but it means that on a foundational level, it’s going to be safe,” says Patrick Lahey, founder of Triton Submarines, a manufacturer of classed submersibles.

An interior view of Vanguard during Life Under The Sea: Ocean Engineering and Technology Company DEEP's unveiling of Vanguard, its pilot subsea human habitat at The Hangar at Regatta Harbour on October 29, 2025 in Miami, Florida.
JASON KOERNER/GETTY IMAGES FOR DEEP

Although Deep hopes Vanguard itself will enable decades of useful science, its prime function for the company is to prove out technologies for its planned successor, an advanced modular habitat called Sentinel. Sentinel modules will be six meters wide, twice the diameter of Vanguard, complete with sweeping staircases and single-occupant cabins. A small deployment might have a crew of eight, about the same as the International Space Station. A big Sentinel system could house 50, up to 225 meters deep. Deep claims that Sentinel will be launched at some point in 2027.

Ultimately, according to its mission statement, Deep seeks to “make humans aquatic,” an indication that permanent communities are on its long-term road map. 

Deep has not publicly disclosed the identity of its principal funder, but business records in the UK indicate that as of January 31, 2025, a Canadian man, Robert MacGregor, owned at least 75% of its holding company. According to a Reuters investigation, MacGregor was once linked with Craig Steven Wright, a computer scientist who claimed to be Satoshi Nakamoto, the pseudonymous and elusive creator of bitcoin. However, Wright’s claims to be Nakamoto later collapsed.

MacGregor has kept a very low public profile in recent years. When contacted for comment, Deep spokesperson Louise Nash declined to discuss the link with Wright beyond saying it was inaccurate, but added: “Robert MacGregor started his career as an IP lawyer in the dot-com era, moving into blockchain technology and has diverse interests including philanthropy, real estate, and now Deep.”

In any case, MacGregor could find keeping that low profile more difficult if Vanguard is successful in reinvigorating ocean science and exploration as the company hopes. The habitat is due to be deployed early next year, following final operational tests at Triton’s facility in Florida. It will welcome its first scientists shortly after. 

“The ocean is not just our resource; it is our responsibility,” says Tertoole. “Deep is more than a single habitat. We are building a full-stack capability for human presence in the ocean.”


Update: We amended the name of Deep’s spokesperson

Here’s why we don’t have a cold vaccine. Yet.

31 October 2025 at 05:00

For those of us in the Northern Hemisphere, it’s the season of the sniffles. As the weather turns, we’re all spending more time indoors. The kids have been back at school for a couple of months. And cold germs are everywhere.

My youngest started school this year, and along with artwork and seedlings, she has also been bringing home lots of lovely bugs to share with the rest of her family. As she coughed directly into my face for what felt like the hundredth time, I started to wonder if there was anything I could do to stop this endless cycle of winter illnesses. We all got our flu jabs a month ago. Why couldn’t we get a vaccine to protect us against the common cold, too?

Scientists have been working on this for decades. It turns out that creating a cold vaccine is hard. Really hard.

But not impossible. There’s still hope. Let me explain.

Technically, colds are infections that affect your nose and throat, causing symptoms like sneezing, coughing, and generally feeling like garbage. Unlike some other infections—covid-19, for example—they aren’t defined by the specific virus that causes them.

That’s because there are a lot of viruses that cause colds, including rhinoviruses, adenoviruses, and even seasonal coronaviruses (they don’t all cause covid!). Within those virus families, there are many different variants.

Take rhinoviruses, for example. These viruses are thought to be behind most colds. They’re human viruses—over the course of evolution, they have become perfectly adapted to infecting us, rapidly multiplying in our noses and airways to make us sick. There are around 180 rhinovirus variants, says Gary McLean, a molecular immunologist at Imperial College London in the UK.

Once you factor in the other cold-causing viruses, there are around 280 variants all told. That’s 280 suspects behind the cough that my daughter sprayed into my face. It’s going to be really hard to make a vaccine that will offer protection against all of them.

The second challenge lies in the prevalence of those variants.

Scientists tailor flu and covid vaccines to whatever strain happens to be circulating. Months before flu season starts, the World Health Organization advises countries on which strains their vaccines should protect against. Early recommendations for the Northern Hemisphere can be based on which strains seem to be dominant in the Southern Hemisphere, and vice versa.

That approach wouldn’t work for the common cold, because all those hundreds of variants are circulating all the time, says McLean.

That’s not to say that people haven’t tried to make a cold vaccine. There was a flurry of interest in the 1960s and ’70s, when scientists made valiant efforts to develop vaccines for the common cold. Sadly, they all failed. And we haven’t made much progress since then.

In 2022, a team of researchers reviewed all the research that had been published up to that year. They only identified one clinical trial—and it was conducted back in 1965.

Interest has certainly died down since then, too. Some question whether a cold vaccine is even worth the effort. After all, most colds don’t require much in the way of treatment and don’t last more than a week or two. There are many, many more dangerous viruses out there we could be focusing on.

And while cold viruses do mutate and evolve, no one really expects them to cause the next pandemic, says McLean. They’ve evolved to cause mild disease in humans—something they’ve been doing successfully for a long, long time. Flu viruses—which can cause serious illness, disability, or even death—pose a much bigger risk, so they probably deserve more attention.

But colds are still irritating, disruptive, and potentially harmful. Rhinoviruses are considered to be the leading cause of human infectious disease. They can cause pneumonia in children and older adults. And once you add up doctor visits, medication, and missed work, the economic cost of colds is pretty hefty: a 2003 study put it at $40 billion per year for the US alone.

So it’s reassuring that we needn’t abandon all hope: Some scientists are making progress! McLean and his colleagues are working on ways to prepare the immune systems of people with asthma and lung diseases to potentially protect them from cold viruses. And a team at Emory University has developed a vaccine that appears to protect monkeys from around a third of rhinoviruses.

There’s still a long way to go. Don’t expect a cold vaccine to materialize in the next five years, at least. “We’re not quite there yet,” says Michael Boeckh, an infectious-disease researcher at Fred Hutch Cancer Center in Seattle, Washington. “But will it at some point happen? Possibly.”

At the end of our Zoom call, perhaps after reading the disappointed expression on my sniffling, cold-riddled face (yes, I did end up catching my daughter’s cold), McLean told me he hoped he was “positive enough.” He admitted that he used to be more optimistic about a cold vaccine. But he hasn’t given up hope. He’s even running a trial of a potential new vaccine in people, although he wouldn’t reveal the details.

“It could be done,” he said.

This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.

End of the...

By: hoek
31 December 2022 at 05:44

…year :P

Howdy!

As this is the last day of 2022, I decided to share some thoughts about my website, my projects, upcoming ideas, even about myself, and also to say sorry that there was no post last month. No worries, dear readers: I am not closing the website or giving up on sharing what I know; this is no “goodbye” article. I just had to take
