Researchers from MIT, Northeastern University, and Meta recently released a paper suggesting that large language models (LLMs) similar to those that power ChatGPT may sometimes prioritize sentence structure over meaning when answering questions. The findings reveal a weakness in how these models process instructions, one that may help explain why some prompt injection and jailbreaking approaches work. The researchers caution, however, that their analysis of production models remains speculative, since the training data of prominent commercial AI models is not publicly available.
The team, led by Chantal Shaib and Vinith M. Suriyakumar, tested this by asking models questions with preserved grammatical patterns but nonsensical words. For example, when prompted with “Quickly sit Paris clouded?” (mimicking the structure of “Where is Paris located?”), models still answered “France.”
This suggests that models absorb both meaning and syntactic patterns, but they can over-rely on structural shortcuts when those patterns strongly correlate with specific domains in the training data; in edge cases, the learned pattern overrides semantic understanding. The team plans to present the findings at NeurIPS later this month.
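Readers who want to poke at this behavior themselves can run a rough version of the probe in a few lines. The sketch below is illustrative rather than the paper's actual protocol: it uses Hugging Face's transformers library with GPT-2 as a small stand-in for the models actually tested, and reuses the example prompts quoted above.

```python
# Minimal probe: compare a real question with a syntax-preserving nonsense
# variant. GPT-2 is a small stand-in for the much larger models in the paper.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompts = [
    "Where is Paris located?",      # normal question
    "Quickly sit Paris clouded?",   # same structure, nonsensical content words
]

for prompt in prompts:
    out = generator(prompt, max_new_tokens=10, do_sample=False)
    print(repr(prompt), "->", out[0]["generated_text"][len(prompt):])
```

If structural cues dominate, the two completions will look suspiciously similar even though the second prompt is meaningless.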
The Allen Institute for AI (Ai2) released a new generation of its flagship large language models, designed to compete more squarely with industry and academic heavyweights.
The Seattle-based nonprofit unveiled Olmo 3, a collection of open language models that it says outperforms fully open models such as Stanford’s Marin and commercial open-weight models like Meta’s Llama 3.1.
Earlier versions of Olmo were framed mainly as scientific tools for understanding how AI models are built. With Olmo 3, Ai2 is expanding its focus, positioning the models as powerful, efficient, and transparent systems suitable for real-world use, including commercial applications.
“Olmo 3 proves that openness and performance can advance together,” said Ali Farhadi, the Ai2 CEO, in a press release Thursday morning announcing the new models.
It’s part of a broader evolution in the AI world. Over the past year, increasingly powerful open models from companies and universities, including Meta, DeepSeek, Alibaba’s Qwen team, and Stanford, have started to rival the performance of proprietary systems from big tech companies.
Many of the latest open models are designed to show their reasoning step by step, commonly called “thinking” models, a capability that has become a key point of comparison in the field.
Ai2 is releasing Olmo 3 in multiple versions: Olmo 3 Base (the core foundation model); Olmo 3 Instruct (tuned to follow user directions); Olmo 3 Think (designed to show more explicit reasoning); and Olmo 3 RL Zero (an experimental model trained with reinforcement learning).
Open models have been gaining traction with startups and businesses that want more control over costs and data, along with clearer visibility into how the technology works.
Ai2 is going further by releasing the full “model flow” behind Olmo 3 — a set of snapshots showing how the model progressed through each stage of training. In addition, an updated OlmoTrace tool will let researchers link a model’s reasoning steps back to the specific data and training decisions that influenced them.
In terms of energy and cost efficiency, Ai2 says the new Olmo base model is 2.5 times more efficient to train than Meta’s Llama 3.1 (based on GPU-hours per token, comparing Olmo 3 Base to Meta’s 8B post-trained model). Much of this gain comes from training Olmo 3 on far fewer tokens than comparable systems, in some cases six times fewer than rival models.
Among other improvements, Ai2 says Olmo 3 can read or analyze much longer documents at once, with support for inputs up to 65,000 tokens, roughly 50,000 words, or about the length of a short novel.
Founded in 2014 by the late Microsoft co-founder Paul Allen, Ai2 has long operated as a research-focused nonprofit, developing open-source tools and models while bigger commercial labs dominated the spotlight. The institute has made a series of moves this year to elevate its profile while preserving its mission of developing AI to solve the world’s biggest problems.
In August, Ai2 was selected by the National Science Foundation and Nvidia for a landmark $152 million initiative to build fully open multimodal AI models for scientific research, positioning the institute to serve as a key contributor to the nation’s AI backbone.
It also serves as the key technical partner for the Cancer AI Alliance, helping Fred Hutch and other top U.S. cancer centers train AI models on clinical data without exposing patient records.
Magdalena Balazinska, director of the UW Allen School of Computer Science & Engineering, opens the school’s annual research showcase Wednesday in Seattle. (GeekWire Photo / Todd Bishop)
The University of Washington’s Paul G. Allen School of Computer Science & Engineering is reframing what it means for its research to change the world.
In unveiling six “Grand Challenges” at its annual Research Showcase and Open House in Seattle on Wednesday, the Allen School’s leaders described a blueprint for technology that protects privacy, supports mental health, broadens accessibility, earns public trust, and sustains people and the planet.
The idea is to “organize ourselves into some more specific grand challenges that we can tackle together to have an even greater impact,” said Magdalena Balazinska, director of the Allen School and a UW computer science professor, in her opening remarks at the event.
Here are the six grand challenges:
Anticipate and address security, privacy, and safety issues as tech permeates society.
Make high-quality cognitive and mental health support available to all.
Design technology to be accessible at its inception — not as an add-on.
Design AI in a way that is transparent and equally beneficial to all.
Build systems that can be trusted to do exactly what we want them to do, every time.
Create technologies that sustain people and the planet.
Balazinska explained that the list draws on the strengths and interests of the school’s faculty, who now number more than 90, including 74 on the tenure track.
The Allen School has a total enrollment of about 2,900 students; last year it graduated more than 600 undergraduates, 150 master’s students, and 50 Ph.D. students.
The Allen School has grown so large that subfields like systems and NLP (natural language processing) risk becoming isolated “mini departments,” said Shwetak Patel, a University of Washington computer science professor. The Grand Challenges initiative emerged as a bottom-up effort to reconnect these groups around shared, human-centered problems.
Patel said the initiative also encourages collaborations on campus beyond the computer science school, citing examples like fetal heart rate monitoring with UW Medicine.
A serial entrepreneur and 2011 MacArthur Fellow, Patel recalled that when he joined UW 18 years ago, his applied and entrepreneurial focus was seen as unconventional. Now it’s central to the school’s direction. The grand challenges initiative is “music to my ears,” Patel said.
In tackling these challenges, the Allen School has a distinct advantage over many other computer science schools. Eighteen faculty members currently hold what’s known as “concurrent engagements” — formally splitting time between the Allen School and companies and organizations such as Google, Meta, Microsoft, and the Allen Institute for AI (Ai2).
University of Washington computer science professor Shwetak Patel at the Paul G. Allen School’s annual research showcase and open house. (GeekWire Photo / Taylor Soper)
This is a “superpower” for the Allen School, said Patel, who has a concurrent engagement at Google. These arrangements, he explained, give faculty and students access to data, computing resources, and real-world challenges by working directly with companies developing the most advanced AI systems.
“A lot of the problems we’re trying to solve, you cannot solve them just at the university,” Patel said, pointing to examples such as open-source foundation models and AI for mental-health research that depend on large-scale resources unavailable in academia alone.
These roles can also stretch professors thin. “When somebody’s split, there’s only so much mental energy you can put into the university,” Patel said. Many of those faculty members teach just one or two courses a year, requiring the school to rely more on lecturers and teaching faculty.
Still, he said, the benefits outweigh the costs. “I’d rather have 50% of somebody than 0% of somebody, and we’ll make it work,” he said. “That’s been our strategy.”
The Madrona Prize, an annual award presented at the event by the Seattle-based venture capital firm, went to a project called “Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward.” The system gives AI chatbots a “curiosity reward” that motivates them to actively learn about a user’s traits during a conversation, producing more personalized interactions.
On the subject of industry collaborations, the lead researcher on the prize-winning project, UW Ph.D. student Yanming Wan, conducted the research while working as an intern at Google DeepMind. (See full list of winners and runners-up below.)
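The project's exact reward formulation isn't spelled out in the award citation, but the general “curiosity reward” recipe can be sketched as an information-gain bonus: the agent is rewarded for turns that reduce its uncertainty about the user. The toy example below assumes a discrete set of user traits and uses entropy reduction as the bonus; all names and numbers are illustrative.

```python
# Toy curiosity reward: bonus for reducing uncertainty about user traits.
# The actual formulation in the prize-winning paper may differ.
import math

def entropy(dist: dict) -> float:
    """Shannon entropy of a belief distribution over user traits."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def curiosity_reward(belief_before: dict, belief_after: dict,
                     task_reward: float, beta: float = 0.5) -> float:
    """Total reward = task reward + beta * information gain about the user."""
    info_gain = entropy(belief_before) - entropy(belief_after)
    return task_reward + beta * info_gain

# Example: after asking about hobbies, the bot is more sure the user is a techie.
before = {"techie": 0.5, "artist": 0.5}
after = {"techie": 0.9, "artist": 0.1}
print(curiosity_reward(before, after, task_reward=1.0))  # > 1.0: curiosity paid
```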
At the evening poster session, graduate students filled the rooms to showcase their latest projects — including new advances in artificial intelligence for speech, language, and accessibility.
DopFone: Doppler-based fetal heart rate monitoring using commodity smartphones
Poojita Garg, a second-year PhD student.
DopFone transforms phones into fetal heart rate monitors. It uses the phone’s speaker to transmit a continuous sine wave and its microphone to record the reflections, then processes the recordings to estimate fetal heart rate. It aims to be an alternative to Doppler ultrasound devices, which require trained staff and aren’t practical for frequent remote use.
“The major impact would be in the rural, remote and low-resource settings where access to such maternity care is less — also called maternity care deserts,” said Poojita Garg, a second-year PhD student.
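The project description implies a classic active-sonar pipeline: emit a tone, demodulate the reflections, and look for periodic motion. A minimal sketch of that idea follows; the carrier frequency, sample rate, and band limits are assumptions, not DopFone's published parameters.

```python
# Sketch of an active-sonar pipeline for fetal heart rate estimation. The
# carrier, sample rate, and filtering choices here are assumptions.
import numpy as np

FS = 48_000          # microphone sample rate (Hz)
F_CARRIER = 18_000   # near-ultrasonic tone played through the phone speaker

def estimate_bpm(mic_signal: np.ndarray) -> float:
    """Estimate heart rate (BPM) from a recording of the reflected tone."""
    t = np.arange(len(mic_signal)) / FS
    # Coherent demodulation: shift reflections at the carrier down to baseband.
    baseband = mic_signal * np.exp(-2j * np.pi * F_CARRIER * t)
    # Crude low-pass: block-average down to 100 Hz, keeping slow Doppler motion.
    decim = FS // 100
    n = len(baseband) // decim
    motion = np.abs(baseband[: n * decim].reshape(n, decim).mean(axis=1))
    motion -= motion.mean()
    # Find the dominant periodicity in a plausible fetal range (110-160 BPM).
    ac = np.correlate(motion, motion, mode="full")[n - 1 :]
    lags = np.arange(1, n)
    bpm = 60.0 * 100 / lags            # lag at 100 Hz -> beats per minute
    valid = (bpm >= 110) & (bpm <= 160)
    best_lag = lags[valid][np.argmax(ac[1:][valid])]
    return 60.0 * 100 / best_lag
```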
CourseSLM: A Chatbot Tool for Supporting Instructors and Classroom Learning
Marquiese Garrett, a sophomore at the UW.
This custom-built chatbot is designed to help students stay focused and build real understanding rather than relying on quick shortcuts. The system uses built-in guardrails to keep learners on task and counter the distractions and over-dependence that can come with general large language models.
Running locally on school devices, the chatbot helps protect student data and ensures access even without Wi-Fi.
“We’re focused on making sure students have access to technology, and know how to use it properly and safely,” said Marquiese Garrett, a sophomore at the UW.
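CourseSLM's actual prompts and model aren't public, but the guardrail pattern it describes, a tutoring system prompt plus checks that redirect off-task requests, can be sketched roughly like this (all strings are placeholders):

```python
# Sketch of the guardrail pattern described above; CourseSLM's actual prompts,
# checks, and model are not public, so everything here is a placeholder.
TUTOR_SYSTEM_PROMPT = (
    "You are a course tutor. Guide the student with hints and questions. "
    "Never hand over a complete solution to a graded problem."
)

OFF_TASK_MARKERS = ("write my essay", "full solution", "just the answer")

def guarded_reply(student_msg: str, generate) -> str:
    """Wrap any local text-generation callable with simple input guardrails."""
    lowered = student_msg.lower()
    if any(marker in lowered for marker in OFF_TASK_MARKERS):
        return "Let's work through it step by step. What have you tried so far?"
    prompt = f"{TUTOR_SYSTEM_PROMPT}\n\nStudent: {student_msg}\nTutor:"
    return generate(prompt)

# Example with a dummy generator standing in for a small local model:
print(guarded_reply("Can I have just the answer to problem 3?", lambda p: ""))
```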
Efficient serving of SpeechLMs with VoxServe
Keisuke Kamahori, a third-year PhD student at the Allen School.
VoxServe makes speech-language models run more efficiently. It uses a standardized abstraction layer and interface that allows many different models to run through a single system. Its key innovation is a custom scheduling algorithm that optimizes performance depending on the use case.
The approach makes speech-based AI systems faster, cheaper, and easier to deploy, paving the way for real-time voice assistants and other next-gen speech applications.
“I thought it would be beneficial if we can provide this sort of open-source system that people can use,” said Keisuke Kamahori, a third-year Ph.D. student at the Allen School.
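VoxServe's real interface isn't reproduced here, but the two ingredients named above, a common abstraction over speech models and a use-case-aware scheduler, might look something like this toy version:

```python
# Toy version of the two ideas above: a shared interface for speech models
# plus a scheduler that favors interactive traffic. VoxServe's real API differs.
from abc import ABC, abstractmethod
import heapq

class SpeechLM(ABC):
    """Common interface so different speech models run through one server."""
    @abstractmethod
    def step(self, request_id: str) -> bytes:
        """Produce the next chunk of audio for a request."""

class EchoModel(SpeechLM):
    def step(self, request_id: str) -> bytes:
        print("decoding one chunk for", request_id)
        return b"\x00"

class Scheduler:
    """Interactive requests (e.g., live voice) run before batch requests."""
    def __init__(self):
        self._queue = []   # entries: (priority, submission_order, request_id)
        self._order = 0

    def submit(self, request_id: str, interactive: bool) -> None:
        priority = 0 if interactive else 1   # lower value runs first
        heapq.heappush(self._queue, (priority, self._order, request_id))
        self._order += 1

    def drain(self, model: SpeechLM) -> None:
        while self._queue:
            _, _, request_id = heapq.heappop(self._queue)
            model.step(request_id)

sched = Scheduler()
sched.submit("batch-audiobook", interactive=False)
sched.submit("live-assistant", interactive=True)
sched.drain(EchoModel())   # live-assistant is decoded first
```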
ConvFill: Model collaboration for responsive conversational voice agents
Zachary Englhardt (left), a fourth-year PhD student, and Vidya Srinivas, a third-year PhD student.
ConvFill is a system that pairs a lightweight conversational model with larger ones to reduce the delay in voice-based large language models. It responds quickly with short, initial answers, then fills in more detailed information as larger models complete their processing.
By combining small and large models in this way, ConvFill delivers faster responses while conserving tokens and improving efficiency — an important step toward more natural, low-latency conversational AI.
“This is an exciting way to think about how we can combine systems together to get the best of both worlds,” said Zachary Englhardt, a third-year Ph.D. student. “It’s an exciting way to look at problems.”
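The small-model-first pattern is easy to see in miniature. In the sketch below, asyncio stands in for real model serving: a fast drafter answers immediately while a slower, stronger model completes in the background. The delays and strings are stand-ins, not ConvFill's actual models.

```python
# Miniature of the small-model-first pattern: answer fast, refine later.
# The sleeps and strings stand in for real model calls.
import asyncio

async def small_model_draft(query: str) -> str:
    await asyncio.sleep(0.05)                 # fast, lightweight model
    return "Short answer: yes, it's open."

async def large_model_detail(query: str) -> str:
    await asyncio.sleep(1.0)                  # slower, more capable model
    return "It's open 10am-5pm today; last entry at 4:30pm."

async def respond(query: str) -> None:
    detail = asyncio.create_task(large_model_detail(query))
    print(await small_model_draft(query))     # speak immediately
    print(await detail)                       # fill in once ready

asyncio.run(respond("Is the museum open today?"))
```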
ConsumerBench: Benchmarking generative AI on end-user devices
Yile Gu, a third-year PhD student at the Allen School.
Running generative AI locally — on laptops, phones, or other personal hardware — introduces new system-level challenges in fairness, efficiency, and scheduling.
ConsumerBench is a benchmarking framework that tests how well generative AI applications perform on consumer hardware when multiple AI models run at the same time. The open-source tool helps researchers identify bottlenecks and improve performance on consumer devices.
There are a number of benefits to running models locally: “There are privacy purposes — a user can ask for questions related to email or private content, and they can do it efficiently and accurately,” said Yile Gu, a third-year Ph.D. student at the Allen School.
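At its core, this kind of benchmark launches several workloads at once and measures how they degrade under contention. The toy version below uses a CPU-bound loop as a stand-in for model inference; ConsumerBench's actual workloads and metrics may differ.

```python
# Toy contention benchmark: run two "apps" at once and compare mean latency.
# A CPU-bound loop stands in for model inference; real workloads differ.
import time
import threading

def workload(name: str, n_requests: int, results: dict) -> None:
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        sum(i * i for i in range(200_000))    # stand-in for one inference call
        latencies.append(time.perf_counter() - start)
    results[name] = sum(latencies) / len(latencies)

results: dict = {}
threads = [threading.Thread(target=workload, args=(name, 20, results))
           for name in ("chatbot", "image_gen")]
for t in threads:
    t.start()
for t in threads:
    t.join()
for name, avg in results.items():
    print(f"{name}: mean latency {avg * 1000:.1f} ms under contention")
```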
Designing Chatbots for Sensitive Health Contexts: Lessons from Contraceptive Care in Kenyan Pharmacies
Lisa Orii, a fifth-year Ph.D. student at the Allen School.
A project aimed at improving contraceptive access and guidance for adolescent girls and young women in Kenya by integrating low-fidelity chatbots into healthcare settings. The goal is to understand how chatbots can support private, informed conversations and work effectively within pharmacies.
“The fuel behind this whole project is that my team is really interested in improving health outcomes for vulnerable populations,” said Lisa Orii, a fifth-year Ph.D. student.
Here’s the full list of winning projects:
Runner-up: “VAMOS: A Hierarchical Vision-Language-Action Model for Capability-Modulated and Steerable Navigation” Mateo Guaman Castro, Sidharth Rajagopal, Daniel Gorbatov, Matt Schmittle, Rohan Baijal, Octi Zhang, Rosario Scalise, Sidharth Talia, Emma Romig, Celso de Melo, Byron Boots, Abhishek Gupta
Runner-up: “Dynamic 6DOF VR reconstruction from monocular videos” Baback Elmieh, Steve Seitz, Ira Kemelmacher-Shlizerman, Brian Curless
People’s Choice: “MolmoAct” Jason Lee, Jiafei Duan, Haoquan Fang, Yuquan Deng, Shuo Liu, Boyang Li, Bohan Fang, Jieyu Zhang, Yi Ru Wang, Sangho Lee, Winson Han, Wilbert Pumacay, Angelica Wu, Rose Hendrix, Karen Farley, Eli VanderBilt, Ali Farhadi, Dieter Fox, Ranjay Krishna
Editor’s Note: The University of Washington underwrites GeekWire’s coverage of artificial intelligence. Content is under the sole discretion of the GeekWire editorial team. Learn more about underwritten content on GeekWire.