When hackers weaponize AI, the rules of cyber defense change overnight
Interview transcript
Terry Gerton Anthropic says Chinese hackers used its Claude chatbot to automate a cyber espionage campaign against tech firms, financial institutions, and government agencies, marking what could be the first large-scale AI-driven attack. Joining me to explain what this means for defenders and what’s next for nation-state tactics is the former department head of AI security at MITRE and co-founder of AI security consulting firm Fire Mountain Labs, Dr. Josh Harguess. Dr. Harguess, thank you for joining me.
Josh Harguess Thank you very much for having me.
Terry Gerton We’re going to talk about something that really made the news not too long ago. Anthropic said that its AI tool Claude was used by Chinese hackers with minimal human intervention to launch a cyber espionage campaign. Can you tell us more about what really happened?
Josh Harguess Yeah, I can. So a really nice report that they lay it out, really detailed, but some of the high marks. So they used Claude code to do this. So this is Anthropic’s own tool that allows you to sort of, you know, vibe code as it were, and create a code that was able to do these exploits. And by create, what I mean is they were able to execute something like 80 to 90% of these operations completely independently from human interaction. That 10 to 20% was sort of like human-on-the-loop, human-in-the-loop verifying, validating some of the things that came back from Claude code, like hallucinations things that weren’t actually real, but there were plenty of exploits that were real, that were able to execute without any human intervention. And they really did this by doing these safeguard bypasses, things like social engineering, so doing prompt injection techniques, these kinds of things, convincing the tools that they were using. That everything that they’re doing was on the up and up. You know, no issues, don’t worry about what we’re trying to do. This is, you know, we’re security professionals. We’re trying to secure our own infrastructure, our own networks. So you’re doing us a service by providing this code to us. And yeah, the breach was, this was a campaign. I think they started to notice this maybe back in September. So they were able to kind of follow this campaign and eventually disrupt it.
Terry Gerton You mentioned some things like prompt injection and social engineering. Tell us a little bit more about how those were able to bypass Claude’s safety guardrails and what that means for the average person who might be a victim of some of this approach.
Josh Harguess Yeah, absolutely. So this has been around since ChatGPT Was released, so essentially these are ways to convince the model to do things that it’s not supposed to do. So particularly over the past two or three years, OpenAI, Anthropic, Google, they’ve spent a ton of money on trying to build in these safeguards so that you can’t get instructions for how to make a nuclear weapon or how to do other nefarious things that these models, they don’t want you to be able to do. However, there are ways around this and we’re seeing this even with today’s models. It’s more sophisticated. It’s not as easy as some of the early days where you just say the word poem infinitely and then it spits out user data. So now you do have to dig a little deeper. You have to do what’s called maybe crescendo attacks. You have to, you know, sort of aggregate different attacks together in order for this to be successful. But these models are all susceptible at some point to this kind of prompt injection technique.
Terry Gerton It sounds like you and others with your expertise might have been expecting something like this to come along. It’s not just something that woke up overnight.
Josh Harguess Definitely. Absolutely. We’ve been sounding the warning alarms about this for many years So it’s very well known and really that awareness piece is number one. I mean, I think a lot of people, to your point, are going to be very surprised that this is even possible
Terry Gerton So tell us more about how that 10 or 20% of human interaction played in with the cyber attack.
Josh Harguess Yeah, definitely. So, yeah, it’s interesting. So we’re not at the place where you can just say, go execute all of this for me, actually execute the campaign, you know. Get back to me in a few days when you’ve actually been able to recover these credentials and get into accounts and all that kind of thing. So right now we’re still at the phase where, you know, there’s a certain amount of trust for these models to do something that you tell it to do. And this is in, you know, your everyday task. If you’ve ever tried to write a summary of an article or something like that, you know, you’re not always gonna get 100% of what you’re expecting out of these models. So there’s some amount of human-on-the-loop or human-in-the-loop to kind of validate and verify what you are getting back. And this is no different. So, you know some of the things that came back were, you know, code that didn’t run or code that actually didn’t do the task that they were trying to do. So it was very much breaking the problem up into smaller pieces. Executing those small pieces, validating that they worked, and so on.
Terry Gerton I’m speaking with Dr. Josh Harguess. He’s the former department head for AI security at MITRE and co-founder of the AI security consulting firm, Fire Mountain Labs. The organizations that were targeted in this cyber attack, everything from financial institutions to tech companies to chemical manufacturers and government agencies, these might be the folks that you would expect to have the most resilient defense. How did they fare?
Josh Harguess Yeah, I mean, it’s difficult to protect yourself against the unknown, right? So I think a lot of these organizations, like you mentioned, they will be doing their best to protect themselves against kind of known adversaries. So they’re protecting their data, they’re protection their identities, these sorts of things, but they’ve never seen something this sophisticated kind of come at their infrastructure, and so quickly. So I think a lot times in the past if they saw a campaign like this, and it was human operated. They would be able to sort of see the signals and be able to react. In this case, the campaign was so fast acting that they weren’t able to react in time. And I think that’s really the escalation of these types of attacks.
Terry Gerton What does that mean for AI defense or hacking defense going forward? If the hacker is AI powered and can adjust so quickly, how can the defenders meet that kind of attack?
Josh Harguess Absolutely, so we’re definitely getting into that space that we were fearing in the beginning where we’re going to have to use AI tools to help us defend against these kinds of attacks. And not just these types of attacks, probably all attacks. The same exact thing is going to happen though where you’re going want to defend yourself against attacks using AI, however, that AI may not do exactly what you expect it to do all the time. So it’s the same kind of back and forth. You’re going to need a human-in-the-loop, human-on-the-loop to sort of validate, verify these defenses, check in, make sure that they’re operating as they should. These are the kinds of things that we do as a consultancy, help folks through this. So, you know, how do you secure your own AI systems? That’s a big question mark. That’s what we help people through. And you have to be able to secure your own AI systems before you can use them for these types of defenses.
Terry Gerton If AI is attacking and AI is defending and all of that is happening at machine speed, what is the risk — what are the vulnerabilities there and what is risk of escalation?
Josh Harguess Yeah, absolutely. So same as we kind of talked about earlier, you have to break the problem down into kind of consumable pieces. So look at your entire ecosystem of defense. Where can you instantiate AI? One place that’s really obvious is attacking yourself. So firming up your own defenses by pretending you’re an adversary. So red teaming your own system using these tools before someone else on the outside does that.
Terry Gerton As you look forward, what does this mean in terms of nation-state relations? How do nations prepare and how does this change the threat landscape?
Josh Harguess Yeah, certainly. So there’s multiple ways of looking at the changing of the threat landscape. My co-founder likes to talk about this in terms of these three words, intent, opportunity, and capability. Intent, that’s not really going to change. There’s always going to be bad actors that are trying to do nefarious things. Opportunity, that’s certainly expanding. So as these AI models and these AI agents kind of dig deeper into our digital infrastructures, we have new avenues for exploitation. So, you know, in this case, it was the social engineering way of getting into the models. There’s going to be other ways of getting in in the future that we’re not aware of yet. And then capability, that’s really the big one here. You know, AI is the force multiplier in this case, and that’s what we need to be utilizing, but also securing for our own systems.
The post When hackers weaponize AI, the rules of cyber defense change overnight first appeared on Federal News Network.

© Getty Images/iStockphoto/Urupong