Virginia Tech and Amazon Web Services are teaming up to train the next generation of national security leaders in generative AI

Interview transcript: 

Terry Gerton Virginia Tech has just launched a generative AI training program. Tell us about what the program is and why you decided to start it now.

Jamie Cogbill Okay, great. Well, this partnership between Virginia Tech and Amazon Web Services is really about preparing the next generation of national security leaders for an AI-driven world. As one of our nation’s six senior military colleges, Virginia Tech has had the chance to pilot AWS’s new generative AI training, which is the first of its kind, before it’s rolled out nationwide, or at least to the other senior military colleges. It directly supports the recent White House call to make senior military colleges hubs of AI research and talent development. And our cadets are already finding it incredibly valuable training as they prepare to lead in a defense environment that’s rapidly being transformed by artificial intelligence.

Terry Gerton There’s a lot of AI courses out there. What sets this collaboration apart? What’s unique in terms of content or focus or tools?

Jamie Cogbill Okay. Well, this is the first generative AI training program of its kind offered specifically at senior military colleges. And it directly supports the recent White House AI Action Plan, which was released last July, which calls on senior military colleges to become hubs of AI talent and innovation. And our cadets are getting hands-on experience with the same AI tools and problem solving approaches that are being used in real defense and intelligence missions.

Terry Gerton You mentioned cadets a couple of times here. For folks who may not know that Virginia Tech has a Corps of Cadets, tell us a little bit about that and how many cadets are actually taking the course.

Jamie Cogbill Okay. So yes, Virginia Tech is, as I mentioned, is one of six senior military colleges, which means that they have a Corps of Cadets, just like Virginia Military Institute or the Citadel, or our closest comparison is Texas A&M. There’s currently close to 1,400 cadets in the Corps of Cadets at Virginia Tech. But for this first pilot, it was offered to a total of about 75 students, and the intent was that at least half of them be cadets. And in this case, it was. We had about 38 total cadets that participated in the program.

Terry Gerton And who filled the other seats?

Jamie Cogbill The other seats were mostly people who are affiliated with Virginia Tech’s National Security Institute, which is a hub for defense-related research, but also for preparing future national security leaders here at Virginia Tech. And so the advertisement went out to both cadets and to the students who are affiliated with the Virginia Tech’s National Security Institute.

Terry Gerton It sounds like you didn’t have any trouble filling the seats. What does that tell you about the interest in this topic from future military and civilian defense leaders?

Jamie Cogbill There’s definitely a huge interest and our cadets who I talked to after the training just found it to be very valuable for them with just learning about AI in general, because they know it’s going to be an important part of their future careers, but also learning how to use it more effectively through effective prompt engineering and other methods that they learned throughout the training.

Terry Gerton Talk to us about some of the specific defense AI applications that you’re covering in this course. We all think about ChatGPT and Copilot, but how are those topics specifically coming across in defense-related issues?

Jamie Cogbill That’s a great question. And I don’t know the exact answer to that, but I can say that it’s teaching the core Amazon Gen AI services, which is something they call Amazon Bedrock, which Department of Defense has partnered with Amazon Web Services in a lot of ways, so it’s likely already using some of these AWS services. And so some of the people who are participating in the training will likely go into defense- or national security-related careers and already be expected to use or quickly learn how to use AWS software and AI tools. But I think the big takeaway is just learning AI in general, which is clearly going to be part of their future in national security and defense.

Terry Gerton I’m speaking with Jamie Cogbill. He’s the deputy director of the Defense Civilian Training Corps at Virginia Tech’s National Security Institute. Well, we talk a lot about AI on this program and all of its different applications. One thing we do know about it is it’s powerful but it’s also risky. So in this kind of training, how are you preparing students not just to use the tools, but to really lead responsibly with AI when the risks could be pretty high?

Jamie Cogbill I don’t have specifics about how this training addressed those kind of risks. I haven’t taken the course myself. It was Amazon Web Services who provided it. Talking to my cadets, I think it was a pretty intense curriculum. They did have two different instructor-led sessions, both four hour sessions, and each session was about three hours of content and an hour lab. And then they had a final competitive kind of gamified lab at the end. It was another four hour session where they practiced with real world challenges and in using AI. So I would assume that some of the training in the instructor-led portions was related to the risks of using AI, how to avoid hallucinations that AI can provide. And but also, in the Department of Defense, a key thing is ensuring the use of responsible AI, or RAI as they call it. And so I imagine that was also covered in the curriculum.

Terry Gerton This is cohort one this fall, first time you’ve rolled out the course. What do you think happens next? Where does it go from here?

Jamie Cogbill So we’re hoping that, and this is partially up to Amazon Web Services, but AWS is actively exploring how to scale the program for our spring semester here at Virginia Tech, potentially bringing it back in the spring, but also for 2026 in general. AWS originally intended to expand this training to all six senior military colleges across the country. And I think the success here at Virginia Tech with the pilot proved that our cadets and probably other cadets across the nation are eager to learn and ready to lead in the AI space. And we’re hoping that it set the standard for what other programs could look like.

Terry Gerton Well, always in a pilot there are lots of lessons learned in the process. What do you at Virginia Tech and Amazon take away in terms of needing to improve or broaden the program as you tried it out?

Jamie Cogbill Well, I think as you mentioned earlier, I think the demand is there. So if we can scale it up even here at Virginia Tech and offer it to more than just 75 cadets and students. But I think that the big takeaway is really that partnerships like this are essential. And AI is changing the nature of national security. And we need to ensure our future military and civilian leaders can lead confidently in that environment. And I think this program shows how academia, industry and government can come together to make that happen.

Terry Gerton AI is such a fast changing space. How do you imagine that the curriculum might have to adjust even from one semester to the next just to stay current?

Jamie Cogbill Absolutely. And I’m sure the folks at AWS are right there on the cusp of all that change. And so my guess is that they are constantly updating their curriculum to keep pace with that.

Terry Gerton Are you hearing from senior leaders in the Department of Defense about how they view the program and what their hopes for it are?

Jamie Cogbill So far, no, not directly. My guess is at the senior levels at AWS, they are talking to senior leaders in the Department of Defense and potentially at the most senior levels of our government, since it was a key goal of the White House AI Action Plan to offer this type of training.

The post Virginia Tech and Amazon Web Services are teaming up to train the next generation of national security leaders in generative AI first appeared on Federal News Network.

© Federal News Network

The hot new thing at AWS re:Invent has nothing to do with AI

AWS CEO Matt Garman unveils the crowd-pleasing Database Savings Plans with just two seconds remaining on the “lightning round” shot clock at the end of his re:Invent keynote Tuesday morning. (GeekWire Photo / Todd Bishop)

LAS VEGAS — After spending nearly two hours trying to impress the crowd with new LLMs, advanced AI chips, and autonomous agents, Amazon Web Services CEO Matt Garman showed that the quickest way to a developer’s heart isn’t a neural network. It’s a discount.

One of the loudest cheers at the AWS re:Invent keynote Tuesday was for Database Savings Plans, a mundane but much-needed update that promises to cut bills by up to 35% across database services like Aurora, RDS, and DynamoDB in exchange for a one-year commitment.

The reaction illustrated a familiar tension for cloud customers: Even as tech giants introduce increasingly sophisticated AI tools, many companies and developers are still wrestling with the basic challenge of managing costs for core services.

The new savings plans address the issue by offering flexibility that didn’t exist before, letting developers switch database engines or move regions without losing their discount. 

“AWS Database Savings Plans: Six Years of Complaining Finally Pays Off,” is the headline from the charmingly sardonic and reliably snarky Corey Quinn of Last Week in AWS, who specializes in reducing AWS bills as the chief cloud economist at Duckbill.

Quinn called the new plans “better than it has any right to be” because they cover a wider range of services than expected, but he pointed out several key drawbacks: the plans are limited to one-year terms (meaning you can’t lock in bigger savings for three years), they exclude older instance generations, and they do not apply to storage or backup costs.

He also cited the lack of EC2 (Elastic Compute Cloud) coverage, calling the inability to move spending between computing and databases a missed opportunity for flexibility.

But the database pricing wasn’t the only basic upgrade to get a big reaction. For example, the crowd also cheered loudly for Lambda durable functions, a feature that lets serverless code pause and wait for long-running background tasks without failing.

Garman made these announcements as part of a new re:Invent gimmick: a 10-minute sprint through 25 non-AI product launches, complete with an on-stage shot clock. The bit was a nod to the breadth of AWS, and to the fact that not everyone in the audience came for AI news.

He announced the Database Savings Plans in the final seconds, as the clock ticked down to zero. And based on the way he set it up, Garman knew it was going to be a hit — describing it as “one last thing that I think all of you are going to love.”

Judging by the cheers, at least, he was right.

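AWS strengthens DNS resilience to guard against US East region outages

AWS has introduced a new DNS (Domain Name System) resilience feature intended to improve the stability of its US East region in Northern Virginia and reduce service disruptions.

In October, a DNS failure in the AWS US East region destabilized the DynamoDB API and had a broad impact on more than 70 services. As a result, services at many customers were down for several hours, and AWS ultimately had to restore DNS manually.

Full service recovery took longer still, because network configuration delays and a backlog of accumulated work followed.

AWS said the new DNS resilience feature is offered under the name “Accelerated recovery for managing public DNS records” and is designed to address DNS-related issues like the one that triggered the October outage.

The feature has been added to Route 53, AWS’s cloud-based web service that translates human-readable domain names into numeric IP addresses so that systems can communicate. In a blog post on the 26th, AWS said the feature is designed to guarantee a recovery time objective (RTO) of 60 minutes in future outages.

“With this enhancement, customers can continue making DNS changes and provisioning infrastructure even during a regional outage, which improves the predictability and resilience of mission-critical applications,” AWS said.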

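The difference between the data plane and the control plane

The DNS problems AWS has experienced have mostly affected the control plane, the management layer that decides where traffic is directed, while the data plane, which actually delivers DNS queries to their destination, has generally remained unaffected.

Akshat Tyagi, a practice leader at HFS Research, explained: “When AWS has a major outage, the DNS data plane usually stays healthy. In other words, the infrastructure itself keeps running, but when the control plane in US East stalls, you cannot update DNS in time to reroute traffic, and that is what becomes the actual outage.”

“The newly added feature is meant to close that gap,” Tyagi continued. “It provides a hardened control path across multiple regions so that key APIs such as ‘ChangeResourceRecordSets’ are always available within the guaranteed 60-minute recovery time. That lets enterprises redirect user traffic to a backup region, switch to standby endpoints, or cut over immediately to a disaster recovery environment without waiting for AWS to recover.”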
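To make the failover step Tyagi describes more concrete, here is a minimal Python sketch of the request body that a Route 53 ChangeResourceRecordSets call takes when repointing a record at a standby endpoint. The record name, standby hostname, hosted zone ID, and the helper function are hypothetical and chosen only for illustration; the article does not show how any particular customer performs the switch.

```python
from typing import Any, Dict


def build_failover_change_batch(
    record_name: str, standby_target: str, ttl: int = 60
) -> Dict[str, Any]:
    """Build a ChangeBatch that repoints a CNAME at a standby endpoint.

    Hypothetical helper: the names passed in are placeholders, not values
    taken from the article.
    """
    return {
        "Comment": "Fail over to standby endpoint",
        "Changes": [
            {
                # UPSERT creates the record if missing, otherwise replaces it.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,  # a short TTL lets resolvers pick up the change sooner
                    "ResourceRecords": [{"Value": standby_target}],
                },
            }
        ],
    }


batch = build_failover_change_batch(
    "app.example.com.", "app.us-west-2.example.com."
)

# With boto3 installed and credentials configured, the batch could be sent as:
#   import boto3
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="Z_HYPOTHETICAL_ZONE_ID", ChangeBatch=batch)
```

Building the payload separately from the API call keeps the sketch runnable without AWS credentials and makes the change easy to review before it is submitted.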

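US East region flagged as a structural bottleneck for AWS

The AWS US East region in Northern Virginia has long been regarded as a key bottleneck in AWS’s overall architecture.

“Many of AWS’s global services have historically depended on the control plane in the Northern Virginia region. When this region wobbles, the whole world feels the aftershock,” Tyagi said.

Tyagi warned that although the new feature fixes one of several important flaws, it may not be enough to fully prevent the impact of future outages. “The risk remains until AWS guarantees stronger cross-region failover for core APIs and distributes control plane responsibilities across multiple independent regions,” he said.

Tyagi advised that AWS could reduce the burden on customers of reworking complex architectures after every outage by providing more concrete and consistent design templates for multi-region DNS configurations and control plane isolation.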

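A possible lead in the DNS resilience race

Some observers also say the new DNS resilience feature could give AWS an edge over other hyperscalers that continue to suffer network outages.

“Azure, Google Cloud, and Cloudflare all operate robust, globally distributed DNS systems, but none of them clearly guarantees a recovery time for DNS control plane updates during a regional outage. That is the decisive difference,” Tyagi said. “These competitors guarantee that DNS queries themselves will keep being served, but they do not say specifically how quickly DNS records can be updated when the control plane fails,” he added.

AWS has been steadily adding features to reduce downtime for enterprise customers. Shortly after last October’s outage, AWS added an automatic incident creation feature to CloudWatch.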

How the AWS outage happened: Amazon blames rare software bug and ‘faulty automation’ for massive glitch

A detailed explanation of this week’s Amazon Web Services outage, released Thursday morning, confirms that it wasn’t a hardware glitch or an outside attack but a complex, cascading failure triggered by a rare software bug in one of the company’s most critical systems.

The company said a “faulty automation” in its internal systems — two independent programs that began racing each other to update records — erased key network entries for its DynamoDB database service, triggering a domino effect that temporarily broke many other AWS tools.

AWS said it has turned off the flawed automation worldwide and will fix the bug before bringing it back online. The company also plans to add new safety checks and improve how quickly its systems recover if something similar happens again.
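As a rough illustration of how two independent updaters racing each other can erase a record, and how a safety check can prevent it, here is a toy Python simulation. It is only a sketch under stated assumptions: the `RecordStore` class, the plan version numbers, the record name, and the specific interleaving are hypothetical and are not a description of AWS's internal systems.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, List[str]]  # (plan_version, addresses)


class RecordStore:
    """Toy record table keyed by name (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}

    def apply_unguarded(self, name: str, version: int, addresses: List[str]) -> bool:
        # Last writer wins, even when its plan is older than the current one.
        self.records[name] = (version, list(addresses))
        return True

    def apply_guarded(self, name: str, version: int, addresses: List[str]) -> bool:
        # Safety check: refuse any plan that is not newer than the applied one.
        current = self.records.get(name)
        if current is not None and version <= current[0]:
            return False
        self.records[name] = (version, list(addresses))
        return True

    def cleanup_older_than(self, name: str, newest_version: int) -> None:
        # Remove the record if it belongs to a plan older than the newest plan.
        current = self.records.get(name)
        if current is not None and current[0] < newest_version:
            del self.records[name]


def run(guarded: bool) -> Optional[Record]:
    store = RecordStore()
    apply = store.apply_guarded if guarded else store.apply_unguarded
    name = "db.example.internal"
    apply(name, 2, ["10.0.0.2"])  # fast worker applies the newer plan
    apply(name, 1, ["10.0.0.1"])  # delayed worker applies a stale plan
    store.cleanup_older_than(name, newest_version=2)  # fast worker cleans up
    return store.records.get(name)


unguarded_result = run(guarded=False)  # the record ends up deleted
guarded_result = run(guarded=True)     # the newer plan survives
```

In the unguarded run the stale write overwrites the newer plan and the cleanup step then deletes it, leaving no entry at all. The version comparison in `apply_guarded` is one common way to make such updates safe, not necessarily the fix AWS chose.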

Amazon apologized and acknowledged the widespread disruption caused by the outage.

“While we have a strong track record of operating our services with the highest levels of availability, we know how critical our services are to our customers, their applications and end users, and their businesses,” the company said, promising to learn from the incident.

The outage began early Monday and impacted sites and online services around the world, again illustrating the internet’s deep reliance on Amazon’s cloud and showing how a single failure inside AWS can quickly ripple across the web.

Tech Moves: Allen Institute gets new exec; AWS leader shifts roles; NuScale names legal officer

Susan Kaech. (Allen Institute Photo)

Award-winning immunologist Susan Kaech is the new executive vice president of the Allen Institute’s Immunology Moonshot, an initiative that aims to understand the immune system’s role in human health and disease.

Kaech currently leads the NOMIS Center for Immunobiology and Microbial Pathogenesis at the Salk Institute for Biological Studies and will join the Allen Institute in January.

“The appointment comes at a critical time in bioscience when the immune system is regarded as the cornerstone of all diseases and understanding its foundational principles is vital to unlocking new treatments and therapies,” the institute said in a statement.

Kaech’s research includes the investigation of how the immune system remembers infections to develop immunity, T-cell communications, and the role of metabolism in the immune system’s fight against cancer.

Arthur Valdez Jr. (LinkedIn Photo)

Seattle RFID company Impinj named Arthur Valdez Jr. to its board of directors.

Valdez recently left the role of executive VP of global supply chain and customer solutions at Starbucks and his career includes leadership roles at Amazon, Target and elsewhere.

“Arthur’s expertise transforming and optimizing strategic supply chain and logistics networks for large consumer-facing companies will be invaluable as we continue to advance our vision of connecting every thing,” said Impinj CEO Chris Diorio in a statement.

Jason Bennett. (LinkedIn Photo)

Jason Bennett has taken a new role at Amazon Web Services, shifting from VP of U.S. enterprise to VP of worldwide startups and venture capital. Bennett has been with the company for more than 17 years.

On LinkedIn Bennett shared his fondness for working with startups and said he was eager to return to a position serving that community.

“I’m energized by the opportunity to work alongside our teams to support a thriving startup ecosystem — from founders and VCs, to accelerators, and the broader innovation community,” he said, adding that the work “has a lasting impact on the direction of industries and the future of AI.”

James Canafax. (NuScale Photo)

NuScale Power named James Canafax as chief legal officer and corporate secretary. The Tigard, Ore.-based nuclear energy company is developing small modular reactors.

Canafax has decades of legal experience and joins NuScale from Maritime Partners. Past positions include executive leadership at BWX Technologies, which supplies nuclear components and services.

“[Canafax’s] extensive experience in the nuclear industry, deep familiarity with the regulatory environment and track record of guiding organizations through key growth periods make him uniquely suited to support NuScale at this important moment for our company,” CEO John Hopkins said in a statement.

Elvis Dieguez. (symphonie Photo)

Seattle entrepreneur Elvis Dieguez is now VP of data science, analytics and platforms for the healthcare startup hims & hers. Dieguez joins the company from symphonie, a Seattle e-commerce marketing platform where he was CEO and co-founder. He was previously at Amazon for more than four years working in business analytics and as a senior manager.

Hims & hers offers a telehealth platform for conditions including sexual health, hair loss, mental health, skincare and weight loss.

“I look forward to leading and working with a ~70 person team who’ve been working hard to make the #healthcare system work for all Americans,” Dieguez said on LinkedIn.

Ariel Brumbaugh. (LinkedIn Photo)

Biotech startup Synthesize Bio named Ariel Brumbaugh as senior director of business development. In the role, Brumbaugh will help the company partner with biopharma companies interested in using Synthesize’s AI-based research platform to accelerate and de-risk drug development.

Seattle’s Synthesize Bio was founded by leaders from Fred Hutchinson Cancer Center. Last month it announced $10 million in funding from Madrona.

Brumbaugh joined the startup from the San Francisco biotech company Gladstone Institutes.

Sophie Brougham is director of philanthropic operations for the recently launched Clean Economy Project. Nicknamed CleanEcon, the effort includes past employees of the Bill Gates-led Breakthrough Energy and is a policy and advocacy platform promoting clean power.

Prior to Breakthrough, Brougham was with the Paul Allen holding company Vulcan (now known as Vale Group) for more than a decade, where she was a senior manager and led programs including philanthropic and grants management.

Seattle’s Jake Laes is now executive director of AI Tinkerers, a global network of AI engineers and builders. Laes joined the group from Deel, where he helped facilitate partnerships between investors and accelerator programs. Laes is the founder of YoungTech Seattle, and his background includes mentoring and leadership roles at the University of Washington’s CoMotion and Techstars.

Pranam Kolari, VP of search and recommendations at Coupang, is resigning from his role next month. Coupang is South Korea’s largest e-commerce platform and is headquartered in Seattle. Kolari, based in San Jose, Calif., was previously at Walmart Labs for nearly a decade where his roles included vice president of engineering for search.

Datavault AI appointed Pete Scobell as VP of global security. The Beaverton, Ore.-based company helps businesses monetize their data and create digital twins of physical objects. Scobell is a decorated U.S. Navy SEAL veteran and will oversee Datavault AI’s security operations, risk management and asset logistics.

Erin McHugh Saif, a former Massachusetts-based Microsoft executive, is CEO of an as-yet unnamed data and AI venture to serve “place-based partnerships,” which are networks of nonprofits, government agencies, and educational entities that aim to address education, jobs and housing needs.

“With better access to data, these organizations will leap ahead in this moment of AI transformation, gaining faster insight into which programs deliver the greatest improvement to significantly scale their impact,” Saif said on LinkedIn.

The effort has the support of the Ballmer Group, a philanthropic organization co-founded by former Microsoft CEO Steve Ballmer and his wife Connie, and the nonprofit TechSoup.

Karen Ng was promoted to executive VP of product at HubSpot. Ng has been with the company since 2022, joining as senior VP of product and partnerships. Past employers include Common Room, Google and Microsoft, where she was chief of staff across the company’s developer tools business. Ng is based in the Seattle area.

The AWS outage is a warning about the risks of digital dependence and AI infrastructure

The show floor at AWS re:Invent 2024 in Las Vegas. (GeekWire File Photo)

Unless you’ve been on a “digital cleanse” this week, you know that Amazon Web Services (AWS) had a major outage at the start of the week.

You know this because apps and sites you use were down. Credible reports estimate at least 1,000 sites and apps were affected. Large swaths of modern digital life went dark: from finance (Venmo and Robinhood) to gaming (Roblox and Fortnite) to communications (Signal and Slack). Some people couldn’t even get a good night’s sleep because the outage took out “smart beds.” Even sporting events were impacted when Ticketmaster failed.

We’ve seen outages before, but this one seemed broader and harder to ignore.

In the wake of the outage, many well-intentioned hot takes boiled down to: “They should’ve used more cloud providers.”

Setting aside the subtle victim-blaming, there’s also the fact that in a world with only three major cloud providers (AWS, Microsoft Azure, Google Cloud) if you want to “diversify” there’s not a lot of diversity out there.

And the argument for diversity in cloud providers is really about market diversity, not individual organizations juggling multiple vendors. More competition in the cloud market would mean fewer cascading failures when one provider goes down.

The key question when something like this happens is whether we’re taking the risk lessons and expanding them beyond the immediate problem to see the emerging problems. 

Instead of saying organizations need to have multiple cloud providers, we should be asking how we’re dealing with the reality of highly concentrated risks with exceptionally broad impact because we just had an object lesson in what that really means.

In this recent outage there’s a pointer to where we should be looking proactively to apply this lesson: generative AI. This recent AWS outage gives us two lessons for the emerging generative AI ecosystem.

Concentration crisis in AI

With the generative AI ecosystem, I’m talking not about chatbots — I mean AI-native applications that are built on generative AI as a platform. We just saw that when there’s no cloud, there’s no cloud-native application. Likewise, when there’s no generative AI provider, there’s no AI-native application.

The first lesson from the AWS outage for AI-native applications is what happens to an industry when there’s a limited number of providers for centralized resources and there’s an outage. We just saw: it has huge rippling effects across the industry and all walks of life built on it.

It’s a throwback to the mainframe era: when “the computer” is down, it’s down for everyone.

There are as few, if not fewer, generative AI providers as there are cloud providers. A major outage is inevitable — that’s just engineering reality. When that happens, every AI-native app built on that generative AI platform will also go down, full stop.

The impact could be even more severe than the AWS outage. It will be more like “the computer is down, and the people are gone” for many different industries and services. Ironically, the “smarter” the industry and service, the greater the potential fallout.

The second lesson is one of intertwined risk. OpenAI itself was affected by this week’s AWS outage. 

That means AI-native apps have double exposure to the risks around a limited number of providers for critical, centralized resources. For AI-native apps, it’s like the mainframe era squared. If the generative AI platform fails, everything built on it fails. And if the cloud that hosts the AI platform fails, it all goes down, too.
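The "double exposure" argument can be made concrete with a back-of-the-envelope calculation. The sketch below multiplies the availability of each layer in a dependency chain, under the simplifying assumption that the layers fail independently and that the app is down whenever any layer is down. The 99.9% figures and function names are hypothetical and chosen only for illustration.

```python
HOURS_PER_YEAR = 24 * 365


def serial_availability(*availabilities: float) -> float:
    """Availability of a chain in which every layer must be up at once.

    Assumes independent failures, which is a simplification.
    """
    result = 1.0
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        result *= a
    return result


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Hypothetical figures: the app itself, the generative AI platform it calls,
# and the cloud that hosts the platform are each up 99.9% of the time.
single_layer = serial_availability(0.999)
stacked = serial_availability(0.999, 0.999, 0.999)

print(f"one layer:    {expected_downtime_hours(single_layer):.1f} h/year")
print(f"three layers: {expected_downtime_hours(stacked):.1f} h/year")
```

Under these assumed numbers the stacked app sees roughly three times the expected downtime of any single layer, which is the arithmetic behind the point that each added dependency compounds the exposure.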

This is not to say don’t do cloud or don’t do AI. But it is to say we need to understand this new, complex intertwining of risks inherent in a world where everything is relying on a small number of key providers and that small number of key providers also rely on a small number of key providers.

The realities of physical requirements and capital investment required for cloud and generative AI make a truly diverse ecosystem impracticable for either. I don’t think anyone sees more than a literal handful of providers for either of these in the future. 

The bottom line

Highly concentrated risks with exceptionally broad impact aren’t going away anytime soon. 

But the growth of generative AI providers, and their reliance on cloud providers, shows where there is going to be growth and where and what those risks will be. The growth will be upwards, as technologies stack on top of and rely on each other. And that means these risks are only going to become more concentrated and the impacts even broader.

In the world of security, there’s the “CIA” triad: “confidentiality”, “integrity” and “availability.” In the first days of “Trustworthy Computing” at Microsoft, the principles included “availability.” But in recent years, availability has been overlooked often as security and privacy concerns understandably dominate.

A thoughtful application of the AWS outage tells us that outages like this are a kind of problem that isn’t an anomaly: it’s inherent in the nature of today’s technology reality. And since there are no easy solutions and only increasingly complex problems around this, we need to start understanding this new reality and thinking seriously about how to mitigate these risks.

AWS outage affects Ticketmaster for pivotal Mariners vs. Blue Jays playoff game in Toronto

The effects of the massive AWS outage reached the sports world on Monday.

Ticketmaster was dealing with ticket management issues as a result of the outage, according to messages shared by several sports teams hosting games on Monday, including the Toronto Blue Jays and Seattle Seahawks.

The Blue Jays, facing off against the Seattle Mariners in a Game 7 MLB playoff bout at Rogers Centre in Toronto, posted a statement earlier Monday about the outage and advised fans to “hold off on managing your tickets as we work through this.”

A few hours later, the team said ticket management was returning to normal.

>World Series appearance on the line
>AWS outage sends Ticketmaster down
>Blue Jays fans can't access Game 7 tickets
>Blue Jays opponent…Seattle
>Amazon headquarters…Seattle https://t.co/OYjjDj5cdf pic.twitter.com/rbNnwKYegG

— Morning Brew ☕️ (@MorningBrew) October 20, 2025

The Seahawks, which are hosting the Houston Texans for Monday Night Football in Seattle, issued a statement about the outage “that may impact access to Ticketmaster, Seahawks Account Manager, and the Seahawks Mobile App.”

The Detroit Lions, hosting their own Monday Night Football game, also had ticketing impacted.

The outage effects went beyond just ticketing. The Premier League said its VAR tech system, used to determine offside calls in soccer, would not be available for Monday’s match between West Ham and Brentford.

Amazon’s outage began shortly after midnight Pacific in Amazon’s Northern Virginia (US-EAST-1) region, which is AWS’s oldest and largest cloud region, a popular nerve center for online services.

In an initial update, AWS said the outage was related to a DNS resolution issue with its DynamoDB product, meaning the internet’s phone book failed to find the correct address for a database service used by thousands of apps to store and find data.

Amazon later said the root cause of the outage was an “underlying internal subsystem responsible for monitoring the health of our network load balancers.”

By 3 p.m. PT, the company said all AWS services had returned to normal operations.

Major sites and services including Facebook, Snapchat, Coinbase and Amazon itself were impacted — reviving concerns about the internet’s heavy reliance on the cloud giant.

The outage suggests that many sites have not adequately implemented the redundancy needed to quickly fall back to other regions or cloud providers in the event of AWS outages.

AWS outage was not due to a cyberattack — but shows potential for ‘far worse’ damage

The massive outage that hit Amazon Web Services early Monday and took down several major sites and services was due to an internal issue within the cloud giant’s infrastructure.

In a new update Monday at 8:43 a.m. PT, Amazon said the root cause of the outage was an “underlying internal subsystem responsible for monitoring the health of our network load balancers.”

The outage impacted everything from sites including Facebook, Coinbase, and Amazon itself, to check-in kiosks at LaGuardia Airport.

Amazon said it was seeing connectivity and API recovery for AWS services.

Dr. Aybars Tuncdogan, an associate professor at King’s College London, said it serves as a warning sign for a potentially more disruptive situation.

“If a comparable vulnerability were deliberately targeted by malicious actors, the damage would be far worse,” Tuncdogan said.

The problems began shortly after midnight Pacific in Amazon’s Northern Virginia (US-EAST-1) region, which is AWS’s oldest and largest cloud region, a popular nerve center for online services. Major outages originating from this same region also caused widespread disruptions in 2017, 2021, and 2023.

In an initial update, AWS said the outage was related to a DNS resolution issue with its DynamoDB product, meaning the internet’s phone book failed to find the correct address for a database service used by thousands of apps to store and find data.

The latest outage suggests that many sites have not adequately implemented the redundancy needed to quickly fall back to other regions or cloud providers in the event of AWS outages.

“Organizations that use public cloud services like AWS should ensure they follow guidance for shared responsibility in the cloud model for resiliency, including using multi-regional failover for critical applications, and ideally, multi-provider failover, to help minimize the impact of disruptions,” said Marc Laliberte, director of security operations at Seattle-based WatchGuard.
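The multi-regional failover Laliberte recommends can be sketched at the client level as trying a list of regional endpoints in priority order. The Python below is a minimal illustration, not a production pattern: the endpoint URLs, the `fake_request` stand-in, and the exception class are hypothetical, and a real deployment would also need health checks, timeouts, and data replication between regions.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllEndpointsFailed(Exception):
    """Raised when every endpoint in the failover list has failed."""


def call_with_failover(endpoints: Sequence[str], request: Callable[[str], T]) -> T:
    """Try each endpoint in order and return the first successful result."""
    errors: List[str] = []
    for endpoint in endpoints:
        try:
            return request(endpoint)
        except OSError as exc:  # ConnectionError and TimeoutError are OSErrors
            errors.append(f"{endpoint}: {exc}")
    raise AllEndpointsFailed("; ".join(errors))


def fake_request(endpoint: str) -> str:
    # Stand-in for a real network call; simulates the primary region being down.
    if "us-east-1" in endpoint:
        raise ConnectionError("region unavailable")
    return f"ok from {endpoint}"


# Hypothetical endpoints, primary first, standby second.
ENDPOINTS = [
    "https://api.us-east-1.example.com",
    "https://api.us-west-2.example.com",
]

result = call_with_failover(ENDPOINTS, fake_request)
```

Passing the request function in as a parameter keeps the failover logic independent of any particular HTTP library or cloud SDK.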

Tuncdogan said the deeper issue is “tech monoculture” in a global infrastructure with little diversity in platforms or providers.

“It’s like agricultural monoculture — when everything relies on a single strain, one disease can wipe out entire plantations, because they all have the same genetics,” he said.

He said that while customers can design redundancy themselves, the providers can also develop different competing infrastructures within their own ecosystems.

“This incident will likely be resolved quickly,” he said. “However, unless we rethink the architecture (that is, we decentralize and diversify), we should expect more outages of this scale, whether from glitches or targeted attacks.”

Vaibhav Tupe, a senior member with technical professional organization IEEE, said cloud service providers should isolate critical networking components more aggressively to prevent cascading failures when core systems malfunction.

“This outage shows that even the largest cloud providers are vulnerable when failure occurs at the control-plane level,” he said. “It raises fundamental questions about over reliance on a single provider or region and may accelerate demand for multi-cloud and multi-region architectures as a baseline expectation for resilience.”