
Why AI agents won’t replace government workers anytime soon

The vendor demo looks flawless, the script even cleaner. A digital assistant breezes through forms, updates systems and drafts policy notes while leaders watch a progress bar. The pitch leans on the promise of an agentic AI advantage.

Then the same agents face real public-sector work and stall on basic steps. The newest empirical benchmark from researchers at the nonprofit Center for AI Safety and data annotation company Scale AI finds current AI agents completing only a tiny fraction of jobs at a professional standard. Agents struggled to deliver production-ready outcomes on practical projects, including an explorer for World Happiness data, a short 2D promo, a 3D product animation, a container-home concept, a simple Suika-style game, and an IEEE-formatted manuscript. The study offers useful grounding on what agents can do inside federal programs today, why they will not replace government workers soon, and how to harvest benefits without risking mission, compliance or trust.

Benchmarks, not buzzwords, tell the story

Bold marketing favors smooth narratives of autonomy. Public benchmarks favor reality. In the WebArena benchmark, an agent built on GPT-4 achieved low end-to-end task success compared with human performance on real websites that require navigation, form entry and retrieval. The OSWorld benchmark assembles hundreds of desktop tasks across common apps with file handling and multi-step workflows, and documents persistent brittleness when agents face inconsistent interfaces or long sequences. Software results echo the same pattern. The original SWE-bench evaluates real GitHub issues across live repositories and shows that models generate useful patches, but need scaffolding and review to land working changes.

Duration matters. The HCAST benchmark correlates agent performance with human task time and finds strong results on short, well-bounded steps and sharp drop-offs on long, multi-hour work. That split maps directly to government operations. Agents can draft a memo outline or a SQL snippet. They falter when the job spans multiple systems, requires policy nuance, or demands meticulous document hygiene.

Building a public dashboard, as in the study run by researchers at the Center for AI Safety and Scale AI, is not a single chart; it is a reliable pipeline with provenance, documentation and accessible visuals. A 2D promo is not a storyboard alone; it is consistent assets, rights-safe media, captions and export settings that pass accessibility checks. A container-home concept is not a render; it is geometry, constraints and safety considerations that survive a technical review.

Federal teams must also contend with rules that raise the bar for autonomy. The AI Risk Management Framework from the National Institute of Standards and Technology gives a shared vocabulary for mapping risks and controls. These guardrails do not block generative AI in government; they simply make unsupervised autonomy a poor bet.

What this means for mission delivery, compliance and the workforce

The near-term value is clear. Treat agents as accelerators for specific tasks inside projects, not substitutes for the people who own outcomes. That approach matches field evidence. A large deployment in customer support showed double-digit gains in resolutions per hour when a generative assistant helped workers with suggested responses and knowledge retrieval, with the biggest lift for less-experienced staff. Translate that into federal work and you get faster first drafts, cleaner queries, more consistent formatting, and quicker starts on visuals, all checked by employees who understand policy, context and stakeholders.

Compliance reinforces the same division of labor. To run in production, systems must pass FedRAMP authorization, recordkeeping requirements and privacy controls. Content must meet Section 508 standards for accessibility. Security teams will lean on the joint secure AI development guidelines from the Cybersecurity and Infrastructure Security Agency and international partners to push model and system builders toward stronger practices. Auditors will use the Government Accountability Office’s accountability framework to probe governance, data quality and human oversight. Every one of those checkpoints increases the value of staff who can judge quality, interpret rules and stitch outputs into agency processes.

The fear that most federal work will soon be automated does not match the evidence. Agents still miss long sequences, stall at brittle interfaces, and struggle with multi-file deliverables. They produce assets that look plausible but fail validation or policy review. They need context from the people who understand stakeholders, statutes and mission tradeoffs. That leaves plenty of room for productivity gains without mass replacement. It also shifts work toward specification, review and integration, roles that exist across headquarters and field offices.

A practical playbook federal leaders can use now

Plan for augmentation, not substitution. When I help government agencies adopt AI tools, we start by mapping projects into linked steps and flagging the ones that benefit from an assistive agent. Drafting a response to a routine inquiry, summarizing a meeting transcript, extracting fields from a form, generating a chart scaffold and proposing test cases are all candidates. Require a human owner for every deliverable, and publish acceptance criteria that catch the common failure modes seen in the benchmarks, including missing assets, inconsistent naming, broken links and unreadable exports. Maintain an audit trail that shows prompts, sources and edits so the work is FOIA-ready.

Ground the program in federal policy. Adopt the AI Risk Management Framework for risk mapping, and scope pilots to systems that can inherit or achieve FedRAMP authorization. Treat models and agents as components, not systems of record. Keep sensitive data inside authorized boundaries. Validate accessibility against Section 508 standards before anything goes public. For procurement, require vendors to demonstrate performance on public benchmarks like WebArena, OSWorld or SWE-bench using your agency’s constraints rather than glossy demos.

Staff and labor planning should reflect the new shape of work. Expect fewer hours on rote drafting and more time on specification, review and integration. Upskill employees to write good task definitions, evaluate model outputs, and enforce standards. Track acceptance rates, rework and defects by category so leaders can decide where to expand scope and where to hold the line. Publish internal guidance that explains when to use agents, how to attribute sources, and where human approval is mandatory. Share outcomes with the AI.gov community and look for common building blocks across agencies.

A brief scenario shows how this plays out without wishful thinking. A program office stands up a pilot for public-facing dashboards using open data. An agent produces first-pass code to ingest and visualize the dataset, similar to the World Happiness example. A data specialist verifies source URLs, adds documentation, and applies the agency’s color and accessibility standards. A policy analyst reviews labels and context language for accuracy and plain English. The team stores prompts, code and decisions with metadata for audit. In the same sprint, a communications specialist uses an agent to draft a 30-second script for a social clip and a designer converts it into a simple 2D animation. The outputs move faster, quality holds steady, and the people who understand mission and policy remain responsible for the results.

AI agents deliver lift on specific tasks and stumble on long, cross-tool projects. Public benchmarks on the web, desktop and code back that statement with reproducible evidence. Federal policy adds governance that rewards augmentation over autonomy. The smart move for agencies is to put agents to work inside projects while employees stay accountable for outcomes, compliance and trust. That plan banks real gains today and sets agencies up for more automation tomorrow, without betting programs and reputations on a hype cycle.

Dr. Gleb Tsipursky is CEO of the future-of-work consultancy Disaster Avoidance Experts.


© Federal News Network

Russians Offered Ready-made Crypto Exchange Accounts Amid Restrictions

Russian crypto traders have been looking to obtain unrestricted accounts for global exchanges as their access to such platforms is limited. Over the past year, the supply of such accounts on the dark web has increased significantly, cybersecurity experts told the Russian press.

Supply of Crypto Exchange Accounts for Russian Users Doubles in a Year of Sanctions

More and more ready-to-use accounts for cryptocurrency exchanges are being sold to Russian residents. While this is not a new phenomenon (such accounts are often employed by fraudsters and money launderers), the current growth in supply has been attributed to the restrictions imposed by the trading platforms on customers from Russia in compliance with sanctions over the war in Ukraine.

Russian residents have been buying these accounts despite the dangers, including the risk that whoever created them could retain access after the sale, Kommersant reported. But the accounts are inexpensive, and offers on darknet markets have doubled since early 2022, Nikolay Chursin from the Positive Technologies information security threat analysis group told the business daily.

According to Peter Mareichev, an analyst at Kaspersky Digital Footprint Intelligence, the number of new ads for ready-made and verified wallets on various exchanges reached 400 in December. Proposals to prepare fake documents for passing know-your-customer procedures also rose, the newspaper revealed in an earlier article last month.

Simple login data, a username and password, typically sells for around $50, Chursin added. For a fully set up account, including the documents with which it was registered, a buyer would have to pay an average of $300. Dmitry Bogachev from digital threat analysis firm Jet Infosystems explained that the price depends on factors such as the country and date of registration as well as the activity history. Older accounts are more expensive.

Sergey Mendeleev, CEO of DeFi banking platform Indefibank, pointed out that there are two categories of buyers: Russians who have no other choice because they need an account for everyday work, and those who use the accounts for criminal purposes. Igor Sergienko, director of development at cybersecurity services provider RTK-Solar, is convinced that demand is largely due to crypto exchanges blocking Russian accounts or withdrawals to Russian bank cards in recent months.

Major crypto service providers, including leading digital asset exchanges, have complied with financial restrictions introduced by the West in response to Russia’s invasion of Ukraine. Last year, the world’s largest crypto trading platform, Binance, indicated that, while restricting sanctioned individuals and entities, it was not banning all Russians.

However, since the end of 2022, a number of Russian users of Binance have complained about having their accounts blocked without explanation, as reported by Forklog. Many experienced problems for weeks, including suspended withdrawals amid prolonged checks, affected customers said. The company told the crypto news outlet that the blocking of users from Eastern Europe and the Commonwealth of Independent States was related to the case of the seized crypto exchange Bitzlato.

Do you think the restrictions will push more Russians towards buying ready-made accounts for cryptocurrency exchanges? Share your thoughts on the subject in the comments section below.
