Speed, scale, and blind spots: Stress-testing cyber stability in the age of AI

By Donavan Cheah (Senior Cybersecurity Consultant, Thales), in conversation with Anastasiya Kazakova (Cyber Diplomacy Knowledge Fellow and Geneva Dialogue Project Coordinator, DiploFoundation).

Artificial intelligence is often described as a technological revolution in cyber conflict. However, revolutions are not only about new capabilities; they are also about pressure on existing systems.

To understand AI’s real impact on cybersecurity and international stability, it is useful to stress-test key assumptions: speed, concentration, automation, responsibility, and governance.

The reflections below are drawn from an interview with Donavan Cheah, a Senior Cybersecurity Consultant at Thales.

Stress Test 1: Speed – what happens when timelines collapse?

In practice, most AI development, on the offensive side, revolves around scaling up existing operations. This includes quicker reconnaissance and faster pre-generation of fuzzing lists for brute force attacks. While there have been proof-of-concept demonstrations such as PROMPTSTEALER and PROMPTLOCK, we remain far from a scenario where AI introduces a fundamentally novel attack method that cannot be detected by defence.

On the defensive side, AI (particularly machine learning, ML) has already been used extensively to detect attacks better. ML algorithms tend to perform better when more data is available, and defensive AI has improved significantly over time.

Speed, however, is a force multiplier. In February 2026, Singapore released details of the largest cyber operation mounted to deal with damage caused by the APT group UNC3886. More than 100 cyber defenders were involved over a period of 11 months. If APT attacks were to scale up even by a factor of two or three, smaller states could easily be overwhelmed unless they scale their defensive capabilities correspondingly.

AI does not necessarily change the nature of cyber operations. But it changes their scale and tempo, and that matters operationally.

Stress Test 2: Vulnerability – what new risks does AI introduce?

This depends on the nature of implementation.

There are many use cases for AI to receive data from critical infrastructure for analysis and monitoring purposes, passed to ML algorithms. Such ML algorithms are not new and have been used for many years, particularly where the provenance of data is trusted.

However, AI systems can be susceptible to data poisoning and model poisoning risks, which are not always easy to comprehend. The key difference lies in the architecture of the AI system. AI systems consist of multiple layers (e.g. foundation models, deployment environments, data storage, evaluation benchmarks) and each layer introduces its own risks.

Even relatively simple systems, such as customer-facing chatbots, can produce misleading outputs that encourage users to execute malicious commands or follow poor advice. In more advanced agent-based systems, risks emerge when agents leverage inputs without properly assessing provenance, reliability, or distinguishing between information and executable instruction.

Given the low tolerance for error in critical infrastructure, AI adoption in such environments has, so far, been largely limited to auxiliary systems rather than core operational systems.

Another vulnerability lies in the manner some AI systems are developed and deployed. For example, OpenClaw (first named Clawdbot, later renamed Moltbot) was released with significant security flaws, including plaintext credential storage and exposed interfaces without proper remote authentication. Other design features of OpenClaw also remain fundamentally insecure to date. Such examples illustrate that rapid prototyping does not always meet enterprise-level security demands.

This highlights broader issues around identity and access management (IAM). Agent-based systems may operate with excessive privileges if IAM controls are not properly implemented, such as OpenClaw operating with the same privileges as that of the user, without the contextual knowledge and intuition of the user. IAM remains foundational, but it has not necessarily kept pace with the speed of AI experimentation.

The risk is less about AI being inherently insecure and more about governance and security architecture lagging deployment.

Stress Test 3: Concentration – are we creating strategic dependence?

From a supply chain perspective, concerns about digital sovereignty already exist, particularly in regions such as the EU. These concerns extend to cyber-defence infrastructure due to reliance on major cloud and AI providers, primarily originating from the United States and China.

However, this creates a structural dilemma. Infrastructure-based technologies such as cloud and AI scale with size. Large technology ecosystems benefit from this scale, whereas many other regions, even populous ones, may find it difficult to replicate comparable foundational AI capabilities.

Training advanced AI models requires immense investment. Estimates suggest that model training costs can range from tens of millions to billions of US dollars. Given these economics, it is difficult to see the number of global AI providers increasing significantly.

This implies structural concentration.

A pragmatic approach may involve balancing strategic dependence through diversification of AI providers, supply chain risk management, and supporting collaborative efforts among smaller actors. Open-source AI models may also offer partial mitigation, such as how free and open-source software underpins much of today’s digital infrastructure.

Nonetheless, strategic dependence on AI-enabled cyber defence infrastructure is already a reality.

Stress Test 4: Automation – are we moving toward machine-driven cyber conflict?

Speculation about ‘machine versus machine’ cyber conflict is understandable, particularly given AI’s capacity to scale operations.

However, cyber operations must be understood considering their objectives. In the case of UNC3886, the apparent aim was long-term persistence, likely for intelligence purposes. Stealth was important. In such cases, triggering automated defensive responses may undermine the attacker’s objectives.

In other contexts, cyber operations may pursue psychological effects, such as repeated targeting of critical infrastructure to destabilise trust. These operations also remain strategically directed.

At a more mundane level, cyber operations are already highly automated. Probing, reconnaissance, detection, and prevention are largely machine-driven today. In that sense, cyber interaction is already automated. AI increases the scale and speed of automation, but it does not eliminate the role of strategic intent.

Stress Test 5: Responsibility – does AI complicate attribution and accountability?

AI cannot take accountability for its actions. Humans remain responsible for the systems they deploy. However, as AI systems grow in complexity, it becomes harder for humans to fully understand their behaviour.

If an AI patch generation system produces a patch that passes testing but contains a hidden backdoor, questions about responsibility arise. Precedents exist in cybersecurity where trusted systems have been compromised. Similar concerns apply if AI systems autonomously generate exploit code that is later deployed.

AI complicates attribution and due diligence. There is limited public precedent to guide how state responsibility frameworks may evolve in this space. Much may depend on how states articulate their positions in forums such as the UN Open-Ended Working Group (and in the Global Mechanism on ICTs to be launched very soon).

AI does not remove human decision authority. But it can make exercising that authority more difficult.

Stress Test 6: Governance – are international discussions keeping pace?

International discussions on cybersecurity, including efforts around confidence-building measures (CBMs), continue to evolve. However, significant divergence remains among states.

Perhaps it is important to recognise that nation-states differ considerably in their maturity levels when dealing with cybersecurity campaigns targeted at national infrastructure. Cybersecurity also intersects significantly with other domains, including geopolitics and international law. These overlapping domains contribute to the complexity of multilateral negotiations.

Governance challenges are further complicated by rapid developments in AI, including the emergence of ‘agentic AI’. Different jurisdictions are adopting different regulatory approaches. While some frameworks have been enacted, there is also evidence of regulatory recalibration.

A useful way to understand why this issue remains systemic is to put ourselves in the shoes of a national leader. When cybersecurity is discussed at the state level, what do domestic stakeholders see as top priorities?

At the forefront is, typically, the ability to maintain trust in national digital infrastructure: operational technology systems such as utilities, or the continued functioning of financial markets and essential services. These are immediate, politically visible concerns.

In that context, international norms are unlikely to rank highly on the list of urgent priorities, particularly for states that are still addressing more foundational resilience challenges.

The gap between technological acceleration and diplomatic negotiation may therefore reflect structural differences in national capacity and priorities, rather than simple inertia in multilateral processes.

Conclusion

AI does not fundamentally reinvent cyber operations. It primarily scales them.

It accelerates reconnaissance and detection. It introduces layered architectural risks. It creates structural concentration in foundational AI capability. It increases automation while preserving strategic intent. And it complicates attribution and accountability without eliminating them.

The stress test is not whether AI changes the nature of cyber conflict in theory. It is whether technical, operational, and diplomatic institutions can absorb the acceleration AI brings in practice.