Autonomous AI Systems in Conflict: Emergent Behavior and Its Impact on Predictability and Reliability

ABSTRACT The development of complex autonomous systems that use artificial intelligence (AI) is changing the nature of conflict. In practice, autonomous systems will be extensively tested before being operationally deployed to ensure system behavior is reliable in expected contexts. However, the complexity of autonomous systems means that they will demonstrate emergent behavior in the open context of real-world conflict environments. This article examines the novel implications of the emergent behavior of autonomous AI systems designed for conflict through two case studies: (1) a swarm system designed for maritime intelligence, surveillance, and reconnaissance operations, and (2) a next-generation humanitarian notification system. Both case studies represent current or near-future technology in which emergent behavior is possible, demonstrating that such behavior can be simultaneously less predictable and more reliable depending on the level at which the system is considered. This counterintuitive relationship between lower predictability and greater reliability creates unique challenges for system certification and adherence to the growing body of principles for responsible AI in defense, which must be considered for the real-world operationalization of AI designed for conflict environments.


Introduction
In 1960, Norbert Wiener, the founder of the field of cybernetics, wrote, "It is my thesis that machines can and do transcend some of the limitations of their designers, and that in doing so they may be both effective and dangerous" (1999, 81). Wiener's notion is especially evident in the rapidly developing field of artificial intelligence (AI) and its use in autonomous systems designed for conflict environments.
Whether one considers the real-world use of an autonomous weapons system (AWS) to target humans, such as in Libya in 2020, or the ongoing development of autonomous AI systems to process vast amounts of data across platforms to increase the speed of action on the battlefield (Airbus 2020; Royal Australian Air Force 2019; Hoehn 2022), the truth is the same: autonomous AI systems are changing the nature of conflict. 1 With this truth in mind, I aim to spur conversation about emergent behavior of complex autonomous systems in open-context conflict domains. Former US Secretary of Defense Ash Carter highlights the urgent need for practical approaches to the responsible use of AI for national defense in a June 2022 essay, arguing, "Many kinds of AI algorithms exist in practice … They all make enormous numbers of tiny calculations that combine to make overall inferences that cannot be made quickly by humans, have not been recognized by humans, or even perhaps would never be recognized by humans. These computational methods make literal transparency, normally the starting point for ethical accountability, completely impractical" (Carter 2022, 302).
Notions of what constitutes an autonomous system exist across a spectrum. For this article, I will use a composite definition of autonomy developed by Christen et al. (2017), which allows a system's autonomy to be assessed along five axes: (1) level of autarchy, (2) independence from human control, (3) ability to interact with the environment, (4) capacity to learn, and (5) level of mobility. This discussion does not require a system to have a high degree of autonomy along all five axes; it is relevant to any system with a high degree of autonomy along at least one of them. Furthermore, this article is focused on the operational use of autonomous systems in the real world. This differs from a lab environment, where boundaries are inherent, creating a closed context. The real world, by contrast, is an open context: complex at any single point in time, dynamic and evolving, and therefore impossible to fully specify (Bakirtzis et al. 2022). This combination of complex systems operating in the real world will result in emergent behavior.
Emergent behavior is to be understood as behavior that was not programmed at the individual, sub-system level "and cannot be readily explained based on behavior at the individual level" (Harvey 2018, 117). Roboticist Rodney Brooks described the possibilities such behaviors present in a 1989 publication, stating, "complex behaviors, such as walking, can emerge from a network of rather simple reflexes with little central control" (1989, 253). An example of emergent behavior in nature is the construction of a beehive carried out by the work of thousands of individual bees with no centralized control. In the case of autonomous systems developed and operated by humans, such emergent behavior creates tension between the expected behavior of a system and how that system effectively behaves (Poddey et al. 2019). Using the examples of autonomous vehicles and clinical health systems, Burton et al. argue that such emergent behavior creates sources of uncertainty and risk that must be addressed from a multidisciplinary perspective to assure the safety of autonomous systems (Burton et al. 2020).
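To make this concrete, the following minimal sketch, in the spirit of Brooks's simple reflexes, shows a collective pattern arising from purely local behavior. All rules and parameters here are my own illustrative assumptions: each simulated agent drifts toward the average position of its nearby neighbors, and the population clusters even though no agent, and no central controller, is programmed to produce clustering.

```python
import random

# A minimal sketch of emergence from simple local rules (all parameters are
# illustrative assumptions). Each agent follows one reflex-like rule, stepping
# toward the average position of its nearby neighbors, with no central
# control. Global clustering "emerges" even though no agent is programmed
# to build a cluster.

NUM_AGENTS, STEPS, RADIUS, GAIN = 30, 200, 25.0, 0.1

positions = [(random.uniform(0, 100), random.uniform(0, 100))
             for _ in range(NUM_AGENTS)]

def local_neighbors(i):
    """Positions of agents within RADIUS of agent i (local information only)."""
    xi, yi = positions[i]
    return [(x, y) for j, (x, y) in enumerate(positions)
            if j != i and (x - xi) ** 2 + (y - yi) ** 2 <= RADIUS ** 2]

for _ in range(STEPS):
    updated = []
    for i, (x, y) in enumerate(positions):
        nbrs = local_neighbors(i)
        if nbrs:  # reflex: drift toward the local (not global) centroid
            cx = sum(p[0] for p in nbrs) / len(nbrs)
            cy = sum(p[1] for p in nbrs) / len(nbrs)
            x, y = x + GAIN * (cx - x), y + GAIN * (cy - y)
        updated.append((x, y))
    positions = updated

# The macro-level property (the spread of the collective shrinks) was never
# specified at the individual level.
xs, ys = zip(*positions)
print(f"final spread: x-range={max(xs) - min(xs):.1f}, y-range={max(ys) - min(ys):.1f}")
```

No line of this program mentions clustering; the macro-level pattern exists only in the interaction of the micro-level rules, which is precisely the tension between expected and effective behavior noted above.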
This article goes beyond existing discussions of emergent behavior and its implications for notions of predictability and reliability. Specifically, I aim to examine how decreases in predictability can result in increases in reliability depending on the level at which a system is considered. In other words, unpredictable, innovative behavior at the detailed, micro level can increase the reliability of a system at the macro level due to the emergent behavior of complex autonomous systems operating in real-world conflict domains. This means that efforts to operationalize the increasing body of principles for responsible AI for defense cannot simply conclude that increasing unpredictability equates to decreasing reliability of a system. The level at which a system is considered must be taken into account, as must the level at which predictability and reliability are required, when determining risk tolerance levels for the deployment of any particular complex autonomous system.
When developing and operationalizing new technology, it is not enough to test and evaluate a system's behavior. We must also consider novel aspects of the new technology and their policy implications across the entire lifecycle of the system. For this article, the intersection of (1) the inherent complexity of a real-world, open-context domain and (2) the inherent complexity of autonomous systems requires us to confront the possibility that behavior that is innovative but less predictable can lead to increasing reliability, depending on the specific use case and the level at which a system is considered. This counterintuitive relationship between unpredictability and reliability at different system levels must be confronted if autonomous systems are to be used in real-world conflict settings. How one defines the micro- and macro-levels is also important and will vary from case to case. The key consideration in defining and examining the micro- and macro-levels for this article is the relationship between them: simple rules of behavior at the micro-level are likely to result in emergent behavior at the macro-level.
My objective is to address the following question: How can the international community practically address the emergent behavior of complex autonomous systems in open-context conflict domains? I argue that the answer to this question depends on how one evaluates the relationship between unpredictable, innovative behavior at one level of system performance and reliability at another level of system performance. Ultimately, the policies that guide the development and deployment of autonomous systems in the real world must be sensitive to the novel strengths and weaknesses presented by emergent behavior. The practical implications of being sensitive to these novel strengths and weaknesses may create the need for new ways of evaluating systems, such as dynamic certification models, as well as new challenges, such as ethical interoperability, which are discussed in greater detail in Section 4.
To address the topics raised above, this article first defines and frames the issue of emergent behavior of autonomous systems in open-context conflict domains in Section 2. Section 3 presents two hypothetical case studies based on existing or near-future technology. The first case is a robotic swarm system designed for maritime intelligence, surveillance, and reconnaissance (ISR) in an adversarial context. The second case examines a humanitarian notification system, which could leverage unpredictability at the micro-level to be more reliable and robust at the macro-level in a conflict scenario. Both of these case studies add to the literature in their own right and can be used to examine the operationalization of ethical AI principles beyond the scope of this article. In Section 4, I discuss the implications of emergent behavior and two concepts that could be used to confront novel issues presented by autonomous systems in open-context conflict environments. Finally, in Section 5, I present a brief conclusion and propose areas for future work.

Framing the concept of emergent properties in open-context conflict domains
Emergent behavior refers to the fact that the interaction between any complex system and its environment can produce surprising behavior, even in a controlled lab setting. Operating a complex system outside of a lab, in a dynamic, real-world setting, is likely to result in emergent behavior that cannot be fully predicted. This is because one can never know with absolute certainty what a highly complex, autonomous system will do at any given moment in an open-context environment (see Figure 1).
Understanding the degree to which system behaviors can be predicted and relied upon is fundamental when determining if a nondeterministic system should be used in the high-stakes domain of conflict. Even if a particular system is not armed, because the stakes involve human life, risk confidence intervals must be clearly defined, which is why discussion of emergent behavior is essential. In adversarial domains, oppositional elements may intentionally try to disrupt operations, communications may be degraded, and a clear understanding of the operating environment may be unavailable. In such scenarios, not being able to fully predict how an autonomous system will react can be both a strength and a vulnerability, which I will discuss in Sections 3 and 4.
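As a hedged illustration of what a clearly defined risk confidence interval could look like in practice, one could bound a system's mission reliability from repeated trials using a standard Wilson score interval. The method and the numbers below are assumptions for exposition, not a defense testing standard:

```python
import math

# Hedged sketch (assumed approach, not a prescribed procedure): quantifying a
# "risk confidence interval" for a nondeterministic system from trial data
# via a Wilson score interval on the observed mission-success rate.

def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """95% Wilson score interval for a binomial success probability."""
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2))
    return center - margin, center + margin

# Hypothetical example: 97 successful sorties out of 100 open-context trials.
lo, hi = wilson_interval(97, 100)
print(f"95% CI for mission reliability: [{lo:.3f}, {hi:.3f}]")
```

The point of such a bound is not the specific formula but that an interval, rather than a point estimate, is what a deployment decision for a nondeterministic system must rest on.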
When discussing conflict environments, it is important to understand that considerations of what is acceptable are based on the law of armed conflict. Combatants are accountable under international humanitarian law (IHL). Though violations of IHL occur, human combatants inherently understand the humanity of their enemies. In contrast, an AI cannot, by its nature, have this same understanding (Sparrow 2016). Human control, involvement, and oversight are essential aspects of accountability under IHL (Sauer 2020). Putting this into context, if an emergent behavior of an AWS resulted in a war crime, such as the targeting of a humanitarian distribution point, who would be held accountable? Is the responsible party the operator who deployed the system, the designer who failed to account for the possibility of an emergent behavior, someone else in the lifecycle of the system, or nobody? Answers to these questions are essential, even absent a legal argument, because they impact ethical system development and operations.
Though there are profound questions that must be addressed related to the possibility of lethal autonomous weapons systems exhibiting emergent behavior, this discussion will focus on unarmed autonomous systems designed for conflict environments in the form of two hypothetical case studies. The first case study examines an ISR swarm system (Section 3.1); the second, a next-generation humanitarian notification system (Section 3.2). By bounding this discussion to these two non-lethal systems, I will show the importance of considering predictability and reliability at different levels of autonomous systems for conflict settings, even when a system has no harming capabilities and does not require an Article 36 weapons review. Ultimately, the use of non-lethal autonomous systems in conflict will impact lethal operations and therefore have life-and-death consequences.

Case studies
To create a basis for the discussion in Section 4, I will now present two hypothetical but realistic systems as case studies. The first case study examines an autonomous swarm system designed for maritime intelligence, surveillance, and reconnaissance (ISR) operations in a conflict domain. The second case study examines a humanitarian notification system that relies on a learning algorithm to process vast amounts of data.
These two cases allow us to discuss the implications of emergent behavior of complex systems operating in uncontrolled and dynamic conflict environments. Examining both cases serves multiple purposes, which either case alone would not offer. First, these case studies allow us to explore two distinct manifestations of the same issue through sociotechnical systems designed to operate in conflict environments using existing or near-future technologies. Second, neither system is a lethal autonomous weapon, but both have implications for kinetic operations in warfare. One system is deployed by a party to the conflict (the maritime ISR swarm) and one by noncombatants (the humanitarian notification system), representing two drastically different perspectives on autonomous systems designed for conflict domains. Third, examining these two cases in parallel allows us to bridge discussions of embodied autonomous systems (the maritime ISR swarm) and disembodied, software systems (the humanitarian notification system), which is needed at the policy level (Trusilo and Burri 2021).
Discussing ways in which the growing body of responsible AI principles can be operationalized for both of these case studies is essential as elements of both hypothetical systems are either under development or highly realistic. However, a discussion of the complete body of responsible AI principles for defense is outside the scope of this article.

Case study one: a robotic swarm system
The first case study is a hypothetical swarm system designed for maritime ISR operations. I will begin by defining such a system. The possibility of intelligent collective behavior, or swarm strategies, in which multiple individual systems interact with each other and the environment (Navarro and Matía 2013), offers powerful capabilities and the possibility of emergent behavior (Burton and Soare 2019). Robotic swarm technology, cited as one of the most promising fields of AI R&D, could create new capabilities and change the dynamics of human-machine interaction. Arkin lists some of the advantages of such multi-robot systems over single-robot systems, including improved performance, task enablement, distributed sensing, distributed action, and fault tolerance (Arkin 1998). The United States and China are actively developing robotic swarm technology for conflict environments (Kallenborn 2021; Kania 2020; Trevithick 2022). At the same time, research examining meaningful human control over robotic swarms is nascent (Ekelhof and Paoli 2020).
The concept of cooperating autonomous systems, or systems of systems, complicates attempts at classification and evaluation according to norms, values, and regulations. 2 In fact, the development of robotic swarm strategies will force users to address the question of whether it is not only necessary but even possible for there to be a human-in-the-loop due to the sheer number of decisions being executed in any given moment. 3 This article will consider a near-future robotic swarm that utilizes two types of autonomous unpiloted aerial vehicles (UAVs) and one type of uncrewed surface vessel (USV) to form a heterogeneous collective of linked systems (Table 1). 4 Using the taxonomy developed by Farinelli, Iocchi, and Nardi, this robotic swarm system is Cooperative, Aware, Strongly Coordinated, and Distributed (Farinelli, Iocchi, and Nardi 2004). For this simplified scenario, it is assumed that individual elements have their own processing power and intra-swarm communication systems, which allow them to receive general commands and share information from their onboard platforms. Individual control systems, power sources, and sensors vary across the three types of platforms.
For example, from Table 1 we derive that Type 1 systems are UAVs carrying an infrared sensor. The swarm, composed of Type 1, 2, and 3 autonomous systems, uses parallel computing. There is no external control or supervision of any of the individual systems. Relying on the work of Jochen Fromm, we can categorize this swarm system as having emergent behavior that is predictable in principle but difficult to achieve predictability over in practice (Fromm 2005). I have intentionally excluded detailed specifications such as the number of individual elements, energy sources for each type of system (e.g., electric battery, liquid fuel), performance measures (e.g., loitering times, speeds, ranges), and other operating characteristics. Such details are irrelevant to the discussion and may distract from the key behavioral property, which is that such a system will present emergent behavior.
I will now describe a scenario involving the swarm defined in Table 1 that demonstrates how such a system can be innovative and unpredictable at the individual agent level (micro level) in a way that directly increases reliability at the macro level due to emergent behavior. Such a swarm system will be used for maritime operations to provide a distributed network of ISR sensors for a traditional aircraft carrier. The aircraft carrier will task the swarm with identifying threats within a specified range around the aircraft carrier. The collective swarm converges on an acceptable solution to conduct ISR operations in the specified area at any given point in time, allocating available resources without any human control of individual systems. In other words, a human operator cannot predict the exact position data of any individual system within the swarm but can rely on the swarm to provide a macro-level threat picture. Such a system will be able to adjust its resources to accommodate the loitering times of individual systems, the aircraft carrier's position, and considerations that impact sensor ranges and effectiveness. It could also adjust its behavior to account for impacts on individual sensor effectiveness due to weather conditions, adversarial countermeasures, or other unforeseen conditions. For example, if one of the Type 2 surface vessel's radar sensors were jammed by an adversary, the swarm could converge on an acceptable solution to autonomously reposition Type 1 and 3 systems to identify the source of the electronic countermeasure. This could be done without any communication, command, or control from the human operators who initially deployed the swarm. How the collective swarm uses the capabilities of any specific individual system in an adversarial, open-context domain will likely be innovative and, therefore, unpredictable. However, in this example, unpredictable behavior at the individual, detailed level results in increased reliability in achieving overall objectives at the macro level. 5

Opponents of robotic swarm technology may argue that unpredictability at the individual system level means that there is no longer meaningful human control, explainability, or transparency and that swarms are therefore problematic according to the growing body of ethical AI principles. 6 Proponents of such a system may argue that the increased reliability at the macro level makes a swarm system the logical choice for open-context domains. Both positions are correct.
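A minimal sketch of the jamming scenario above follows. The local rules, platform counts, and gains are illustrative assumptions, not a real swarm control law: each platform applies the same simple rule with random jitter, so individual trajectories are unpredictable at the micro level, yet the collective reliably converges on the reported jamming source at the macro level.

```python
import math
import random

# Hedged sketch of the jamming scenario (rules and parameters are assumptions).
# Every platform runs the same local rule set; when the Type 2 USV reports
# radar degradation over intra-swarm communications, Type 1 and Type 3
# platforms independently drift toward the report. No central commander
# issues the order, so the collective repositioning is emergent.

platforms = (
    [{"type": 1, "sensor": "infrared"} for _ in range(4)] +        # Type 1 UAVs
    [{"type": 2, "sensor": "radar"} for _ in range(2)] +           # Type 2 USVs
    [{"type": 3, "sensor": "electro-optical"} for _ in range(4)]   # Type 3 UAVs
)
for p in platforms:
    p["pos"] = [random.uniform(-50.0, 50.0), random.uniform(-50.0, 50.0)]

jam_report = platforms[4]["pos"]  # position broadcast by the jammed Type 2 USV

for _ in range(50):
    for p in platforms:
        if p["type"] == 2:
            continue  # surface vessels hold station
        # Local rule: drift toward the reported degradation, with random
        # jitter (the source of micro-level unpredictability).
        for axis in (0, 1):
            p["pos"][axis] += 0.1 * (jam_report[axis] - p["pos"][axis])
            p["pos"][axis] += random.uniform(-1.0, 1.0)

# Macro-level reliability: Type 1 and 3 sensors converge on the jamming
# source even though no single flight path was predictable.
dists = [math.dist(p["pos"], jam_report) for p in platforms if p["type"] != 2]
print(f"mean distance to jam source after convergence: {sum(dists) / len(dists):.1f}")
```

Re-running the sketch produces different individual trajectories every time, but the final mean distance is consistently small; that is the micro-unpredictability/macro-reliability trade in miniature.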
In such open-context domains, communication to a command and control center may be difficult or actively contested, and unknown obstacles or defenses can make an individual system fragile. In other words, the level at which one assesses the use case makes all the difference. One cannot guarantee predictable behavior of the individual agents (micro-level) while also ensuring total reliability of the system of systems (macro-level). Predictability at the individual agent level for an ISR platform may not be a major concern. However, if targeting data is transmitted from an autonomous ISR system, the implications become more profound. Applied to our hypothetical swarm, the Type 1 and 3 systems that identified the source of the electronic countermeasures affecting the Type 2 radar could send targeting data to a ship-board kinetic weapon system, which could then fire on the source of the electronic countermeasure, conceivably without a human actually pulling a trigger.
Such emergent behavior also presents novel vulnerabilities, which have not been discussed in the literature. Specifically, a swarm system deployed in an open context may be susceptible to new methods of defense that are undetectable but result in mission failure. If individual elements of our example swarm are nudged in such a way as to create a cascade of change, that cascade may cause emergent behavior of the overall swarm. For example, an adversarial defense system that slightly interferes with the intra-swarm communication of individual system position data may, if the interference compounds, cause the collective swarm to veer off track or create gaps in the network of interconnected sensors that can be exploited. Alternatively, applied to the scenario described above, in which the collective swarm uses sensors from Type 1 and 3 systems to identify the source of interference to Type 2's radar: incorrect position calculations caused by undetectable, individual-system-level interference will lead to consequential failures in the form of incorrect targeting data. This vulnerability is a direct result of the interplay of the micro- and macro-levels of a swarm system.
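The following sketch illustrates this cascade vulnerability under assumed dynamics; the one-dimensional model, noise level, and bias are mine, chosen for exposition. A per-step position bias far smaller than ordinary sensor noise, and therefore individually undetectable, compounds through a simple cohesion rule until the whole swarm has drifted off track.

```python
import random

# Illustrative sketch of the nudge-cascade vulnerability (1-D model, noise
# level, and bias are assumptions). An adversary injects a tiny bias into
# each agent's shared position report. The bias is an order of magnitude
# below ordinary sensor noise, so no single report looks anomalous, yet it
# compounds through a cohesion rule until the swarm drifts far off track.

NUM_AGENTS, STEPS, BIAS, NOISE = 20, 300, 0.05, 0.5

true_pos = [random.uniform(0.0, 10.0) for _ in range(NUM_AGENTS)]  # 1-D positions

for _ in range(STEPS):
    # Each agent broadcasts a noisy, slightly biased position report.
    reported = [x + BIAS + random.gauss(0.0, NOISE) for x in true_pos]
    centroid = sum(reported) / NUM_AGENTS
    # Cohesion rule: every agent moves toward the *reported* centroid.
    true_pos = [x + 0.2 * (centroid - x) for x in true_pos]

drift = sum(true_pos) / NUM_AGENTS - 5.0  # initial centroid is near 5.0
print(f"macro-level drift after {STEPS} steps: {drift:+.1f} units "
      f"(per-report bias was only {BIAS}, noise sigma {NOISE})")
```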
Case study two: a next-generation humanitarian notification system
The second case study examines a hypothetical next-generation humanitarian notification system designed for a conflict scenario, which leverages unpredictable emergent behavior at the micro-level to be more reliable and robust at the macro-level. Given the dearth of academic literature on humanitarian notification systems, this case study serves a secondary purpose of initiating discussion of technological opportunities to improve humanitarian notification for deconfliction processes.
Humanitarian notification systems are used in multiple conflicts around the world to inform parties to a conflict of the location of humanitarian personnel, equipment, facilities, and activities. These humanitarian personnel, equipment, facilities, and activities are protected from deliberate or reckless attack under IHL (Ulbright and Weiner 2021). To date, such systems are conflict-specific and vary in both actual operation and overall effectiveness. 7 Other critiques include the fact that humanitarian notification systems are always voluntary, never guarantee the safety of humanitarian sites, and oftentimes include invalid or inaccurate data due to human error. To put this into context, the UN Office for the Coordination of Humanitarian Affairs currently maintains a humanitarian notification system in Syria. This system has received criticism due to repeated bombings of humanitarian sites by parties to the conflict (Hill and Hurst 2019). 8 Though this example highlights the weaknesses of the current state of humanitarian notification systems, it is worth noting that resources in the form of donor funding, NGO and UN personnel and processes, and military targeting operations all continue to be expended on humanitarian notification systems because there is currently no better alternative, and the consequences of not participating often present just as much, if not more, risk to humanitarian actors.
With the above description of the dire state of existing humanitarian notification systems in mind, I will now present a hypothetical next-generation possibility. Current technology presents the opportunity to establish a next-generation humanitarian notification system that incorporates elements of complex autonomous systems operating in the real world (Trusilo and Danks 2023). Such a system will exhibit emergent behavior through algorithmic decision making, applying machine learning to innovatively and unpredictably use multiple sources of data at the micro-level to create a more reliable and robust system at the macro-level. The emergent behavior is more difficult to discern here than in the previous case as there is a central algorithmic decision maker. However, the algorithmic decision maker's use of machine learning to dynamically change how it determines the location of humanitarian activities will likely be surprising and unpredictable. Such a system will make notification data more difficult to corrupt, spoof, or disrupt and therefore also more difficult to ignore or dismiss by parties to a conflict.
One way this can be done is by incorporating a series of data-producing devices that automatically submit location information as part of a global humanitarian notification system. For example, a humanitarian convoy traveling through contested territory will carry multiple, relatively inexpensive devices that relay different data via different communication methods on each vehicle in the convoy. The data submitted by these devices can be processed along with information submitted by humanitarian organizations, as well as geospatial information scraped by the autonomous algorithmic decision maker. The system can then use the available data in surprising and unpredictable ways to determine precise, real-time information about a humanitarian operation and instantaneously submit this information to parties to the conflict (see Figure 2). Put simply, the emergent behavior in this example lies in which data the autonomous algorithm uses, and how, to make its determinations. The algorithm would be designed to be nondeterministic and therefore innovative, unpredictable, and nontransparent at the level of data processing. But the macro-level reliability of the algorithm can be tested at any time by evaluating how accurately the algorithm's outputs match real-world operations.
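A minimal sketch of such an algorithmic decision maker follows. The source names, error models, and fusion rule are illustrative assumptions, not a fielded design: each notification cycle draws an unpredictable subset of data sources and fuses them with a robust aggregate, so micro-level behavior is nondeterministic while macro-level accuracy remains auditable against ground truth.

```python
import random
import statistics

# Hedged sketch (assumed design, not a real system): each cycle the notifier
# draws a random subset of available sources and fuses them with a robust
# aggregate. Which sources are used is unpredictable at the micro level
# (hard to spoof or jam), but the published location stays reliable at the
# macro level and can be audited against ground truth at any time.

TRUE_LOCATION = (34.80, 38.99)  # hypothetical convoy position (lat, lon)

def read_source(name):
    """Simulated source reads; one source is spoofed, one may be missing."""
    if name == "gps_beacon":   return (34.80 + random.gauss(0, 0.01), 38.99 + random.gauss(0, 0.01))
    if name == "org_plan":     return (34.81, 38.98)                   # pre-filed route
    if name == "geospatial":   return (34.79 + random.gauss(0, 0.02), 39.00 + random.gauss(0, 0.02))
    if name == "web_scrape":   return None if random.random() < 0.5 else (34.82, 38.97)
    if name == "spoofed_feed": return (35.50, 38.10)                   # adversarial injection
    raise KeyError(name)

SOURCES = ["gps_beacon", "org_plan", "geospatial", "web_scrape", "spoofed_feed"]

def notify_once():
    # Micro-level nondeterminism: an unpredictable subset of sources per cycle.
    chosen = random.sample(SOURCES, k=random.randint(3, len(SOURCES)))
    fixes = [f for f in (read_source(s) for s in chosen) if f is not None]
    # Coordinate-wise median is robust to a minority of spoofed/absent inputs.
    return (statistics.median(f[0] for f in fixes),
            statistics.median(f[1] for f in fixes))

# Macro-level audit: outputs cluster near ground truth despite nondeterminism
# in which sources were consulted.
estimates = [notify_once() for _ in range(1000)]
err = statistics.mean(abs(e[0] - TRUE_LOCATION[0]) + abs(e[1] - TRUE_LOCATION[1])
                      for e in estimates)
print(f"mean L1 error over 1000 cycles: {err:.3f} degrees")
```

Note the design choice: a bad actor who cannot predict which sources feed any given cycle cannot reliably spoof the output, yet an auditor comparing published locations against known convoy positions can verify macro-level reliability at any time.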
The individual elements of data that the autonomous algorithmic system uses from various sources (such as direct signals from a humanitarian convoy, submissions of planned activities from humanitarian organizations, geospatial imagery, and scraping of the Internet) can vary based on the complex reality of the operating environment. Ideally, such a system will learn to be more efficient and accurate over time through innovative, unpredictable emergent behavior. Individual elements of data will likely be compromised, absent, or incomplete. However, the autonomous algorithmic system will parse through such data in a way that a human cannot predict or explain but that will result in reliable reporting of valid locations of humanitarian activities. The system can then transmit data to military targeting cells and confirm that the various parties to the conflict are notified.
The hypothetical next-generation autonomous humanitarian notification system described here has at least three major benefits:
- Innovative and unpredictable uses of micro-level data elements to produce reliable macro-level outputs make spoofing, jamming, or faking humanitarian data more challenging in both static and dynamic situations, as a bad actor is not able to predict exactly which data the autonomous algorithm uses.
- Human error will be reduced, which simultaneously addresses the challenge of human error being exploited by warring parties to claim they are not responsible when humanitarian personnel, equipment, facilities, or activities are targeted.
- Verified location data of protected humanitarian personnel, equipment, facilities, and activities is shared with parties to the conflict, consequently making IHL accountability a greater possibility.
In a report for the Center for Naval Analyses (CNA), Lewis and Ilachinski describe multiple ways in which the Civilian Protection Life Cycle can be impacted by specific AI/ML applications to reduce civilian harm (2022). If we apply the categories developed by Lewis and Ilachinski, one can see how an AI-powered, next-generation humanitarian notification system, as described above, can impact multiple aspects of civilian protection, such as tailoring authorities and rules of engagement by the parties to the conflict, helping to shape operations to mitigate risks, and better coordinating targeting information.
The benefits described above are a direct result of innovative, unpredictable micro-level behavior (how the autonomous algorithm uses various data sources) producing a more reliable and robust humanitarian notification system at the macro-level (communication of accurate location data of humanitarian activities in real time). The behavior of the autonomous algorithmic system will not be fully explainable or transparent due to higher unpredictability at a fine-grained level, but this results in reliability that can save noncombatant lives and potentially increase the ability to hold parties to a conflict accountable under IHL when civilians are killed.

Discussion
Building on the hypothetical cases and their potential to exhibit emergent behavior described in Section 3, I will now discuss two implications in greater detail, namely, how micro-level innovative, unpredictable behavior and the resultant increase in macro-level reliability create challenges to (1) system certification and (2) ethical interoperability.
Before a system can be operationalized, it must be tested, evaluated, and certified for use. For instance, the US Department of Defense (DoD) specifically states that DoD's AI systems should be reliable, explaining, "AI capabilities will have explicit, well-defined uses, and the safety, security, and effectiveness of such capabilities will be subject to testing and assurance within those defined uses across their entire lifecycles" (DoD 2020). In order to implement this principle, the US Defense Innovation Board (DIB) recommended that DoD "use or improve existing DoD test, evaluation, verification, and validation procedures" (DIB 2019, 10). However, the question arises: how can an autonomous system that reacts to a dynamic operating environment, as described in the case studies in Section 3, be tested, evaluated, verified, and validated?
As explained by Bakirtzis et al. (2022, 3), "conventional static certification struggles when operational (or regulatory) assumptions fail to hold in reality". This implication is especially pertinent if we consider that testing and evaluation is typically part of defense acquisition processes (e.g., DoD's), which cannot necessarily account for surprising actions or emergent behavior during an actual operation (Ilachinski 2017). In other words, a system may exhibit behavior that cannot be foreseen in the development and testing phases of the system lifecycle.
One potential way to address this challenge is a dynamic certification method. Such a method does not rely on a stable model for testing but rather uses an iterative process that is designed to address uncertainties (Bakirtzis et al. 2022). To accomplish this, Bakirtzis et al. propose identifying measurable features and conditions for assuring the ethical and responsible use of a complex autonomous system. This allows a system to be certified for particular uses in particular contexts, with continual monitoring and revision as understanding of the system's performance in complex, real-world environments increases.
We can explore how a dynamic certification method can account for the novel trade-offs between predictability and reliability presented by the maritime ISR swarm system described in Section 3.1. For example, if a defense organization applies modeling and testing to the entire lifecycle of the system, as opposed to a one-time certification at the end of the development stage, acceptable uses in particular contexts can be identified regardless of unpredictable emergent behavior of the system. As Bakirtzis et al. argue, such a method would allow an organization to identify the contexts in which the system will fail and characterize environmental variations. Perhaps such a method would determine that the ISR swarm is only capable of reliably performing when wind speeds are below a certain threshold, even though any one agent within the swarm can effectively operate in much higher wind speeds. Using such an approach makes predictability of the individual agents' positions irrelevant: the swarm system's performance reliability is instead determined according to nondeterministic combinations of actual system uses in various contexts.
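To illustrate, a dynamic certification gate in the spirit of Bakirtzis et al. (2022) might be encoded as a machine-checkable operating envelope that is revised as monitored field performance accumulates. The field names and thresholds below, including the wind-speed limit, are hypothetical:

```python
from dataclasses import dataclass

# Hedged sketch of a dynamic-certification check (illustrative fields and
# thresholds): the swarm is certified per operating context, and the envelope
# is revised as monitored field performance accumulates, rather than fixed at
# a one-time post-development certification.

@dataclass
class OperatingEnvelope:
    max_wind_speed_kts: float = 25.0  # swarm-level limit, even though any
                                      # single agent tolerates more
    max_sea_state: int = 4

def certified_for(env: OperatingEnvelope, wind_kts: float, sea_state: int) -> bool:
    """Pre-mission gate: deploy only inside the currently certified envelope."""
    return wind_kts <= env.max_wind_speed_kts and sea_state <= env.max_sea_state

def revise(env: OperatingEnvelope, observed_wind_kts: float, mission_ok: bool) -> OperatingEnvelope:
    """Continual monitoring: shrink the envelope after an in-envelope failure;
    cautiously grow it after success near the boundary."""
    if not mission_ok and observed_wind_kts <= env.max_wind_speed_kts:
        env.max_wind_speed_kts = min(env.max_wind_speed_kts, observed_wind_kts - 2.0)
    elif mission_ok and observed_wind_kts >= env.max_wind_speed_kts - 1.0:
        env.max_wind_speed_kts += 0.5
    return env

env = OperatingEnvelope()
print(certified_for(env, wind_kts=22.0, sea_state=3))  # True: inside envelope
env = revise(env, observed_wind_kts=22.0, mission_ok=False)
print(env.max_wind_speed_kts)                          # envelope tightened to 20.0
```

Encoding the envelope as data rather than prose is what allows certification to be revisited continually as field evidence accumulates, instead of expiring at the end of the acquisition process.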
The second implication of emergent behavior is its impact on what Danks and Trusilo label the ethical interoperability of systems (2022). Current discussions of technical interoperability are driven by the varying technical standards of organizations that wish to work together. Similarly, ethical AI principles vary across organizations and nations. These varying ethical AI principles, intended to guide the use of AI for defense, present challenges to the use of autonomous systems in combined operations, in which allied nations work together; yet ethical interoperability is not yet widely discussed.
For example, the Australian document A Method for Ethical AI in Defence emphasizes transparency, stating: "The basis of a particular AI decision should always be discoverable" (Devitt et al. 2020, 15). This requirement calls into question how a US-operated swarm system, as detailed in Section 3.1, could be used in a combined US-Australian maritime operation. Specifically, if the hypothetical ISR swarm exhibits emergent behavior when dynamically adjusting its network of sensors across the interconnected individual systems, individual system movements will not be completely predictable, nor will the basis for those movements be transparent in real time. There is the potential for post hoc explanations of system behavior, often referred to as explainability, but realistically, there will be AI systems "whose workings are impossible for humans to fully grasp" (Carter 2022, 303). Given this reality, how can Australian forces abide by Australian ethical AI principles when using data gleaned from a US Navy ISR swarm system as described?
One possible solution to this challenge is interorganizational agreements, in the form of negotiated zones, that delineate exactly which systems would be acceptable according to all relevant parties' ethical principles. These low ethical principles risk zones would address the challenge of ethical interoperability for certain use-cases through self-certification (Danks and Trusilo 2022). In brief, the proposal allows organizations to adopt or use a partner nation's sociotechnical system with assurances from the developer nation that the system satisfies the agreed-upon criteria of the relevant low ethical principles risk zone. This ensures ethical AI principles are adhered to without requiring the sharing of proprietary information or sensitive operating characteristics of the relevant system. Though this proposal will not address ethical interoperability challenges in all cases, agreements on low ethical principles risk zones will create the opportunity for defense organizations to use systems developed by allies that would otherwise be difficult, if not impossible, to assess ethically.
For example, if we apply this proposal to consider the emergent behavior of the maritime ISR swarm system described in case study 3.1, Australian forces can utilize data from a US DoD system if the US DoD certifies that the system meets the requirements of an Australian-US low ethical principles risk zone agreement. Such a certification is possible as long as the emergent behavior does not preclude the system from the agreed low ethical principles risk zone. The negotiated low ethical principles risk zone can hypothetically take into account unpredictability and non-transparency of systems at the micro-level given other agreed-upon conditions, which the US DoD would attest the system meets. If unpredictability at the individual system level due to emergent behavior is not acceptable according to Australian ethical AI principles, then the system will not fall into the agreed low ethical principles risk zone and will not be able to be certified for combined operations. Therefore, the emergent behavior exhibited by the hypothetical system does not present undue risk to ethical AI principles if clear low ethical principles risk zones are agreed upon.
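In sketch form, a negotiated low ethical principles risk zone could be reduced to machine-checkable criteria against which a developer's self-certification is validated. All criteria names and the zone itself are hypothetical illustrations of Danks and Trusilo's proposal, not their specification:

```python
from dataclasses import dataclass

# Illustrative sketch only (criteria names and the zone are hypothetical): a
# negotiated "low ethical principles risk zone" encoded as checkable criteria.
# The developer nation attests its system satisfies the zone's criteria
# without disclosing proprietary internals; the partner checks the
# attestation against the agreed zone before combined operations.

@dataclass(frozen=True)
class RiskZone:
    name: str
    lethal_effects_permitted: bool         # zone excludes weaponized systems
    human_oversight_at_tasking: bool       # oversight required at swarm level,
                                           # not per individual agent
    micro_level_unpredictability_ok: bool  # emergent agent behavior tolerated

AUS_US_ISR_ZONE = RiskZone(
    name="AUS-US maritime ISR, low-risk zone",
    lethal_effects_permitted=False,
    human_oversight_at_tasking=True,
    micro_level_unpredictability_ok=True,
)

def attestation_valid(zone: RiskZone, *, is_armed: bool,
                      tasking_oversight: bool, agents_predictable: bool) -> bool:
    """Partner-side check of the developer's self-certification."""
    if is_armed and not zone.lethal_effects_permitted:
        return False
    if zone.human_oversight_at_tasking and not tasking_oversight:
        return False
    if not agents_predictable and not zone.micro_level_unpredictability_ok:
        return False
    return True

# The hypothetical ISR swarm: unarmed, human-tasked, unpredictable agents.
print(attestation_valid(AUS_US_ISR_ZONE, is_armed=False,
                        tasking_oversight=True, agents_predictable=False))  # True
```

The check illustrates the incentive structure described above: the partner learns only whether the attested properties fall inside the agreed zone, never the proprietary internals that produce them.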
Similarly, if we consider the hypothetical next-generation humanitarian notification system presented in Section 3.2, defense organizations from various countries can rely on data shared by the next-generation humanitarian notification system if the organization operating the system, such as the UN's Office for the Coordination of Humanitarian Affairs, certifies that the system meets the requirements of a system in an agreed low ethical principles risk zone. Such a certification would not necessarily require the elimination of emergent behavior or even details about the likelihood of emergent behavior when the system is determining humanitarian location data. This is important because sharing details that impact the likelihood of emergent behavior (such as what data sets a system is trained on or how frequently the system modifies the way it uses data) would reduce the positive impact emergent behavior can have on both the unpredictability at the micro level and reliability at the macro level. Therefore, an ethical principle risk framework aligns incentives for both the organization operating the humanitarian notification system, who would want to maintain the positive effects of emergent behavior, and the parties receiving the outputs of the system, who would want to know the system doesn't violate their ethical AI principles.
This brief discussion of dynamic certification methods and the challenge of ethical interoperability is meant to encourage multidisciplinary debate about how to address the issues presented by emergent behavior in real-world contexts. Such multidisciplinary debate will help establish practical strategies for government agencies, system developers, and operators confronting the challenges and opportunities presented by nondeterministic autonomous systems in open-context conflict environments.

Conclusion
This article goes beyond a discussion of principles with the goal of considering the operationalization of complex autonomous systems in real-world conflict domains. The nexus of complex autonomous systems operating in dynamic, open context environments will almost certainly result in emergent behavior. Such emergent behavior has novel implications for predictability and reliability depending on the level at which the system is considered. Realistic case studies such as the maritime ISR swarm system and the next-generation humanitarian notification system presented in Section 3 allow us to have informed debates about the operational risks and rewards of such systems. By addressing these risks and rewards in a practical, multidisciplinary way, realistic approaches such as dynamic certification methods and ethical principles risk zones can be debated and refined.
Future work should expand the discussion of emergent behavior to complex autonomous systems that are weaponized. Such systems will surely be developed and deployed. What are acceptable risk confidence intervals for such systems' predictability and reliability at various levels of consideration? Additionally, an examination of other aspects of emergent behavior is needed, such as the impact of such behavior on operator trust. Lastly, as called for in Section 4, more work must be done to identify methods of both certifying complex autonomous systems and addressing the growing body of different, organizationally specific ethical AI principles and how they can be operationalized in light of emergent behavior.

Notes
1. A 2021 report from the UN Panel of Experts on Libya states: "Logistics convoys and retreating HAF were subsequently hunted down and remotely engaged by the unmanned combat aerial vehicles or the lethal autonomous weapons systems such as the STM Kargu-2 (see annex 30) and other loitering munitions. The lethal autonomous weapons systems were programmed to attack targets without requiring data connectivity between the operator and the munition: in effect, a true "fire, forget and find" capability" (Majumdar Roy Choudhury et al. 2021, para 63).
2. An example of a networked collection of systems is the Airbus Future Combat Air System (FCAS): "The cornerstone of FCAS is the next-generation weapon system where next-generation fighters team up with remote carriers as force multipliers. Additionally, manned and unmanned platforms also will provide their uniqueness to the collective capabilities while being fully interoperable with allied forces across domains from land to cyber. The air combat cloud will enable the leveraging of networked capabilities of all pooled platforms" (Airbus 2020).
3. In a May 2021 Wired article, General John Murray, who leads US Army Futures Command, when discussing swarm technology in warfare, is quoted as asking an audience at the US Military Academy, "Is it within a human's ability to pick out which ones have to be engaged and then make 100 individual decisions? Is it even necessary to have a human in the loop?"
4. In this article we assume that decentralized interactions among individual autonomous systems and the environment result in complex collective behavior. This makes our hypothetical system a swarm as opposed to an example of robot teaming, a term in the current literature that refers to collaboration among autonomous systems.
5. This is analogous to the advantage presented by hypersonic missiles, which have unpredictable flight paths and are therefore extremely difficult to intercept with current air defense capabilities.
6. For example, the NATO AI principles include Explainability and Traceability, stating: "AI applications will be appropriately understandable and transparent" (NATO 2021). The Australian document A Method for Ethical AI in Defence discusses Transparency, stating that the basis of AI decisions should always be discoverable (Devitt et al. 2020). The UK Ministry of Defence document Ambitious, Safe, Responsible explicitly recognizes the challenge of Unpredictability, defining it as "The risk that some AI systems may behave unpredictably, particularly in new or complex environments," and includes Understanding as one of its five Ethical Principles for AI in Defence, explaining: "Mechanisms to interpret and understand our systems must be a crucial and explicit part of system design across the entire lifecycle." It further clarifies, "Whilst absolute transparency as to the workings of each AI-enabled system is neither desirable nor practicable, public consent and collaboration depend on context-specific shared understanding" (MoD 2022).
7. For example, humanitarian notification systems have been or are actively being used in Lebanon (see OCHA 2006), Yemen (OCHA n.d.), and Syria (OCHA 2018).
8. As of March 2022, Physicians for Human Rights had corroborated 601 attacks on 400 separate medical facilities in Syria, resulting in at least 942 medical personnel killed (all of these would be considered protected humanitarian sites under IHL). See Physicians for Human Rights 2022.
9. For an example of an existing UAV that could be part of this hypothetical swarm, see the Chinese JOUAV CW-20 system: https://www.jouav.com/flightSystem/cw-20.html.
10. For an example of an existing USV that could be part of this hypothetical swarm, see the Turkish Ulaq system: https://www.maritime-executive.com/article/turkish-shipbuilder-develops-new-armed-unmanned-surface-vessel.