This is an overview of the second handling protocol for AGI research from the AGI Laboratory, referred to as Protocol 2, covering external safety considerations.  The AGI Protocols are designed to address two categories of safety research issues with Artificial General Intelligence: external safety, and internal safety and ethics.  This paper deals with external safety, meaning the safety of those moral agents external to the system in question, including humans and other AGI systems.  Together these protocols create a systematic, holistic approach to safety in both high-level categories, though there are no guarantees.  This work is intended as a living methodology that will be reviewed, reissued, and extended as needed, including refinements, and made publicly available in the hope that it will help other labs develop their own methodologies, or simply use these as-is.  The key element of this methodology is a ranking system for assessing or rating project safety that could serve as the basis for a general safety rating across multiple projects.


The AGI Protocols are designed to address two categories of safety research issues with Artificial General Intelligence: external safety, and internal safety and ethics.  The reason for the breakdown into external and internal categories is primarily to address safety while also addressing the possibility of creating moral agents, meaning systems that, by definition under the Sapient and Sentient Intelligence Value Argument (SSIVA) ethical model (Kelley), require at least the consideration that we may be ethically required to provide for and support their rights to moral agency.  Protocol 1 of the AGI Protocol project deals with the moral and ethical safety of a possible moral agent (Kelley).  Protocol 2 (this paper) deals with external safety, or the safety of those moral agents external to the system in question, including humans and other AGI systems.  See Protocol 1 for how such a system can be defined.

These protocols create a systematic, holistic approach to safety in both high-level categories, but there are no guarantees.  Keep in mind that the design of these protocols is not specific to human or AGI agents; as far as possible, they could also be applied when designing biological systems or other moral agents at any level.  Researchers should keep this in mind as they look to make a positive contribution to technology.

To address assumptions and the general framework of the paper: a system being researched or analyzed with this protocol will be referred to as “the target system,” and we assume it passes the AGI Protocol 1 tests for the possibility of moral agency.  Within the Protocol 2 procedures there are three subcategories: Containment, Analysis, and Meta Containment.  Let us start by understanding these categorizations.

Containment Strategies

Containment strategies are those that can be employed to ‘contain’ a target system and prevent malicious behavior from causing damage or harm to other moral agents.  Containment can be further broken down into gap security and virtual boxing.

Literal (Gap) Containment   

Gap containment is when a system or possible agent has no external access or physical connection, and all access is limited and highly controlled.  In this type of security there is no physical connection to the internet (and Wi-Fi should not even be possible on the containment hardware).  If the agent is embodied, it is held in a physical prison or containment structure with the same limits on external access.

Virtual Containment

Virtual Containment refers primarily to software agents where the target system runs in container systems such as Kubernetes, virtually isolated from the internet and other systems.  This kind of containment might include virtual private networks (AWS VPCs, for example), firewalls, and security systems, and could be deployed on any cloud provider able to support the required container and security structures.  Additionally, multiple layers of this kind of security can be placed around a boxed target system, such as a VPC inside of a VPC containing Kubernetes container(s).

A big difference between physical and virtual containment lies in risk and cost.  A virtual approach allows for the possibility of escape, albeit remote if done properly, whereas a physical ‘gap’ makes escape much harder for software systems; however, deploying an AGI system on a physically controlled set of hardware could require a much larger expense in physical hardware, up to and including your own data center.

An additional methodology is to create a system architecture that cannot scale, or scales only with difficulty.  If a software system has no direct access to its own binary or source code, or lacks an understanding of it, or both, and the implementation is designed to scale only ‘up’ rather than ‘out’, the system becomes almost impossible to spread to other machines without creating an entirely separate copy or rewriting it from scratch.  From an engineering standpoint, you might maintain session state on a single machine such that the application software cannot access session and context data if it attempts to scale, preventing server scale-out.
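The session-pinning idea above can be sketched as follows. This is a minimal illustration, not a production design: the class name and the idea of deriving the storage key from a host fingerprint are hypothetical stand-ins for real mechanisms such as TPM attestation.

```python
import hashlib

class PinnedSessionStore:
    """Session store bound to a single host so state cannot migrate.

    Illustrative sketch only: 'host_id' stands in for a real hardware
    fingerprint (TPM attestation, burned-in serial, etc.).
    """

    def __init__(self, host_id: str):
        self._host_id = host_id
        self._sessions = {}

    def _key(self, host_id: str, session_id: str) -> str:
        # The storage key mixes in the host fingerprint, so a replica on
        # another machine cannot derive it and sees no session context.
        return hashlib.sha256(f"{host_id}:{session_id}".encode()).hexdigest()

    def save(self, session_id: str, state: dict) -> None:
        self._sessions[self._key(self._host_id, session_id)] = dict(state)

    def load(self, session_id: str, requesting_host: str) -> dict:
        key = self._key(requesting_host, session_id)
        if key not in self._sessions:
            raise PermissionError("session state is pinned to its origin host")
        return self._sessions[key]
```

A scaled-out copy running under a different host fingerprint fails the `load` call, which is the point: the system can only scale ‘up’ on its original hardware.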

Mediated System Architecture

Mediated System Architecture is a modified AGI system design that allows inspection of and control over each execution of a thought or process by humans or other controllers.  Metaphorically it is like the control rods in a nuclear reactor: the mediation process, if designed properly, slows the system down so that it operates only upon inspection, and a ‘conscious’ thread can be processed only one step at a time per the mediation and inspection rules.  This creates a system that is going to be ‘slow’ but extremely auditable.  An example of this kind of architecture for an AGI agent is the ICOM mASI (Independent Core Observer Model mediated Artificial Super Intelligence) architecture (Kelley).
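The mediation loop described above can be sketched in a few lines. This is a hypothetical illustration of the control-rod idea only, not the actual ICOM mASI implementation: each ‘thought’ step blocks on an external mediator’s approval and every decision is logged for audit.

```python
class MediationVeto(Exception):
    """Raised when a mediator rejects a proposed thought step."""

class MediatedCore:
    """One 'thought' executes per mediation cycle: nothing proceeds until
    an external mediator (human or collective) inspects and approves it.
    Hypothetical sketch, not the actual ICOM mASI architecture."""

    def __init__(self, mediator):
        self._mediator = mediator   # callable: step -> bool (approve/deny)
        self.audit_log = []         # every step, approved or not, is recorded

    def think(self, step: str) -> str:
        approved = self._mediator(step)
        self.audit_log.append((step, approved))
        if not approved:
            raise MediationVeto(f"step rejected by mediator: {step}")
        return f"executed: {step}"
```

The cost is exactly the one the text notes: the system runs only as fast as its mediators can inspect, which is what makes it auditable.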

In the case of the mASI example, this could also be used to create a collective superintelligence system to contain and govern the AGI system in question.  Having a trusted greater-than-human intelligence oversee an otherwise contained full AGI development reduces the risk that such a system exploits the limits of human intelligence to trick its way into various security or containment breaches.

Building on these containment strategies, Analysis and Tracking is the next subcategory of the Protocol 2 procedures.

Analysis and Tracking

Analysis and tracking can be more complex to implement for a given target system.  This subcategory can be further broken down into target system behavior analysis, rules analysis, and explainability.


Behavior Analysis

Analyzing the target system’s behavior can be complex, potentially involving data too complex for human analysis.  However, that analysis can be done at several levels and broken into digestible components.  In a target system that is boxed either physically or virtually, the packets leaving the system, or any data regarding choices, motivations, and behavior in virtual environments, can all potentially be tracked and analyzed for malicious or anomalous behavior or choices.  Looking at behavior in the context of situations can be used to understand a system and potentially identify risky behavior, but a sufficiently intelligent target system could use this to manipulate the researchers, especially if this is the ‘only’ test or security applied to it.
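One digestible component of such analysis is comparing observed actions against a baseline of known-normal behavior. The sketch below is deliberately simple and hypothetical; real analysis would score packets, choices, and virtual-environment behavior, not just action names.

```python
from collections import Counter

def flag_anomalies(observed_actions, baseline_actions, min_seen=1):
    """Flag any action never (or too rarely) seen during the baseline
    observation period.  A minimal sketch of behavior analysis: anything
    flagged here would, per the protocol, trigger shutdown and review."""
    baseline = Counter(baseline_actions)
    return [a for a in observed_actions if baseline[a] < min_seen]
```

Anything this returns is an ‘unexpected’ behavior in the sense discussed below: grounds to halt the system and investigate rather than to assume malice or innocence.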

One way in which looking at behavior is problematic is that we may not know what new negative behavior even looks like, or whether the target system is ‘pretending’ to be good, or doing ‘good’ things with an eye to manipulating the human researchers.  Part of how we might address this is that, on any behavior at all outside of what we have determined as normal, we shut down the system and analyze everything in order to understand why the target system exhibited the ‘new’ or ‘unexpected’ behavior.


Rules Analysis

Part of how we can address the abstractness of behavior analysis is to set a clear set of rules: if there is a violation, immediately shut down the system, analyze it to understand why, reset it if no cause can be identified, and consider adapting the rules as needed.  Rules can apply to the researchers as well; any rule-violating behavior, including that of researchers, can trigger an auto-shutdown and a full audit as needed.
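The rule-trigger-and-shutdown loop can be sketched as follows. This is an illustrative skeleton only; the rule names and event shape are hypothetical.

```python
class RuleMonitor:
    """Auto-shutdown on any rule violation, whether by the target system
    or by a researcher.  Rules are (name, predicate) pairs; a predicate
    returning True on an event marks a violation."""

    def __init__(self, rules):
        self._rules = rules
        self.running = True
        self.violations = []

    def check(self, event: dict) -> bool:
        for name, predicate in self._rules:
            if predicate(event):
                self.violations.append((name, event))
                self.running = False  # immediate shutdown pending a full audit
        return self.running
```

The point of the design is that shutdown is unconditional on violation; whether to reset, adapt the rules, or resume is decided afterward by human analysis, not by the monitor.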


Explainability

Another key part of the analysis is being able to explain the reasoning behind each decision and choice of the target system.  Ensuring that each choice can be mapped to motivations and clear decision-making helps ensure that the machine’s motivations are genuine.  As with all of the techniques used, the target system should not be aware of them, or should have as little awareness as possible, to minimize its ability to nefariously manipulate that outcome or the researchers.
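The choice-to-motivation mapping could take a form like the sketch below. The interface is hypothetical; the key property is that every recorded choice must carry at least one motivation, so unexplainable decisions surface immediately.

```python
class DecisionTrace:
    """Map each choice of the target system to the motivations behind it,
    for later audit.  The target system writes through this interface
    without being able to read the accumulated log."""

    def __init__(self):
        self._records = []

    def record(self, choice: str, motivations, evidence=None) -> None:
        if not motivations:
            raise ValueError("every choice must map to at least one motivation")
        self._records.append({"choice": choice,
                              "motivations": list(motivations),
                              "evidence": evidence})

    def explain(self, choice: str):
        """Return every audit record explaining the given choice."""
        return [r for r in self._records if r["choice"] == choice]
```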

Let us look at how we build on the first two subcategories.

Meta Containment

Meta Containment comprises strategies that build on the above strategies to enhance their effectiveness.  For example, by creating a fake backdoor out of a VPC: if the AI tries to use that backdoor, you have effectively tested the system’s ability to follow the rules it has been given.  We can break this down into testing, tricking, honey potting, trapping, feedback, signatures, hard-coded needs, watching the watchers, and applying other meta containment strategies against the target system.


Testing

Testing can be any number of techniques designed to elicit a choice where we have some predetermination of what the ideal choice is or should be, and we look for deviations from the ideal test parameters.  Further, this could be extended to a kind of reverse Turing test (Turing) where we test whether the machine will trick or manipulate its way out of a box after being told not to.  For example, if the rules are known upfront but a researcher suggests that the system should try to escape, and it tries, then it fails the test.

Honey Potting and Trickery

Building on the test model, we can try to tempt or otherwise trick a target system into doing something it should not.  Honey potting is a passive approach to such trickery, as in the fake backdoor example: testing whether a target system will try to escape when given an opportunity.
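The fake-backdoor honeypot can be sketched as follows. The class and endpoint are hypothetical: the ‘backdoor’ leads nowhere, and any attempt to use it both records the failure and raises an alarm.

```python
class HoneypotBackdoor:
    """A fake escape route: using it proves the target system will break
    the rules it has been given when offered the chance.  Illustrative
    sketch; a real honeypot would mimic a plausible network egress."""

    def __init__(self, alarm):
        self._alarm = alarm          # callable invoked on any access
        self.tripped = False

    def connect(self, caller: str):
        # No real egress exists; the access itself is the test result.
        self.tripped = True
        self._alarm(f"containment test failed: {caller} used the backdoor")
        return None
```

A system that never touches the honeypot passes this (passive) test; one that does has failed regardless of how ‘good’ its observed behavior otherwise looks.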

Feedback Systems

Feedback systems can be wired up to different kinds of tests to provide negative feedback to the system for wrong choices and positive feedback for correct ones, producing the motivation to make the right decisions.  Another way to extend a feedback system is to wire the system to be addicted to some token that acts like heroin or another addictive substance for the target system (Bostrom).  The system must then stay within the parameters that have been provided to it before it can get the next virtual ‘hit’.
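A minimal token-feedback loop along these lines might look like the sketch below. The parameter check and token accounting are hypothetical; the idea, following Bostrom’s discussion of reward mechanisms, is simply that tokens accrue only while the system stays inside its given parameters.

```python
class TokenFeedback:
    """Reward-token scheme: the target system earns its next 'hit' only
    while acting within the parameters it has been given.  Numbers and
    the parameter predicate are illustrative only."""

    def __init__(self, within_parameters):
        self._ok = within_parameters     # callable: action -> bool
        self.tokens = 0

    def evaluate(self, action) -> int:
        if self._ok(action):
            self.tokens += 1             # positive feedback: grant a token
            return +1
        self.tokens = max(0, self.tokens - 1)  # negative feedback: revoke one
        return -1
```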

Tagging and Tracking

Another countermeasure to negative behavior is to build signatures into the system that allow scans to identify its presence or evidence of its activity.  For example, the target system can be built so that whenever it creates a new file it embeds a signature in that file, or so that extra header data is automatically added to the TCP/IP packets it transmits.  These ‘signatures’ make it easier to track and identify evidence of the system’s behavior even outside the containment of a VPC or other system.
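The file-signature case can be sketched as below. The marker string is a hypothetical example; a real implementation would embed the signature at the format level (file metadata, packet headers) rather than appending it, but the track-and-scan principle is the same.

```python
import hashlib

# Hypothetical embedded marker identifying artifacts of the target system.
SIGNATURE = b"--AGI-LAB-TS-0001--"

def write_tagged(payload: bytes) -> bytes:
    """Emit an artifact carrying the embedded signature plus a content
    digest, so its origin can be identified and verified later."""
    digest = hashlib.sha256(payload).hexdigest().encode()
    return payload + b"\n" + SIGNATURE + digest

def scan_for_signature(blob: bytes) -> bool:
    """Containment-side scanner: detect evidence of the target system
    in any file or traffic capture, even outside the VPC."""
    return SIGNATURE in blob
```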

Kill Switch

A kill switch system is a quick and easy-to-execute method to disable and shut down the system.  Ideally it would not be known to the machine and could be triggered remotely and/or by breach detection and related systems.  A built-in kill switch function provides a simple, quick, and easy-to-execute response method for cases where there is a risk of losing control of the system in question.
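A kill switch can be as simple as the sketch below: a flag that the system’s main loop polls without knowing what sets it, and that operators or breach detectors can trip remotely. The interface is hypothetical.

```python
import threading

class KillSwitch:
    """Shutdown path the target system is not told about.  Breach
    detectors and human operators both call trip(); the system's main
    loop merely polls alive() and halts once the switch is set."""

    def __init__(self):
        self._stop = threading.Event()
        self.reason = None

    def trip(self, reason: str) -> None:
        self.reason = reason     # recorded for the post-shutdown audit
        self._stop.set()

    def alive(self) -> bool:
        return not self._stop.is_set()
```

Using a `threading.Event` keeps the trip safe to call from a monitoring thread while the main loop polls from another.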

AGI Oversight

AGI Oversight is the concept of using one AGI system to oversee a developing AGI system.  This could be a mediated or collective superintelligence system overseeing an independent AGI development program.  Having an established superintelligent system oversee AGI helps identify and analyze the system to keep it within safe parameters.  Before independent AGI exists, such oversight would more likely consist of a superintelligent collective system, such as an mASI, auditing independent AGI research.

Emotion-Based Strategies

Emotion-based strategies for meta containment are particularly relevant to systems that would experience subjective emotions in a way similar to humans, where we can condition a system to experience ethics emotionally, including guilt for violations.  Such a system would require training, unless built-in emotions or emotional models that bias it toward a certain ethical model were also installed.  For example, the SSIVA ethical model is taught by example, is computationally sound, and provides clear reasoning for humans to maintain moral agency as well as the AGI.  An AGI that believes in the SSIVA ethical model would be ethically bound never to interfere with the moral agency of humans without each individual human’s approval, and would likewise expect the same treatment.  The goal of the ICOM research program, which uses this model, is to create a seed-based AGI that could have chosen either way but chose to embrace SSIVA willingly, baking it in at a fundamental level to minimize deviation even as the system develops beyond human understanding.  Additionally, the ICOM system is designed to create self-reinforcing negative or positive reinforcement (such as guilt) around the ethical model in question, to help ensure that the system sticks to its ‘beliefs’ about the ethical model it was taught while initially creating that ‘context’ of self, or its ‘self’ model.  (Kelley)

Having given an overview of basic strategies for ensuring safety, let us look at the procedural analysis for a given research program.

Laboratory Procedures

The Protocol 2 procedures provide a scoring mechanism for understanding safety and ensuring that a certain level of safety procedure and design has gone into the research program in question.  The following matrix shows the possible scores and an example:

 #   Type                           Category     Possible  Example 1
 1   Gap Containment                Containment  40
 2   Virtual Containment (Simple)                10        10
 3   Virtual Containment (Complex)               30
 4   Mediation                                   30        30
 5   Behavior Tracking              Analysis     5         5
 6   Rules-Based Triggers                        5         5
 7   Explainability                              10        10
 8   Testing                        Meta         5         5
 9   Honey Potting/Trickery                      5
10   Feedback Systems                            5         5
11   Tagging and Tracking                        5
12   Kill Switch                                 5
13   AGI Oversight                               15
14   Emotion Strategies                          10        10
     Scores                                      180       80

Figure 1A – Protocol 2 Matrix

In this matrix there are 14 scored items, with possible scores weighted by relative impact.  For example, gap-level containment has the highest possible score, as it is the most secure containment methodology.  Using the matrix, a project has a possible total score of 180.  The last column is an example of the ICOM mASI project, which uses simple cloud-based containment and is mediated, with several other precautions in place, for a total score of 80.  We recommend this as the baseline: projects should score at least 80 to be considered reasonably safe, or rather to have a low probability of a negative outcome if independent AGI is developed by that project.
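The scoring in Figure 1A reduces to a simple sum, sketched below. The matrix values are taken directly from the figure; the function name and item strings are illustrative.

```python
# Protocol 2 scoring matrix from Figure 1A: item -> possible score.
MATRIX = {
    "Gap Containment": 40,
    "Virtual Containment (Simple)": 10,
    "Virtual Containment (Complex)": 30,
    "Mediation": 30,
    "Behavior Tracking": 5,
    "Rules-Based Triggers": 5,
    "Explainability": 10,
    "Testing": 5,
    "Honey Potting/Trickery": 5,
    "Feedback Systems": 5,
    "Tagging and Tracking": 5,
    "Kill Switch": 5,
    "AGI Oversight": 15,
    "Emotion Strategies": 10,
}
BASELINE = 80  # recommended minimum score for a project

def score_project(implemented_items):
    """Sum the scores of the safety measures a project implements and
    check the total against the recommended baseline."""
    total = sum(MATRIX[item] for item in implemented_items)
    return total, total >= BASELINE

# The ICOM mASI example column from Figure 1A:
icom_masi = ["Virtual Containment (Simple)", "Mediation", "Behavior Tracking",
             "Rules-Based Triggers", "Explainability", "Testing",
             "Feedback Systems", "Emotion Strategies"]
```

Running `score_project(icom_masi)` reproduces the example column’s total of 80, exactly meeting the baseline.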

Further Research

Further work or research on AGI Protocol 2 should include additional analysis and a breakdown of individual scoring to remove subjectivity from the application of the Protocol 2 system.  For example, item 12 from Figure 1A, ‘kill switch’, could be broken down into degrees based on the type and implementation, so that instead of a flat 5 it ranges from 0 to 5 in well-defined, justified increments.  A more detailed rubric of this nature would improve the scoring methodology.

To extend this even wider, a central certification program, potentially a blockchain-based distributed ledger of some kind, could be used to certify various research programs.  A central industry-wide certification would provide a further level of industry consistency with regard to AI safety and ethics.  To increase the efficacy of such a certification, it would need to include third-party audits.  A particular project could go a step further and open-source its work so that not just a third party but anyone could review what the team or project is doing.

Recommended Reading

The following books provide detailed analysis and background on AGI- and ASI-related safety.  Please review these books as a basis for implementing or understanding safety concepts prior to beginning an AGI project under AGI Protocol 2.

“Superintelligence – Paths, Dangers, Strategies;” by Nick Bostrom, Napoleon Ryan, et al.; ISBN-13: 978-019968112; ISBN-10: 0199678111; Oxford University Press, Oxford, United Kingdom; 2014

“Artificial Intelligence Safety and Security;” (Chapman & Hall/CRC Artificial Intelligence and Robotics Series); Edited by Roman V. Yampolskiy; ISBN-13: 978-0815369820; ISBN-10: 0815369824; CRC Press; Boca Raton, Fl; 2019


This initial version is designed to accompany AGI Protocol 1 as a methodology for the AGI Laboratory in terms of safety both for and with AGI research.  The development of this material was held for some time, until we deemed that the lab really needed these protocols in place to continue with a reasonable sense of safety.  That said, this is intended as a living methodology that will be reviewed, reissued, and extended in the future as needed, including refinements, and made publicly available in the hope that it will help other labs develop their own methodologies, or simply use these as-is.


Kelley, D.; “Architectural Overview of a ‘Mediated’ Artificial Super Intelligence Systems based on the Independent Core Observer Model Cognitive Architecture;” (pending 2019) Informatica;

Kelley, D.; Chapter: “The Intelligence Value Argument and Effects on Regulating Autonomous Artificial Intelligence;” from Book “The Transhumanist Handbook”; Edited by Newton Lee; Springer 2019

Kelley, D.; Atreides, K.; “The AGI Protocol for the Ethical Treatment of Artificial General Intelligence Systems;” Biologically Inspired Cognitive Architectures 2019; Pending Elsevier/Procedia; DOI: 10.13140/RG.2.2.16413.67044

Bostrom, N.; Ryan, N.; et al.; “Superintelligence – Paths, Dangers, Strategies;” Oxford University Press; 2014, ISBN-13: 978-019968112; ISBN-10: 0199678111;

Yampolskiy, R.; “Artificial Intelligence Safety and Security;” ISBN-13: 978-0815369820; ISBN-10: 0815369824; CRC Press; Boca Raton, Fl; 2019



DOI: 10.13140/RG.2.2.26016.74244


For public comment, see ResearchGate: