Book Excerpt: Game AI Uncovered, Volume One
The following excerpt is from Game AI Uncovered Volume One, as edited by Paul Roberts. The book was published February 23, 2024 by CRC Press, a division of Taylor & Francis, a sister company of Game Developer and Informa Tech. Use our discount code GDTF20 at checkout on Routledge.com to receive a 20% discount on your purchase. Offer is valid through July 4, 2024.
While working on a yet to be announced title, the AI team hit a snag. Behaviour trees were the decision-making architecture of choice, but they are not reactive. And for a tree to respond to an event, the response, and the conditions leading up to it, must be part of the tree. Reactive Behaviour Trees (RBT) were designed to solve this problem, but before delving too deeply into the problem this architecture aims to solve and why you would choose to implement this approach, we need to take a brief look at traditional behaviour trees.
Behaviour trees have been around for a while. They were originally developed by Damian Isla as an improvement on Finite State Machines (FSM) in the development of Halo 2 (Isla, 2005). Since then, they have been used in a wide range of games and had numerous articles and had research papers written about them. Behaviour trees are now the preferred approach for AI behaviours across the industry. Epic’s Unreal Engine even has them built into the engine.
Traditionally, behaviour trees are graphs with a bunch of composite nodes that dictate the type of execution (branch nodes, in normal tree terminology), and behaviour/task nodes (leaf nodes), that execute some kind of logic, or more colloquially ‘do things’. They are almost always associated with a blackboard, which is essentially a common data store for trees to share data amongst one another. A blackboard has a one-to-many relationship with behaviour trees, meaning that a single blackboard can be associated to multiple behaviour trees but each behaviour tree can only ever be associated with the one blackboard. This greatly adds to the flexibility of the system because data can be set in one agent’s behaviour tree and then be used by another agent in a completely different behaviour tree.
On each update, the tree evaluates which task node is the best one to execute and goes through all the nodes in order, usually from top to bottom and left to right, until a particular task node returns true. The most commonly used composite nodes and a brief description of their purpose are as follows:
Sequence – These composite nodes execute each of their child nodes from left to right until all of them return success.
Selector/Fallback – These composite nodes execute each of their child nodes in any order until at least one of them returns success.
Parallel – These composite nodes execute the child nodes at the same time. These composite nodes return success when some number m of n number of child nodes succeed and returns false when all n child nodes fail. The term ‘parallel’ is used but the process is not usually true parallelism. It is more akin to a coroutine. For example: on each tick, each child class will individually tick in sequence from left to right and be suspended until the next tick.
In addition to the blackboard, there are decorator nodes that usually modify a child node with a unique policy, for example, using the Loop/Repeat decorator at the root of a subtree to repeat a certain subtree or using the Invert decorator to flip success to failure and vice-versa. There are others, but this is plenty to get the point across.
Combining all the nodes mentioned above, one can create a great looking AI agent quickly. So, let us do just that.
Imagine you are developing a strategy game and have set up a group of units to patrol around the perimeter of your base camp. The group of units will, ideally, have a ‘leader’ who will be running a behaviour tree, with all other agents following their lead. The first thing we need to do is to set up the blackboard. In this case, our blackboard will have a couple of items – the current patrol location and a list of all patrol locations.
From the root of the tree, we add a selector, and its first leaf node could be a task node called ‘UpdatePatrolList’ that checks to see if the list of patrol points is empty, which on the very first iteration would be. This list will be populated with data from the world using some form of query system.
From the second iteration onwards the patrol list does not need updating, so a branch that allows our agent to enact the patrol is needed. Next, a sequence node can be added to the selector whose child nodes will be a task node to set the current patrol location. A task node named ‘MoveTo’ can then be added that will move the agent to the designated patrol location and a task node named ‘Wait’ that will have the agent wait at the location for a set time. A depiction of the resulting behaviour tree can be seen in Image 7.1.
IMAGE 7.1 Basic patrolling behaviour tree.
The blackboard in the bottom left of Image 7.1 is there only as a visual aid to show how the blackboard acts as a data store for the entire tree. In the later examples, the blackboard will not be visualised, but if any task is referring to a value, then it is most likely getting/setting it from/in the blackboard.
A good addition to this basic patrolling behaviour would be to have a behaviour to attack any enemy units that are close by. The perception systems will take care of setting a target in the blackboard, so all this behaviour needs to do is to move to said target and attack if a target is available. Image 7.2 depicts how the behaviour tree now looks.
IMAGE 7.2 Patrolling tree with attack logic.
The reason the new attack sequence has been added before the rest of the nodes is because that is the order in which a behaviour tree executes. Remember, the tree executes from top to bottom and left to right. The agent should attack the enemy if the target is set. Only if the target is not set should it continue its patrol. So far so good. All of this seems like sound system design. So, what is the problem?
Well, this is a very simple example. In a more tactical game, there will be a lot more things to consider. For example:
• Are there items in the environment that can be used to the agent’s advantage (Oil drums, crates of explosives, and the like)?
• Is this the right time to call in reinforcements (Air support, backup, etc.)?
• Should they retreat?
• Should long-ranged units fire at the oncoming target or pull back and let the melee units defend them?
Depending upon the game, the problem only gets worse when using behaviour trees for the AI. Consider a first person or third person shooter where we need to consider the following:
• Is the player in cover?
• Do they have allies?
• Is there a way to flank the player?
• Does the agent have enough grenades to blast the player out of cover?
• Is the player in an armoured vehicle?
• Can the AI do anything against armoured vehicles?
• The list goes on.
To be fair, you may not want all the above, but you can see where this is going. Things can get very complicated very quickly, and this is not even considering the different archetypes of enemies you may have in your game.
Behaviour trees are also not reactive in nature. If the tree needs to respond to an event that can happen in the world, the response, and the conditions leading up to it, must be part of the tree. This exhaustive approach is the reason why behaviour trees in big AAA games end up looking like a giant mass of spaghetti and as a result they become large, unreadable, and cumbersome to debug. You end up with a giant tree that is pruning or executing very large subtrees depending on the conditions of the game. The worst-case scenario is that the execution of decisions will be slowed down because traversing the tree itself will start to tax the system. Sometimes, the subtrees are authored as their own behaviour trees and the ‘master’ behaviour tree is the one that runs those behaviour trees via a ‘RunSubtree’ task node or something similar. With this approach, the subtrees become more readable, and it ensures you are not repeating code, but it does not do much else to resolve the overall structure. Debugging is still a pain and traversal, if the tree is large enough, and may still be slow. Which is exactly what led to the development of the RBT architecture.
Reactive behaviour trees take inspiration from the MARPO methodology (Lamming, 2008), an architecture that combines tasks (usually individual states) and academic goal-based reasoning to produce believable AI. MARPO is an acronym which stands for Movement, Avoidance, Routing, Planning, and Orders. The core of the ‘PO’ part of the methodology is self-contained tasks, with limited input and limited output that are arranged in multiple queues or stacks (whichever works for the game). The limited input is the current game state, and the limited output is the action the agent needs to perform. The queues or stacks add an element of ‘memory’ to the AI as the AI knows what it needs to do next or what it was doing in the past.
Reactive behaviour trees take advantage of MARPO’s multiple queue/stack approach, but for small, manageable behaviour trees. In the MARPO approach, a total of three stacks are used. One stack is tagged as the long-term stack, which is basically the agent’s end goal. A second stack, called the reactive stack, deals with the agent’s response to any world events, and the third is called the immediate stack, which allows the agent to handle the most pressing of concerns. Tasks are pushed onto these stacks by an external monitor, which interrupts the execution of the current active task and pushes a task onto one of the stacks. The immediate stack suppresses the execution of the reactive stack, which in turn suppresses the execution of the long-term stack.
Reactive behaviour trees use only a long-term stack and a reactive stack. More stacks will facilitate different types of reactions but will mean more processing. By having a second stack, you can flush the entire stack if the need arises and drop back to processing the long-term stack. The stacks in this architecture will consist of small, self-contained behaviour trees, rather than individual states as in MARPO.
The execution flow of the Reactive Behaviour Tree architecture is as follows:
• The AI System goes through each agent’s update cycle.
• Each agent consists of a Behaviour Tree Component and a Monitor Component
• The behaviour tree component is responsible for the execution of the stacks. Handling the updates of the topmost behaviour tree, backtracking, data storage from the blackboard, handling push and pop calls from the monitor, etc.
• The monitor component is responsible for handling events from various game systems to push or pop behaviour trees to or from the stacks. The overall system architecture is depicted in Image 7.3.
IMAGE 7.3 Overall system architecture.
The principle of self-containment is simple to adhere to in RBTs by having behaviour trees that take care of one thing and only one thing. Consider the following: Does the ‘Patrolling’ behaviour tree need to know if an enemy is visible or not and how to attack them? The answer is no. The behaviour defined in that tree is patrolling, so why not only have that tree handle the logic for patrolling and define a separate tree for how to go about attacking the target(s)?
The long-term stack is the main stack which houses any behaviour trees that the agent would have by default. In the patrolling/attacking examples described earlier, using RBTs we could assign a default patrolling behaviour to the long-term stack to get the simple behaviour shown in Image 7.1. It does not need any of the additional logic that was added in Image 7.2. The RBT would look something like that shown in Image 7.4.
IMAGE 7.4 Reactive Behaviour Tree architecture – showing the two stacks, with the default patrol BT pointing to a version of the simplified BT it consists of.
It is important to note that if an agent had a different purpose they should have a different default behaviour. For example: a sniper has no need to patrol. They will maintain a position and wait for targets, or a more complicated approach could have them hold position for a period of time before seeking out a more advantageous position. Either way, the sniper does not need a default patrolling behaviour. They require something bespoke to them.
The behaviour tree at the top of either stack pushes any additional subtrees it wants to execute onto the current stack. The topmost behaviour tree on the stack would then get the update and the previous behaviour tree goes dormant. Once the execution of the topmost tree is complete it can pop itself off the stack and the execution returns to the previous tree which carries on with its logic. As is clearly demonstrated by the way the execution returns to the parent tree, the element of memory that is evident in the MARPO methodology is maintained.
There is one key element missing – the monitor. The monitor controls the external pushing of behaviour trees onto a relevant stack should an event take place that our specific agent is capable of handling. It is a system that receives notifications from the perception system and/or other world systems of the game and responds only if our specific agent is interested. This means that a monitor is archetype specific. The benefit of this approach is that it allows for different agent types to react differently to world events. For example: a patrolling guard may be interested in the discovery of a dead body, but a sniper may not.
Each behaviour tree has access to the monitor, allowing them to fire events to the monitor themselves, which will then allow the monitor to determine how this particular agent should respond.
Using the same patrolling example as before, while the agent is patrolling, it spots some disturbance and/or noise in the bushes or the tree line nearby. The monitor pushes an ‘Investigate’ behaviour tree onto the reactive stack, which now receives the update. The ‘Patrol’ behaviour tree now falls dormant. Image 7.5 depicts the current state of the architecture.
IMAGE 7.5 The Monitor receives a perception event and pushes the ‘Investigate’ BT onto the Reactive Stack.
The agent goes to the location and scouts to see what the disturbance could have been. If the player makes a sound or becomes visible while the agent is investigating, the ‘Investigate’ behaviour alerts the monitor of this event, which reacts by pushing an ‘Attack’ behaviour tree onto the reactive stack. As can be seen in Image 7.6, the current top of the reactive stack pauses execution, and the ‘Attack’ behaviour begins.
IMAGE 7.6 The Monitor receives a perception event and pushes the ‘Attack’ BT onto the Reactive Stack.
If the agent starts to come under fire while attacking, the ‘Attack’ behaviour tree alerts the monitor, which pushes a ‘TakeCover’ behaviour tree onto the reactive stack. The moment the player stops shooting or the agent reaches cover, the ‘TakeCover’ behaviour tree will pop itself off the reactive stack and the agent will immediately drop back into the ‘Attack’ behaviour tree and start shooting back. When the agent’s weapon is out of ammo or an ability is on cooldown, the monitor is alerted, and the ‘TakeCover’ behaviour tree is again pushed onto the stack while the agent reloads. Image 7.7 depicts these two scenarios.
IMAGE 7.7 The Monitor receives the damage/reload event and pushes the ‘TakeCover’ BT to the Reactive Stack.
While the agent is in cover and reloading, the player cleverly uses stealth to duck away from the battle. Once our agent has finished reloading the ‘TakeCover’ behaviour tree is popped off the stack, but there is no target to attack, so the ‘Attack’ behaviour tree pops itself off the stack too. Now the agent drops back into the ‘Investigate’ behaviour tree. The player stealthily slinks away from the area at this point and the agent fails to find them. At this stage the ‘Investigate’ behaviour tree pops itself off the reactive stack and the agent goes back to the top of the long-term stack, which was its default, the ‘Patrol’ behaviour tree.
In the entire scenario, which is depicted in Image 7.8, everything happened without behaviours having any knowledge of how other behaviour tree’s function. A behaviour tree such as ‘Attack’ described above only needs to know that it is taking heavy damage and can send out an alert to the monitor that will prompt the ‘TakeCover’ behaviour tree being added.
IMAGE 7.8 First the ‘TakeCover’ BT pops off the stack, then the ‘Attack’ BT pops off as the target was lost, and then the ‘Investigate’ BT gets popped off because no new target was found.
One of the huge benefits of this architecture is that it allows the team to create small, manageable behaviour trees that are reusable. However, if you do not hold the line on this, people will continue to develop behaviour trees that are large and unwieldy. This not only defeats the point but makes debugging more difficult than just sticking with the traditional behaviour tree approach.
Before adding a behaviour tree to a stack, it is important to check to see if this behaviour tree is already on a stack somewhere. If that is the case, then do not add it again, instead, remove trees from the stack until you get to the required behaviour.
In traditional behaviour trees, each and every behaviour would need to account for what happens when the agent spots a disturbance, when to get into combat, and all the logic required to handle the fallout. Bringing us right back to the finite state machine like linkage between tasks that behaviour trees were created to avoid. Using the Reactive Behaviour Tree approach, the trees were short and to the point, tree traversal was snappy and debugging compartmentalised and the creation of new behaviour trees was easy.
Isla, D. (2005) Handling Complexity in the Halo 2 AI. GDC Proceedings 2005.
Available online: https://www.gamedeveloper.com/programming/gdc-2005-proceeding-handling-complexity-in-the-i-halo-2-i-ai (accessed September 4th, 2022).
Lamming, B. (2008) The MARPO Methodology: Planning and Orders, AI Game Programming Wisdom 4, edited by Steve Rabin. Charles River Media, pp. 239–255