Cogment: Our Open-Source Framework for Human-AI Collaboration
The idea of involving humans directly in the training of AI agents is gaining traction, thanks in part to advances in reinforcement learning and human-in-the-loop training. Combining human expertise and judgment with AI's exploration power and pattern-recognition capabilities in intelligent ecosystems is a stepping stone towards better, fairer, and more robust systems. However, the path from the lab to real-life deployment and operation comes with architectural, functional-design, and engineering complexities.
We present Cogment™, a unifying open-source framework that introduces an actor formalism to support a variety of human-agent collaboration topologies and training approaches, including human-led demonstrations, evaluations, and guidance. Cogment addresses the aforementioned complexities and is scalable out of the box thanks to a distributed microservice architecture. This post offers an overview of how Cogment's open-source framework supports distributed multi-actor training, deployment, and operation. If you would rather dive directly into the details, the complete Cogment White Paper is freely available here.
Why Cogment Matters
Achieving human-AI collaboration requires a shared environment where humans and AI agents can operate and train together; without it, AI agents cannot learn from humans' reactions to what they are doing. This goes far beyond human contributions such as data annotation, which is usually carried out offline and only supports specific kinds of training on examples. Moreover, human feedback of varying complexity, human demonstrations, and live operation have all been shown to improve AI training and results. Until now, however, there was no accessible, unifying technological and design framework to quickly develop, train, and deploy such applications. Cogment was designed to answer these needs.
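The key enabler of this shared environment is treating humans and AI agents uniformly: both are "actors" that receive observations and produce actions, so the environment does not need to know which kind it is talking to. The sketch below illustrates that idea in plain Python; the class and function names are illustrative and are not the actual Cogment SDK API.

```python
from abc import ABC, abstractmethod

class Actor(ABC):
    """Uniform interface for any trial participant, human or AI."""
    @abstractmethod
    def act(self, observation):
        """Return an action given the latest observation."""

class RuleBasedAgent(Actor):
    """A simple AI actor: dim the lights late in the evening."""
    def act(self, observation):
        return "dim_lights" if observation.get("hour") >= 21 else "no_op"

class HumanProxy(Actor):
    """Forwards observations to a UI and returns the human's choice."""
    def __init__(self, get_input):
        self._get_input = get_input

    def act(self, observation):
        return self._get_input(observation)

def step(actors, observation):
    """One environment tick: every actor acts on the same observation."""
    return {name: actor.act(observation) for name, actor in actors.items()}

actions = step(
    {"agent": RuleBasedAgent(), "human": HumanProxy(lambda obs: "no_op")},
    {"hour": 22},
)
```

Because the environment only sees the `Actor` interface, a human can be replaced by an AI agent (or vice versa) without changing the trial logic, which is the property the rest of this post builds on.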
Key Features of Cogment
Cogment also accounts for the inherent lag of some common types of human feedback through its retroactive feedback capability (i.e., attaching rewards to past actions), while retaining the ability to train online. For example, a smart-home AI agent that dims the lights at a specific time can retroactively learn from a negative reward when, a few seconds later, humans stand up and walk to the switch to override the AI agent's change.
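The mechanics of retroactive feedback can be sketched as a trial log that keys each action by its tick, so a reward arriving later can be attached to the tick where the action actually happened. This is a minimal illustration of the concept, not the Cogment reward API.

```python
class TrialLog:
    """Records actions by tick so delayed rewards can be attached later."""
    def __init__(self):
        self.steps = {}  # tick id -> {"action": ..., "rewards": [...]}
        self.tick = 0

    def record_action(self, action):
        self.steps[self.tick] = {"action": action, "rewards": []}
        self.tick += 1

    def add_reward(self, tick_id, value):
        # Attach a (possibly delayed) reward to a past action.
        self.steps[tick_id]["rewards"].append(value)

log = TrialLog()
log.record_action("dim_lights")  # tick 0: the agent dims the lights
log.record_action("no_op")       # tick 1: nothing happens
# A few seconds later, the human walks to the switch and overrides,
# so a negative reward is attached retroactively to tick 0:
log.add_reward(0, -1.0)
```

The training process can then consume the log as usual; the only difference is that a step's reward list may still grow for some time after the action was taken.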
Let's say we are training two AI agents in an environment with two humans. The problem with starting training from scratch in this setup is that the humans will have to interact with untrained AI agents for a while before those agents start doing anything useful, which is not an efficient use of human time. Cogment's implementation swapping enables more interesting training setups that work around this issue. Simple examples of training regimens that can easily be implemented using Cogment include:
Bootstrapping with pseudo-humans: implement simple rule-based AI agents that simulate or mimic human behaviour, and run a large number of fully simulated trials with this setup. Once the AI agents have reached a good performance level, start involving actual humans. For example, a product recommendation agent needs to learn the correlation between a recommendation and a purchase, which can take some time. Implementing an average or stereotypical human buyer behaviour (based on historical statistics, for example) helps the agent learn a baseline, readying it to pick up subtler behaviour from actual humans.
Bootstrapping with business expertise-based AI agents: implement the two agents using, for example, a rule-based system to provide some value to the human, and add some stochasticity to bolster variety. These agents start generating data that can be used to train Machine Learning (ML) based policies; once those are good enough, the ML-based implementations can replace the initial ones. We can imagine, for example, a sensitive use case (such as a 911 dispatcher or air-traffic support agent) in which a default, average-but-safe heuristic policy interacts with the human, even in a real environment. A learning agent can then be trained on this average-but-safe agent's experience and learn to make better decisions. Once the learning agent is deemed safe enough, its implementation can be swapped in.
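The first regimen above, bootstrapping with pseudo-humans, can be sketched as a rule-based buyer whose purchase behaviour follows baseline statistics. The purchase rates and function names below are invented for illustration; in practice the baseline would come from historical data.

```python
import random

# Hypothetical category-level purchase probabilities for an "average" buyer.
BASELINE_BUY_RATE = {"shoes": 0.30, "hats": 0.10, "socks": 0.05}

def pseudo_human_buyer(recommendation, rng):
    """Mimic a stereotypical buyer: purchase with a baseline probability."""
    return rng.random() < BASELINE_BUY_RATE.get(recommendation, 0.0)

def run_simulated_trials(policy, n_trials, seed=0):
    """Run fully simulated trials of a recommendation policy against the
    pseudo-human, returning the observed purchase rate."""
    rng = random.Random(seed)
    purchases = 0
    for _ in range(n_trials):
        recommendation = policy(rng)
        if pseudo_human_buyer(recommendation, rng):
            purchases += 1
    return purchases / n_trials

# A naive policy that always recommends shoes should observe roughly the
# shoes baseline purchase rate over many simulated trials.
rate = run_simulated_trials(lambda rng: "shoes", n_trials=10_000)
```

Once a policy trained against this pseudo-human reaches a good performance level, the pseudo-human implementation is swapped out for a real human interface and training continues on actual behaviour.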
In the examples above, we emphasized human-in-the-loop learning, but Cogment's features naturally extend to AI-only multi-agent systems thanks to the framework's actor formalism. Implementation swapping makes it easy to test implementations, whether in a diverse setting where they all interact with each other, or head-to-head against one another.
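Implementation swapping itself boils down to a level of indirection: the trial refers to actors by name, so an implementation can be replaced (e.g., heuristic to learned) without touching the environment or the other actors. The registry below is a hypothetical sketch of that idea, not the Cogment SDK.

```python
# name -> implementation; the trial only ever looks actors up by name.
registry = {}

def register(name, impl):
    registry[name] = impl

def run_step(actor_name, observation):
    return registry[actor_name](observation)

# Phase 1: a safe heuristic policy for a (hypothetical) dispatcher actor.
register("dispatcher", lambda obs: "escalate" if obs["severity"] > 7 else "log")
phase1 = run_step("dispatcher", {"severity": 9})

def learned_policy(obs):
    # Stand-in for an ML-based policy trained on the heuristic's experience.
    return "dispatch_unit" if obs["severity"] > 5 else "log"

# Phase 2: once the learned policy is deemed safe enough, swap it in.
# Nothing else in the trial changes.
register("dispatcher", learned_policy)
phase2 = run_step("dispatcher", {"severity": 9})
```

The same indirection is what lets two candidate implementations be tested against one another: register each under a different actor name and run them in the same trial.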
Some industries have platforms that unify simulated and real-life environments, such as ROS for robotics, but in many others the tech stacks differ. Simulated environments may rely on video game engines like Unreal or Unity, or on industrial simulation tools such as Simulink. Real-life environments can involve devices like sensors or other IoT equipment, big-data stacks (like Apache Spark), or industrial digital twins (like Azure Digital Twins). Finally, humans need an interface to interact with these systems, usually a GUI or a voice interface. The tools used to build these human-facing clients range from full video game engines, to purpose-built web apps, to lightweight mobile apps.
The Cogment SDKs and underlying technologies accommodate these diverse paradigms without creating conceptual or technological divides between the research, prototyping, production, and deployment phases of such systems.
Use Cases
The full Cogment White Paper details a couple of use cases, including the test-bed project Quack Arena and the more complex Smart Dynamic Assistant agent in the context of 911 first-responder dispatching. We recommend reading about these use cases in the full white paper if you want to dive into the modelling, implementation, and results, or learn more about the Cogment framework's core concepts and architecture.