Environments
Installing Environments
We support several environments in RLHive, namely:

- Atari
- Gym classic control
- MinAtar (simplified Atari)
- MiniGrid (single-agent grid world)
- Marlgrid (multi-agent grid world)
- PettingZoo (multi-agent)
While gym comes installed with the base package, you need to install the other environments separately. See Installation for more details.
Creating an Environment
RLHive Environments
Every environment used in RLHive should be a subclass of hive.envs.base.BaseEnv. It should provide a reset function that resets the environment to a new episode and returns a tuple of (observation, turn), and a step function that takes in an action, performs the step in the environment, and returns a tuple of (observation, reward, done, turn, info). All these values correspond to their canonical meanings, and turn corresponds to the index of the agent whose turn it is (in multi-agent environments).
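A minimal sketch of this interface may help. The class below is a toy stand-in that follows the (observation, turn) / (observation, reward, done, turn, info) contract described above; a real environment would subclass hive.envs.base.BaseEnv instead of a plain class, and the environment logic here is invented for illustration.

```python
class TwoPlayerCounterEnv:
    """Toy turn-based environment following the RLHive interface sketch."""

    def __init__(self, num_players=2, max_steps=4):
        self._num_players = num_players
        self._max_steps = max_steps

    def reset(self):
        # Start a new episode and return (observation, turn).
        self._count = 0
        self._turn = 0
        return self._count, self._turn

    def step(self, action):
        # Apply the current agent's action, then advance the turn.
        self._count += action
        reward = 1.0 if action > 0 else 0.0
        done = self._count >= self._max_steps
        self._turn = (self._turn + 1) % self._num_players
        return self._count, reward, done, self._turn, {}
```

Each call to step returns the index of the next agent to act, which is how a sequential runner knows whose turn it is.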
The reward return value can be a single number, an array, or a dictionary. If it is a number, that same reward is given to every agent. If it is an array, each agent gets the reward corresponding to its index in the runner. If it is a dictionary, the keys should be the agent ids, and the values the rewards for those agents.
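The three reward formats can all be reduced to a per-agent mapping. The helper below is hypothetical (not part of RLHive) and just makes the conventions above concrete:

```python
def rewards_per_agent(reward, agent_ids):
    """Convert a scalar, sequence, or dict reward into {agent_id: reward}."""
    if isinstance(reward, dict):
        # Keys are already agent ids.
        return reward
    if isinstance(reward, (list, tuple)):
        # Position i corresponds to the agent at index i in the runner.
        return dict(zip(agent_ids, reward))
    # A single number is shared by every agent.
    return {agent_id: reward for agent_id in agent_ids}
```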
Each environment should also provide an EnvSpec object that provides information about the environment, such as the expected observation shape and action dimension for each agent. These should be lists with one element per agent. See GymEnv for an example.
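Conceptually, an env spec just bundles per-agent shape information. The dataclass below is a simplified stand-in for RLHive's EnvSpec (the real class may carry different fields); it only shows the "one list entry per agent" convention:

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class SimpleEnvSpec:
    """Illustrative stand-in for an env spec: per-agent spaces plus a name."""
    env_name: str
    observation_space: List[Any]  # one observation shape per agent
    action_space: List[Any]       # one action dimension per agent
```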
Gym environments
If your environment is a gym environment, and you do not need to preprocess the observations generated by the environment, then you can use GymEnv directly. Just make sure you register your environment with gym, and pass the name of the environment to the GymEnv constructor.
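What registration accomplishes can be sketched with a toy registry: a global mapping from a string id to an environment constructor, so that a wrapper like GymEnv can build the environment from its name alone. This is an illustration of the idea, not gym's actual implementation:

```python
# Toy registry: maps an environment id to its constructor.
_REGISTRY = {}


def register(env_id, entry_point):
    _REGISTRY[env_id] = entry_point


def make(env_id, **kwargs):
    # Look up the constructor by name and build the environment.
    return _REGISTRY[env_id](**kwargs)


class MyEnv:
    """Placeholder environment class for the illustration."""
    def __init__(self, size=4):
        self.size = size


register("MyEnv-v0", MyEnv)
```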
If you need to add extra preprocessing or change the default way that environment/EnvSpec creation is done, you can subclass this class and override create_env() and/or create_env_spec(), as in AtariEnv.
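The override pattern looks like the sketch below. It uses a stub in place of the real GymEnv (so the example is self-contained), and the dictionary "environment" is a placeholder for what would be a gym.make call; only the shape of the subclass is the point:

```python
class StubGymEnv:
    """Stand-in for GymEnv: builds its environment via create_env()."""

    def __init__(self, env_name):
        self.env_name = env_name
        self._env = self.create_env(env_name)

    def create_env(self, env_name):
        # Placeholder for creating the underlying gym environment.
        return {"name": env_name}


class PreprocessedEnv(StubGymEnv):
    def create_env(self, env_name):
        env = super().create_env(env_name)
        # Wrap the raw environment with (hypothetical) preprocessing here.
        env["preprocessed"] = True
        return env
```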
Parallel Step Environments
Multi-agent environments usually come in two flavors: sequential step environments, where each agent takes its action one at a time, and parallel step environments, where all agents step at the same time. The MultiAgentRunner class expects only sequential step environments. Fortunately, we can convert a parallel step environment into a sequential step one by generating the action for each agent one at a time and then passing all the actions to the parallel step environment at once. To facilitate this, we provide a utility class, ParallelEnv. Simply write the logic for your parallel step environment as normal, then create a sequential step version of the environment by subclassing both ParallelEnv and the parallel step environment, making sure to put ParallelEnv first in the superclass list.
from hive.envs.base import BaseEnv, ParallelEnv

class ParallelStepEnvironment(BaseEnv):
    # Write the logic needed for the parallel step environment. Assume the
    # step function gets an array of actions as its input, and should return
    # an array containing the observations for each agent, as well as the
    # other return values expected by the environment.
    pass

class SequentialStepEnvironment(ParallelEnv, ParallelStepEnvironment):
    # Any other logic needed to create the environment.
    pass
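The superclass order matters because Python's method resolution order (MRO) resolves step on ParallelEnv first, letting it intercept each agent's action before delegating to the parallel environment's step. The stub classes below (invented for this sketch, not the real RLHive classes) demonstrate the mechanism:

```python
class StubParallelEnv:
    """Wrapper: collects one action per agent, then steps all at once."""

    def step(self, action):
        self._pending = getattr(self, "_pending", [])
        self._pending.append(action)
        if len(self._pending) < self.num_agents:
            return None  # wait for the remaining agents' actions
        batch, self._pending = self._pending, []
        # super() follows the MRO to the parallel environment's step.
        return super().step(batch)


class StubParallelStepEnv:
    """Parallel environment: receives all agents' actions together."""
    num_agents = 2

    def step(self, actions):
        return [a * 2 for a in actions]


# ParallelEnv-style wrapper must come first so its step() is found first.
class StubSequentialEnv(StubParallelEnv, StubParallelStepEnv):
    pass
```

If the superclass order were reversed, the parallel environment's step would be called directly with a single agent's action, bypassing the wrapper entirely.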