Environments
Installing Environments
We support several environments in RLHive, namely:

- Atari
- Gym classic control
- MinAtar (simplified Atari)
- MiniGrid (single-agent grid world)
- Marlgrid (multi-agent grid world)
- PettingZoo (multi-agent)
While gym comes installed with the base package, you need to install the other environments separately. See Installation for more details.
Creating an Environment
RLHive Environments
Every environment used in RLHive should be a subclass of hive.envs.base.BaseEnv. It should provide a reset function that resets the environment to a new episode and returns a tuple of (observation, turn), and a step function that takes in an action, performs the step in the environment, and returns a tuple of (observation, reward, done, turn, info). All these values correspond to their canonical meanings, and turn corresponds to the index of the agent whose turn it is (in multi-agent environments).
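A minimal sketch of this interface may help. The class below is a toy stand-in that follows the (observation, turn) / (observation, reward, done, turn, info) contract described above; a real environment would subclass hive.envs.base.BaseEnv instead of a plain class, and the environment logic here is invented for illustration.

```python
class TwoPlayerCounterEnv:
    """Toy turn-based environment following the RLHive interface sketch."""

    def __init__(self, num_players=2, max_steps=4):
        self._num_players = num_players
        self._max_steps = max_steps

    def reset(self):
        # Start a new episode and return (observation, turn).
        self._count = 0
        self._turn = 0
        return self._count, self._turn

    def step(self, action):
        # Apply the current agent's action, then advance the turn.
        self._count += action
        reward = 1.0 if action > 0 else 0.0
        done = self._count >= self._max_steps
        self._turn = (self._turn + 1) % self._num_players
        return self._count, reward, done, self._turn, {}
```

Each call to step returns the index of the next agent to act, which is how a sequential runner knows whose turn it is.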
The reward return value can be a single number, an array, or a dictionary. If it is a number, that same reward is given to every agent. If it is an array, each agent gets the reward corresponding to its index in the runner. If it is a dictionary, the keys should be the agent ids, and the values the rewards for those agents.
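The three reward formats can all be reduced to a per-agent mapping. The helper below is hypothetical (not part of RLHive) and just makes the conventions above concrete:

```python
def rewards_per_agent(reward, agent_ids):
    """Convert a scalar, sequence, or dict reward into {agent_id: reward}."""
    if isinstance(reward, dict):
        # Keys are already agent ids.
        return reward
    if isinstance(reward, (list, tuple)):
        # Position i corresponds to the agent at index i in the runner.
        return dict(zip(agent_ids, reward))
    # A single number is shared by every agent.
    return {agent_id: reward for agent_id in agent_ids}
```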
Each environment should also provide an EnvSpec object that provides information about the environment, such as the expected observation shape and action dimension for each agent. These should be lists with one element per agent. See GymEnv for an example.
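Conceptually, an env spec just bundles per-agent shape information. The dataclass below is a simplified stand-in for RLHive's EnvSpec (the real class may carry different fields); it only shows the "one list entry per agent" convention:

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class SimpleEnvSpec:
    """Illustrative stand-in for an env spec: per-agent spaces plus a name."""
    env_name: str
    observation_space: List[Any]  # one observation shape per agent
    action_space: List[Any]       # one action dimension per agent
```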
Gym environments
If your environment is a gym environment, and you do not need to preprocess the observations generated by the environment, then you can use GymEnv directly. Just make sure you register your environment with gym, and pass the name of the environment to the GymEnv constructor.
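What registration accomplishes can be sketched with a toy registry: a global mapping from a string id to an environment constructor, so that a wrapper like GymEnv can build the environment from its name alone. This is an illustration of the idea, not gym's actual implementation:

```python
# Toy registry: maps an environment id to its constructor.
_REGISTRY = {}


def register(env_id, entry_point):
    _REGISTRY[env_id] = entry_point


def make(env_id, **kwargs):
    # Look up the constructor by name and build the environment.
    return _REGISTRY[env_id](**kwargs)


class MyEnv:
    """Placeholder environment class for the illustration."""
    def __init__(self, size=4):
        self.size = size


register("MyEnv-v0", MyEnv)
```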
If you need to add extra preprocessing or change the default way that environment/EnvSpec creation is done, you can subclass this class and override create_env() and/or create_env_spec(), as in AtariEnv.
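The override pattern looks like the sketch below. It uses a stub in place of the real GymEnv (so the example is self-contained), and the dictionary "environment" is a placeholder for what would be a gym.make call; only the shape of the subclass is the point:

```python
class StubGymEnv:
    """Stand-in for GymEnv: builds its environment via create_env()."""

    def __init__(self, env_name):
        self.env_name = env_name
        self._env = self.create_env(env_name)

    def create_env(self, env_name):
        # Placeholder for creating the underlying gym environment.
        return {"name": env_name}


class PreprocessedEnv(StubGymEnv):
    def create_env(self, env_name):
        env = super().create_env(env_name)
        # Wrap the raw environment with (hypothetical) preprocessing here.
        env["preprocessed"] = True
        return env
```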
Parallel Step Environments
Multi-agent environments usually come in two flavors: sequential step environments, where each agent takes its action one at a time, and parallel step environments, where all agents step at the same time. The MultiAgentRunner class expects only sequential step environments. Fortunately, we can convert a parallel step environment into a sequential step one by generating the action for each agent one at a time and then passing all the actions to the parallel step environment at once. To facilitate this, we provide a utility class, ParallelEnv. Simply write the logic for your parallel step environment as normal, then create a sequential step version of the environment by subclassing both ParallelEnv and the parallel step environment, making sure to put ParallelEnv first in the superclass list.
from hive.envs.base import BaseEnv, ParallelEnv

class ParallelStepEnvironment(BaseEnv):
    # Write the logic needed for the parallel step environment. Assume the
    # step function gets an array of actions as its input, and should return
    # an array containing the observations for each agent, as well as the
    # other return values expected by the environment.
    pass

class SequentialStepEnvironment(ParallelEnv, ParallelStepEnvironment):
    # Any other logic needed to create the environment.
    pass
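The superclass order matters because Python's method resolution order (MRO) resolves step on ParallelEnv first, letting it intercept each agent's action before delegating to the parallel environment's step. The stub classes below (invented for this sketch, not the real RLHive classes) demonstrate the mechanism:

```python
class StubParallelEnv:
    """Wrapper: collects one action per agent, then steps all at once."""

    def step(self, action):
        self._pending = getattr(self, "_pending", [])
        self._pending.append(action)
        if len(self._pending) < self.num_agents:
            return None  # wait for the remaining agents' actions
        batch, self._pending = self._pending, []
        # super() follows the MRO to the parallel environment's step.
        return super().step(batch)


class StubParallelStepEnv:
    """Parallel environment: receives all agents' actions together."""
    num_agents = 2

    def step(self, actions):
        return [a * 2 for a in actions]


# ParallelEnv-style wrapper must come first so its step() is found first.
class StubSequentialEnv(StubParallelEnv, StubParallelStepEnv):
    pass
```

If the superclass order were reversed, the parallel environment's step would be called directly with a single agent's action, bypassing the wrapper entirely.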