hive.runners.multi_agent_loop module

class hive.runners.multi_agent_loop.MultiAgentRunner(environment, agents, loggers, experiment_manager, train_steps, num_agents, eval_environment=None, test_frequency=-1, test_episodes=1, stack_size=1, self_play=False, max_steps_per_episode=1000000000.0, seed=None)[source]

Bases: Runner

Runner class used to implement a multiagent training loop.

Initializes the MultiAgentRunner object.

Parameters
  • environment (BaseEnv) – Environment used in the training loop.

  • agents (List[Agent]) – List of agents that will interact with the environment.

  • loggers (List[ScheduledLogger]) – List of loggers used to log metrics.

  • experiment_manager (Experiment) – Experiment object that saves the state of the training.

  • train_steps (int) – How many steps to train for. This is the number of times that agent.update is called. If this is -1, there is no limit for the number of training steps.

  • num_agents (int) – Number of agents running in this multiagent experiment.

  • eval_environment (BaseEnv) – Environment used to evaluate the agent. If None, the environment parameter (which is a function) is used to create a second environment.

  • test_frequency (int) – After how many training steps to run testing episodes. If this is -1, testing is not run.

  • test_episodes (int) – How many episodes to run testing for during each test phase.

  • stack_size (int) – The number of frames in an observation sent to an agent.

  • self_play (bool) – Whether this multiagent experiment is run in self-play mode. In this mode, only the first agent in the list of agents provided in the config is created. This agent performs actions for each player in the multiagent environment.

  • max_steps_per_episode (int) – The maximum number of steps to run an episode for.

  • seed (int) – Seed used to set the global seed for libraries used by Hive and seed the Seeder.
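As a rough illustration of the self_play behavior described above, the sketch below shows how a turn could map back to an agent. The helper select_agent is a hypothetical name for illustration, not part of RLHive's API.

```python
def select_agent(agents, turn, self_play):
    """Pick the agent that acts on this turn.

    In self-play mode only the first agent exists, and it acts for
    every player; otherwise each player has its own agent.
    """
    return agents[0] if self_play else agents[turn]


# With two independent agents, turn 1 maps to the second agent.
agents = ["agent_0", "agent_1"]
assert select_agent(agents, 1, self_play=False) == "agent_1"
# In self-play, every turn maps back to the shared first agent.
assert select_agent(agents, 1, self_play=True) == "agent_0"
```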

run_one_step(environment, observation, turn, episode_metrics, transition_info, agent_traj_states)[source]

Run one step of the training loop.

If it is the agent’s first turn during the episode, do not run an update step. Otherwise, run an update step based on the previous action and accumulated reward since then.

Parameters
  • environment (BaseEnv) – Environment in which the agent will take a step.

  • observation – Current observation that the agent should create an action for.

  • turn (int) – Index of the agent whose turn it is.

  • episode_metrics (Metrics) – Keeps track of metrics for current episode.

  • transition_info (TransitionInfo) – Used to keep track of the most recent transition for each agent.

  • agent_traj_states – List of trajectory state objects that will be passed to each agent when act and update are called. The agent returns new trajectory states to replace the state passed in.
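The first-turn gating described above can be sketched as follows. Both run_one_step_sketch and the plain-dict stand-in for TransitionInfo are illustrative assumptions, not RLHive internals.

```python
def run_one_step_sketch(turn, transition_info):
    """Decide whether this agent's turn should trigger an update.

    On an agent's first turn of the episode there is no previous
    action to learn from, so only ``act`` would run; on later turns
    the update uses the previous action and the reward accumulated
    since it.
    """
    first_turn = turn not in transition_info
    # Record that this agent has now acted at least once this episode.
    transition_info[turn] = {"acted": True}
    return not first_turn
```

For example, the first call for agent 0 returns False (no update), while every subsequent call for that agent returns True.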

run_end_step(episode_metrics, transition_info, agent_traj_states, terminated=True, truncated=False)[source]

Run the final step of an episode.

After an episode ends, iterate through the agents and update them with the final step of the episode.

Parameters
  • episode_metrics (Metrics) – Keeps track of metrics for current episode.

  • transition_info (TransitionInfo) – Used to keep track of the most recent transition for each agent.

  • agent_traj_states – List of trajectory state objects that will be passed to each agent when act and update are called. The agent returns new trajectory states to replace the state passed in.

  • terminated (bool) – Whether the episode ended in a terminal state.

  • truncated (bool) – Whether the episode was cut off before reaching a terminal state (e.g. by the step limit).
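The per-agent final update could be sketched like this; run_end_step_sketch and its dict-based bookkeeping are assumed stand-ins, not RLHive's implementation.

```python
def run_end_step_sketch(agents, transition_info, terminated, truncated):
    """After the episode ends, give every agent that acted during the
    episode one final update carrying the terminal/truncated flags."""
    updates = []
    for agent_id in agents:
        if agent_id in transition_info:
            updates.append(
                {"agent": agent_id,
                 "terminated": terminated,
                 "truncated": truncated}
            )
    return updates
```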

run_episode(environment)[source]

Run a single episode of the environment.

Parameters

environment (BaseEnv) – Environment in which the agent will take a step.
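An episode loop of this shape, with the max_steps_per_episode cap triggering truncation, might look like the sketch below; run_episode_sketch and the env_step callable are illustrative assumptions rather than the runner's actual code.

```python
def run_episode_sketch(env_step, max_steps_per_episode):
    """Step the environment until it terminates, or truncate the
    episode once the step cap is reached."""
    steps = 0
    terminated = truncated = False
    while not (terminated or truncated):
        terminated = env_step()  # True when the episode ends naturally
        steps += 1
        if steps >= max_steps_per_episode and not terminated:
            truncated = True
    return steps, terminated, truncated
```

With an environment that terminates on its third step and a generous cap, this returns (3, True, False); with one that never terminates and a cap of 5, it returns (5, False, True).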