hive.replays.legal_moves_replay module

class hive.replays.legal_moves_replay.LegalMovesBuffer(capacity, beta=0.5, stack_size=1, n_step=1, gamma=0.9, observation_shape=(), observation_dtype=<class 'numpy.uint8'>, action_shape=(), action_dtype=<class 'numpy.int8'>, reward_shape=(), reward_dtype=<class 'numpy.float32'>, extra_storage_types=None, action_dim=None, num_players_sharing_buffer=None)

Bases: PrioritizedReplayBuffer

A prioritized replay buffer for games with legal moves, such as Hanabi, in which next_action_mask must be added to each sampled batch.

Parameters
  • capacity (int) – Total number of observations that can be stored in the buffer. Note that this is not the same as the number of transitions that can be stored in the buffer.

  • beta (float) – Parameter controlling level of prioritization.

  • stack_size (int) – The number of frames to stack to create an observation.

  • n_step (int) – Horizon used to compute the n-step return.

  • gamma (float) – Discount factor used to compute the n-step return.

  • observation_shape (Tuple) – Shape of observations that will be stored in the buffer.

  • observation_dtype (type) – Type of observations that will be stored in the buffer. This can be either the type itself or a string representation of the type. The type can be either a native Python type or a NumPy type. For a NumPy type, a string of the form np.uint8 or numpy.uint8 is acceptable.

  • action_shape (Tuple) – Shape of actions that will be stored in the buffer.

  • action_dtype (type) – Type of actions that will be stored in the buffer. Format is described in the description of observation_dtype.

  • reward_shape (Tuple) – Shape of rewards that will be stored in the buffer.

  • reward_dtype (type) – Type of rewards that will be stored in the buffer. Format is described in the description of observation_dtype.

  • extra_storage_types (dict) – A dictionary describing extra items to store in the buffer. The mapping should be from the name of the item to a (type, shape) tuple.

  • num_players_sharing_buffer (int) – Number of agents that share their buffers. It is used for self-play.
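
To make the roles of n_step and gamma concrete, the following is a minimal sketch of a discounted n-step return in plain Python. It is not the buffer's own implementation: the function name n_step_return and the rewards and dones lists are hypothetical, and stopping the sum at the first terminal step is an assumption of this sketch.

```python
def n_step_return(rewards, dones, start, n_step=1, gamma=0.9):
    """Discounted sum of up to n_step rewards, beginning at index start.

    Hypothetical helper for illustration only. Accumulation stops after
    the first terminal step (an assumption, not confirmed library behavior).
    """
    total = 0.0
    discount = 1.0
    for k in range(n_step):
        idx = start + k
        if idx >= len(rewards):
            break  # ran past the stored rewards
        total += discount * rewards[idx]
        discount *= gamma
        if dones[idx]:
            break  # do not bootstrap across an episode boundary
    return total


# Made-up rewards and termination flags for four consecutive steps.
rewards = [1.0, 0.0, 2.0, 5.0]
dones = [False, False, True, False]

# 1.0 + 0.5 * 0.0 + 0.25 * 2.0
print(n_step_return(rewards, dones, start=0, n_step=3, gamma=0.5))
```

With n_step=1 the function returns only the immediate reward, which matches the default in the class signature.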

sample(batch_size)

Sample transitions from the buffer, adding next_action_mask to the batch for environments with legal moves.
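
One common use of a next_action_mask entry is to restrict the maximum over next-state action values to legal moves. The sketch below illustrates that idea in plain Python. It does not call the buffer: the helper masked_max_q, the example batch dictionary, and the Q-values are hypothetical, and the mask encoding (1 for legal, 0 for illegal) is an assumption.

```python
import math


def masked_max_q(next_q_values, next_action_mask):
    """Return, per batch element, the largest Q-value among legal actions.

    Hypothetical helper. Assumes a mask entry of 1 marks a legal action
    and 0 marks an illegal one.
    """
    result = []
    for q_row, mask_row in zip(next_q_values, next_action_mask):
        legal = [q for q, m in zip(q_row, mask_row) if m]
        # If no action is legal, fall back to -inf so the caller can detect it.
        result.append(max(legal) if legal else -math.inf)
    return result


# Stand-in for the mask portion of a sampled batch (two transitions, three actions).
batch = {"next_action_mask": [[1, 0, 1], [0, 1, 1]]}

# Made-up next-state Q-values, as a network might produce them.
next_q_values = [[0.2, 1.5, -0.3], [2.0, 0.1, 0.7]]

print(masked_max_q(next_q_values, batch["next_action_mask"]))
```

In the first row the highest Q-value (1.5) belongs to an illegal action, so the masked maximum falls back to the best legal one.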