hive.replays.legal_moves_replay module

class hive.replays.legal_moves_replay.LegalMovesBuffer(capacity, beta=0.5, stack_size=1, n_step=1, gamma=0.9, observation_shape=(), observation_dtype=<class 'numpy.uint8'>, action_shape=(), action_dtype=<class 'numpy.int8'>, reward_shape=(), reward_dtype=<class 'numpy.float32'>, extra_storage_types=None, action_dim=None, num_players_sharing_buffer=None)

Bases: PrioritizedReplayBuffer

A prioritized replay buffer for games with legal-move constraints, such as Hanabi. It adds next_action_mask to each sampled batch.

Parameters

capacity (int) – Total number of observations that can be stored in the buffer. Note that this is not the same as the number of transitions that can be stored in the buffer.
beta (float) – Parameter controlling level of prioritization.
stack_size (int) – The number of frames to stack to create an observation.
n_step (int) – Horizon used to compute the n-step return.
gamma (float) – Discount factor used to compute the n-step return.
observation_shape (Tuple) – Shape of observations that will be stored in the buffer.
observation_dtype (type) – Type of observations that will be stored in the buffer. This can be either the type itself or a string representation of the type. The type can be either a native Python type or a numpy type. If a numpy type, a string of the form np.uint8 or numpy.uint8 is acceptable.
action_shape (Tuple) – Shape of actions that will be stored in the buffer.
action_dtype (type) – Type of actions that will be stored in the buffer. Format is described in the description of observation_dtype.
reward_shape (Tuple) – Shape of rewards that will be stored in the buffer.
reward_dtype (type) – Type of rewards that will be stored in the buffer. Format is described in the description of observation_dtype.
extra_storage_types (dict) – A dictionary describing extra items to store in the buffer. The mapping should be from the name of the item to a (type, shape) tuple.
num_players_sharing_buffer (int) – Number of agents that share their buffers. It is used for self-play.
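
To illustrate how n_step and gamma interact, here is a minimal NumPy sketch of an n-step discounted return. It does not use the library, the helper name n_step_return is hypothetical, and the buffer's own computation may differ in detail (for example, in how episode boundaries are handled).

```python
import numpy as np


def n_step_return(rewards, gamma=0.9):
    """Discounted sum of a short reward sequence (hypothetical helper).

    rewards: the n rewards observed over the n-step horizon, oldest first.
    gamma:   discount factor applied once per step.
    """
    rewards = np.asarray(rewards, dtype=np.float32)
    # Discount weights 1, gamma, gamma^2, ... for each step in the horizon.
    discounts = gamma ** np.arange(len(rewards), dtype=np.float32)
    return float(np.sum(rewards * discounts))


# Three-step horizon with gamma = 0.5: 1 + 0.5 * 2 + 0.25 * 4
result = n_step_return([1.0, 2.0, 4.0], gamma=0.5)
```

With n_step=1 the sum reduces to the single immediate reward, which matches the default setting in the signature above.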

sample(batch_size): Sample transitions from the buffer, adding next_action_mask to the batch for environments with legal moves.
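
As a rough illustration of why a next_action_mask is useful in a sampled batch, the sketch below restricts a maximum over next-state Q-values to legal actions. It is NumPy-only and does not call the library. The helper masked_max_q, the Q-value array, and the convention that 1 marks a legal action and 0 an illegal one are assumptions made for this example.

```python
import numpy as np


def masked_max_q(next_q_values, next_action_mask):
    """Max over legal actions only (hypothetical helper).

    next_q_values:    float array of shape (batch, num_actions).
    next_action_mask: array of the same shape; nonzero means legal (assumed).
    """
    mask = np.asarray(next_action_mask, dtype=bool)
    # Illegal actions are set to -inf so they can never win the max.
    masked = np.where(mask, next_q_values, -np.inf)
    return masked.max(axis=1)


# Hypothetical Q-values for two transitions with three actions each.
next_q = np.array([[1.0, 5.0, 2.0], [3.0, 0.5, 4.0]], dtype=np.float32)
# Assumed mask layout: action 1 is illegal in row 0, action 2 in row 1.
next_action_mask = np.array([[1, 0, 1], [1, 1, 0]], dtype=np.int8)

best_legal_q = masked_max_q(next_q, next_action_mask)
```

Without the mask, the first row would pick the illegal action with value 5.0; with it, the best legal value 2.0 is used instead.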