Replays
RLHive currently provides four types of replay buffers:
CircularReplayBuffer
: An implementation of a FIFO circular replay buffer. Stores individual observations and constructs transitions on the fly when sampling to save space.
SimpleReplayBuffer
: A simplified version of a FIFO circular replay buffer that stores individual transitions directly.
PrioritizedReplayBuffer
: A subclass of CircularReplayBuffer that adds prioritized sampling.
LegalMovesReplayBuffer
: A subclass of PrioritizedReplayBuffer that stores/handles legal moves.
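To make the "store individual observations, construct transitions on the fly" idea concrete, here is a minimal sketch in plain Python. It is not RLHive's implementation: the class name ToyCircularBuffer, its methods, and the seed parameter are hypothetical and exist only for illustration.

```python
import random


class ToyCircularBuffer:
    """Illustrative FIFO circular buffer; not RLHive's implementation."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.steps = [None] * capacity  # one slot per stored time step
        self.cursor = 0  # slot that the next add() will write
        self.size = 0  # number of filled slots
        self.rng = random.Random(seed)

    def add(self, observation, action, reward, done):
        # Store a single step; the next observation is never duplicated.
        # Once the buffer is full, this overwrites the oldest step (FIFO).
        self.steps[self.cursor] = (observation, action, reward, done)
        self.cursor = (self.cursor + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def transition(self, index):
        # Build the transition on the fly from two neighbouring slots.
        # If done is True, the next observation belongs to another episode;
        # consumers are expected to mask it using the done flag.
        observation, action, reward, done = self.steps[index]
        next_observation = self.steps[(index + 1) % self.capacity][0]
        return {
            "observation": observation,
            "action": action,
            "reward": reward,
            "done": done,
            "next_observation": next_observation,
        }

    def sample(self, batch_size):
        # The newest step has no successor yet, so it cannot be sampled.
        newest = (self.cursor - 1) % self.capacity
        valid = [i for i in range(self.size) if i != newest]
        if not valid:
            raise ValueError("need at least two stored steps to sample")
        picked = self.rng.choices(valid, k=batch_size)
        return [self.transition(i) for i in picked]
```

Storing one observation per step roughly halves observation memory compared with storing (observation, next_observation) pairs, at the cost of some index bookkeeping when sampling.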
The main replay buffer classes that you will likely use/extend are CircularReplayBuffer and PrioritizedReplayBuffer.
By default, these classes expect the arguments "observation", "action", "reward", and "done" when adding to the buffer. You can also provide alternative shapes/dtypes for these keys, and the buffer will try to automatically cast the objects you add to the buffer.
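The casting behaviour described above can be pictured with a small dependency-free sketch. Everything here is an assumption made for illustration: DEFAULT_SPECS, cast_entry, and cast_step are hypothetical names, the default types and shapes are invented, and plain Python types stand in for whatever array dtypes the real buffer uses.

```python
# Hypothetical per-key specs: key -> (type, shape). Shape () means a scalar,
# shape (n,) means a flat sequence of n items.
DEFAULT_SPECS = {
    "observation": (float, (2,)),
    "action": (int, ()),
    "reward": (float, ()),
    "done": (bool, ()),
}


def cast_entry(value, dtype, shape):
    """Cast a scalar or flat sequence to the declared type and shape."""
    if shape == ():
        return dtype(value)
    items = [dtype(v) for v in value]
    if len(items) != shape[0]:
        raise ValueError(f"expected {shape[0]} items, got {len(items)}")
    return items


def cast_step(step, specs=DEFAULT_SPECS):
    """Cast every field of one step according to its spec."""
    return {key: cast_entry(step[key], *specs[key]) for key in specs}


# Values arrive with "wrong" types and are cast on the way in.
step = cast_step({"observation": [1, 2], "action": 3.0, "reward": 1, "done": 0})

# Overriding one key's spec, analogous to passing an alternative shape/dtype.
custom_specs = {**DEFAULT_SPECS, "observation": (int, (3,))}
```

Casting at insertion time keeps the stored data uniform, so sampled batches have a predictable type regardless of what the environment or agent produced.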
Along with these default keys, you can also store extra keys in the buffer. When creating the buffer, provide a dictionary with key-value pairs key: (type, shape). When adding, you can directly provide this key as an argument to the add() method, and it will automatically be added to the batch dictionary that you sample.
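The flow for extra keys (declare them at construction, pass them to add(), get them back in the sampled batch) might look like the following sketch. It is a toy stand-in, not RLHive's class: ToyBuffer, its extra_specs parameter, and the "priority" key are hypothetical, and only scalar shapes are handled.

```python
import random


class ToyBuffer:
    """Illustrative buffer with extra keys; not RLHive's implementation."""

    DEFAULT_KEYS = ("observation", "action", "reward", "done")

    def __init__(self, extra_specs=None):
        # extra_specs maps key -> (type, shape); this toy ignores the shape
        # and therefore only supports scalar extras.
        self.extra_specs = dict(extra_specs or {})
        self.rows = []

    def add(self, observation, action, reward, done, **extras):
        # Every declared extra key must be supplied, and no undeclared ones.
        if set(extras) != set(self.extra_specs):
            raise KeyError(f"expected extra keys {sorted(self.extra_specs)}")
        row = {
            "observation": observation,
            "action": action,
            "reward": reward,
            "done": done,
        }
        for key, (dtype, _shape) in self.extra_specs.items():
            row[key] = dtype(extras[key])  # cast to the declared type
        self.rows.append(row)

    def sample(self, batch_size):
        picked = [random.choice(self.rows) for _ in range(batch_size)]
        keys = self.DEFAULT_KEYS + tuple(self.extra_specs)
        # Batch dictionary: one list per key, extra keys included.
        return {key: [row[key] for row in picked] for key in keys}


buffer = ToyBuffer(extra_specs={"priority": (float, ())})
buffer.add(observation=0, action=1, reward=0.5, done=False, priority=2)
batch = buffer.sample(batch_size=4)
```

Here batch contains a "priority" entry alongside the four default keys, with each value cast to float as declared.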