ertk.utils.batch_arrays_by_length
- ertk.utils.batch_arrays_by_length(arrays_x: ndarray | List[ndarray], y: ndarray, batch_size: int = 32, shuffle: bool = True, uniform_batch_size: bool = False) → Tuple[ndarray, ndarray]
Batches a list of arrays of different sizes, grouping them by their length along axis 0. This is designed for use with variable-length sequences. Each batch will have at most batch_size arrays, but may have fewer if there are not enough arrays of the same length. It is recommended to use the pad_arrays() method of the Dataset instance before using this function, in order to quantise the lengths; fewer distinct lengths means fewer, fuller batches.
- Parameters:
- arrays_x: ndarray or list of ndarray
A sequence of N-D arrays, possibly of different lengths, to batch. It is assumed that all arrays have the same rank and differ in length only along axis 0.
- y: ndarray
The labels for each of the arrays in arrays_x.
- batch_size: int, default = 32
Arrays of the same length are grouped together, up to a maximum of batch_size per batch, after which a new batch is created. Thus each batch has between 1 and batch_size items.
- shuffle: bool, default = True
Whether to shuffle array order in a batch.
- uniform_batch_size: bool, default = False
If True, every batch has exactly batch_size items, padded with zeros where there are not enough sequences of the same length. If False, a batch may contain fewer than batch_size items.
- Returns:
- x_list: ndarray
The batched arrays. x_list[i] is the i-th batch, containing between 1 and batch_size arrays that all have the same length along axis 0.
- y_list: ndarray
The batched labels corresponding to sequences in x_list. y_list[i] has the same length as x_list[i].
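The grouping described above can be sketched in plain NumPy. This is a hypothetical re-implementation (`batch_by_length_sketch`), not ertk's source: the strategy of shuffling once before grouping, the ordering of batches by increasing length, and the extra `rng` parameter are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

import numpy as np


def batch_by_length_sketch(
    arrays_x: List[np.ndarray],
    y: np.ndarray,
    batch_size: int = 32,
    shuffle: bool = True,
    uniform_batch_size: bool = False,
    rng: Optional[np.random.Generator] = None,  # hypothetical, for reproducibility
) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical sketch: group arrays by axis-0 length, then chunk each group."""
    order = np.arange(len(arrays_x))
    if shuffle:
        # Shuffle once up front, so which arrays share a batch is random.
        rng = np.random.default_rng() if rng is None else rng
        order = rng.permutation(order)

    # Map each distinct axis-0 length to the indices of arrays with that length.
    groups: Dict[int, List[int]] = {}
    for i in order:
        groups.setdefault(arrays_x[i].shape[0], []).append(int(i))

    x_batches, y_batches = [], []
    for length in sorted(groups):  # assumption: shortest lengths first
        idx = groups[length]
        for start in range(0, len(idx), batch_size):
            chunk = idx[start : start + batch_size]
            size = batch_size if uniform_batch_size else len(chunk)
            first = arrays_x[chunk[0]]
            # Zero-initialised, so rows beyond len(chunk) remain zero padding.
            xb = np.zeros((size,) + first.shape, dtype=first.dtype)
            yb = np.zeros(size, dtype=y.dtype)
            for row, j in enumerate(chunk):
                xb[row] = arrays_x[j]
                yb[row] = y[j]
            x_batches.append(xb)
            y_batches.append(yb)

    # Batches generally differ in shape, so store them in 1-D object arrays.
    x_out = np.empty(len(x_batches), dtype=object)
    y_out = np.empty(len(y_batches), dtype=object)
    for k, (xb, yb) in enumerate(zip(x_batches, y_batches)):
        x_out[k] = xb
        y_out[k] = yb
    return x_out, y_out


if __name__ == "__main__":
    xs = [np.ones((3, 2)), np.ones((5, 2)), np.ones((3, 2)), np.ones((3, 2))]
    ys = np.array([0, 1, 2, 3])
    xb, yb = batch_by_length_sketch(xs, ys, batch_size=2, shuffle=False)
    print([b.shape for b in xb])  # [(2, 3, 2), (1, 3, 2), (1, 5, 2)]
    print([b.tolist() for b in yb])  # [[0, 2], [3], [1]]
```

Object arrays are used here as one way to return an ndarray when the batches have different shapes and cannot be stacked into a single regular array; with uniform_batch_size=True, the short batch of length-3 arrays would instead be padded to two items with a zero row.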
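The recommendation to quantise lengths before batching can be illustrated with a small sketch. Because an array is only batched with others of exactly the same length, many distinct lengths lead to many small batches; rounding each length up to a multiple of a fixed step collapses them into a few groups. The helper `pad_to_multiple` and the step of 32 are hypothetical, and this is not necessarily how `Dataset.pad_arrays()` pads its arrays.

```python
import numpy as np


def pad_to_multiple(x: np.ndarray, step: int) -> np.ndarray:
    """Hypothetical helper: zero-pad axis 0 of x up to the next multiple of step."""
    target = -(-x.shape[0] // step) * step  # ceiling division without floats
    # Pad only the end of axis 0; leave all other axes unchanged.
    pad_width = [(0, target - x.shape[0])] + [(0, 0)] * (x.ndim - 1)
    return np.pad(x, pad_width, mode="constant", constant_values=0)


if __name__ == "__main__":
    seqs = [np.ones((n, 40)) for n in (97, 103, 120, 130)]
    padded = [pad_to_multiple(s, 32) for s in seqs]
    print(sorted({s.shape[0] for s in seqs}))  # [97, 103, 120, 130]
    print(sorted({p.shape[0] for p in padded}))  # [128, 160]
```

Here four distinct lengths become two, so a subsequent batching pass would produce fewer, fuller batches, at the cost of some zero padding at the end of each sequence.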