ertk.utils.batch_arrays_by_length

ertk.utils.batch_arrays_by_length(arrays_x: ndarray | List[ndarray], y: ndarray, batch_size: int = 32, shuffle: bool = True, uniform_batch_size: bool = False) → Tuple[ndarray, ndarray]

Batches a list of arrays of different sizes, grouping them by size. This is designed for use with variable-length sequences. Each batch holds at most batch_size arrays, but may hold fewer if there are not enough arrays of the same length. It is recommended to call the pad_arrays() method of the Dataset instance before using this function, in order to quantise the lengths.
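The grouping described above can be sketched in plain Python. This is a minimal illustration, not ertk's implementation: the helper name group_indices_by_length is hypothetical, it works on sequence lengths only, and it returns index batches rather than stacked arrays. The ordering of batches (shortest length first) is also an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_indices_by_length(
    lengths: Sequence[int], batch_size: int = 32
) -> List[List[int]]:
    """Group item indices by length, then split each group into batches.

    Hypothetical helper for illustration; not part of ertk.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    # Collect the indices of all items that share the same length.
    by_length: Dict[int, List[int]] = defaultdict(list)
    for index, length in enumerate(lengths):
        by_length[length].append(index)

    # Split each same-length group into chunks of at most batch_size.
    batches: List[List[int]] = []
    for length in sorted(by_length):
        indices = by_length[length]
        for start in range(0, len(indices), batch_size):
            batches.append(indices[start:start + batch_size])
    return batches


# Five sequences with lengths 3, 5, 3, 3 and 5, batched two at a time.
example_batches = group_indices_by_length([3, 5, 3, 3, 5], batch_size=2)
```

Here the three length-3 sequences are split into one full batch and one batch of a single item, which is why quantising the lengths beforehand tends to give fuller batches.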

Parameters:
arrays_x: ndarray or list of ndarray

A list of N-D arrays, possibly of different lengths, to batch. The assumption is that all the arrays have the same rank and only axis 0 differs in length.

y: ndarray

The labels for each of the arrays in arrays_x.

batch_size: int, default = 32

Arrays will be grouped together by size, up to a maximum of batch_size, after which a new batch will be created. Thus each batch produced will have between 1 and batch_size items.

shuffle: bool, default = True

Whether to shuffle the order of the arrays within a batch.

uniform_batch_size: bool, default = False

If True, every batch has exactly batch_size items and is padded with zeros where necessary. If False, batches may be smaller than batch_size when there aren’t enough sequences of the same length to fill them.

Returns:
x_list: ndarray

The batched arrays. x_list[i] is the i’th batch, having between 1 and batch_size items, all of the same length along axis 0.

y_list: ndarray

The batched labels corresponding to sequences in x_list. y_list[i] has the same length as x_list[i].
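As a rough illustration of the uniform_batch_size option, the sketch below pads an under-full batch with all-zero sequences until it holds batch_size items. The helper pad_batch_with_zeros is hypothetical and uses plain lists rather than ndarrays. How the labels for padded entries are filled is not stated here, so the sketch leaves labels out.

```python
from typing import List


def pad_batch_with_zeros(
    batch: List[List[float]], batch_size: int
) -> List[List[float]]:
    """Append all-zero sequences until the batch holds batch_size items.

    Hypothetical helper for illustration; not part of ertk.
    """
    if not batch:
        raise ValueError("batch must contain at least one sequence")
    if len(batch) > batch_size:
        raise ValueError("batch already exceeds batch_size")

    # Every sequence in a batch has the same length, so reuse the first one's.
    length = len(batch[0])
    padding = [[0.0] * length for _ in range(batch_size - len(batch))]
    return batch + padding


# A batch of two length-3 sequences padded up to a batch size of 4.
padded = pad_batch_with_zeros([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], batch_size=4)
```

Fixed-size batches are convenient when downstream code expects a constant leading dimension, at the cost of processing some all-zero items.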