ertk.utils.batch_arrays_by_length

ertk.utils.batch_arrays_by_length(arrays_x: ndarray | List[ndarray], y: ndarray, batch_size: int = 32, shuffle: bool = True, uniform_batch_size: bool = False) → Tuple[ndarray, ndarray]

Batches a list of arrays of different sizes, grouping them by size. This is designed for use with variable-length sequences. Each batch will have a maximum of batch_size arrays, but may have fewer if there are not enough arrays of the same length. It is recommended to use the pad_arrays() method of the Dataset instance before using this function, in order to quantise the lengths.

Parameters:

arrays_x: ndarray or list of ndarray: A list of N-D arrays, possibly of different lengths, to batch. The assumption is that all the arrays have the same rank and differ in length only along axis 0.
y: ndarray: The labels for each of the arrays in arrays_x.
batch_size: int, default = 32: Arrays will be grouped together by size, up to a maximum of batch_size, after which a new batch will be created. Thus each batch produced will have between 1 and batch_size items.
shuffle: bool, default = True: Whether to shuffle array order in a batch.
uniform_batch_size: bool, default = False: If True, keep all batches the same size, batch_size, padding with zeros if necessary. If False, batches may have different sizes when there are not enough sequences of the same length to group together.

Returns:

x_list: ndarray: The batched arrays. x_list[i] is the i’th batch, having between 1 and batch_size items, all of the same length along axis 0.
y_list: ndarray: The batched labels corresponding to sequences in x_list. y_list[i] has the same length as x_list[i].
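
The grouping behaviour described above can be illustrated with a minimal NumPy-only sketch. This is not the ertk implementation: batch_by_length_sketch is a hypothetical helper written for illustration, it returns plain Python lists rather than ndarrays, and it leaves out the shuffle and uniform_batch_size options. It only shows how arrays sharing an axis-0 length end up in batches of at most batch_size items, with labels kept aligned.

```python
import numpy as np


def batch_by_length_sketch(arrays_x, y, batch_size=32):
    """Hypothetical helper: group arrays by axis-0 length, then split into batches."""
    # Map each distinct length to the indices of the arrays having that length.
    groups = {}
    for idx, arr in enumerate(arrays_x):
        groups.setdefault(len(arr), []).append(idx)

    x_batches, y_batches = [], []
    for length in sorted(groups):
        indices = groups[length]
        # Split each same-length group into chunks of at most batch_size items.
        for start in range(0, len(indices), batch_size):
            chunk = indices[start:start + batch_size]
            x_batches.append(np.stack([arrays_x[i] for i in chunk]))
            y_batches.append(y[chunk])
    return x_batches, y_batches


# Six sequences with 4 features each; only axis 0 differs in length.
rng = np.random.default_rng(0)
lengths = [5, 5, 5, 8, 8, 3]
arrays_x = [rng.normal(size=(n, 4)) for n in lengths]
y = np.arange(len(lengths))

x_batches, y_batches = batch_by_length_sketch(arrays_x, y, batch_size=2)
batch_shapes = [b.shape for b in x_batches]
print(batch_shapes)
```

With batch_size=2, the three sequences of length 5 are split into one full batch and one batch holding a single item, which matches the "between 1 and batch_size items" behaviour described for the real function.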