Dataset classes and functions

This module contains classes and functions for working with datasets.

Features

read_features(path, **kwargs)

Read features from given path into memory.

read_features_iterable(path, **kwargs)

Read features from given path either sequentially or by index.

register_format(suffix, read, write)

Register new read/write functions for a given file format.

write_features(path, features, names[, ...])

Convenience function to write features to given path.

Annotations

read_annotations(filename[, dtype])

Return a pd.Series containing the values of this annotation for each instance, indexed by name.
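
The following sketch shows one way a two-column CSV could be turned into a name-indexed pd.Series. The file layout (a header row, instance names in the first column, values in the second) and the function name read_annotations_sketch are assumptions for illustration, not the module's documented behaviour.

```python
import io

import pandas as pd


def read_annotations_sketch(source, dtype=None) -> pd.Series:
    """Read a two-column CSV into a Series indexed by instance name."""
    # Assumed layout: header row, names in column 0, values in column 1.
    df = pd.read_csv(source, header=0, index_col=0)
    series = df.iloc[:, 0]
    if dtype is not None:
        series = series.astype(dtype)
    return series


# Example with an in-memory CSV; the clip names and labels are made up.
csv_text = "name,label\nclip_02,happy\nclip_01,sad\n"
labels = read_annotations_sketch(io.StringIO(csv_text), dtype="category")
```

The resulting Series takes its name from the value column header, so several annotations can later be combined into one table keyed by instance name.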

write_annotations(annotations[, name, path])

Write sorted annotations CSV.
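
A minimal sketch of writing a sorted annotations CSV with the standard library is given below. It assumes that "sorted" means sorted by instance name and that the file has a header row. write_annotations_sketch is a hypothetical helper, not the module's function.

```python
import csv
import io


def write_annotations_sketch(annotations: dict, name: str, fileobj) -> None:
    """Write name/value rows sorted by instance name (assumed ordering)."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["name", name])
    for clip in sorted(annotations):
        writer.writerow([clip, annotations[clip]])


# Example: write to an in-memory buffer instead of a file on disk.
buf = io.StringIO()
write_annotations_sketch({"clip_02": "happy", "clip_01": "sad"}, "label", buf)
```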

Dataset loading and manipulation

CombinedDataset(*datasets)

A dataset that joins one or more individual datasets together and merges annotations.

CorpusInfo(name, description, annotations, ...)

Defines information about a dataset.

DataLoadConfig(datasets, ...)

Defines a configuration for loading one or more datasets.

DataSelector(subset, groups, ...)

Defines a selection of data to keep.

Dataset(corpus_info[, features, subset, label])

Class representing a generic dataset, consisting of a set of features and optional partitions and annotations.

DatasetConfig(path, features, read_kwargs, ...)

Defines a dataset configuration.

MapGroups([map])

Defines a mapping between categorical groups.
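
To make the idea of a group mapping concrete, the sketch below relabels categorical values through a dictionary. Leaving unmapped values unchanged is an assumption, and map_groups_sketch is a hypothetical helper rather than the MapGroups class itself.

```python
from typing import Dict, List


def map_groups_sketch(labels: List[str], mapping: Dict[str, str]) -> List[str]:
    """Relabel each group through `mapping`; unmapped groups are kept as-is."""
    return [mapping.get(label, label) for label in labels]


# Example: collapse fine-grained emotion labels into coarser groups.
coarse = map_groups_sketch(
    ["anger", "joy", "sadness", "neutral"],
    {"anger": "negative", "sadness": "negative", "joy": "positive"},
)
```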

RemoveGroups(drop, keep)

Defines a set of groups to keep or remove from a dataset.
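
The keep/drop idea can be sketched as a simple filter over per-instance group labels. The semantics here are assumed for illustration: an empty keep means "keep everything", and drop is applied afterwards. filter_groups_sketch is a hypothetical name.

```python
from typing import Dict, Iterable


def filter_groups_sketch(
    annotations: Dict[str, str],
    drop: Iterable[str] = (),
    keep: Iterable[str] = (),
) -> Dict[str, str]:
    """Return only the instances whose group passes the keep/drop rules."""
    keep_set, drop_set = set(keep), set(drop)
    return {
        name: group
        for name, group in annotations.items()
        if (not keep_set or group in keep_set) and group not in drop_set
    }


# Example annotations mapping instance name to group label (made up).
example = {"c1": "anger", "c2": "joy", "c3": "neutral"}
without_neutral = filter_groups_sketch(example, drop=["neutral"])
only_joy = filter_groups_sketch(example, keep=["joy"])
```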

SubsetInfo([clips, description])

Defines a subset of a dataset.

load_datasets_config(config)

Load one or more datasets from a DataLoadConfig.

load_multiple(corpus_files[, features, ...])

Load one or more datasets with the given features.

Utilities

get_audio_paths(path[, absolute])

Given a path to a directory or to a list of audio files, return a sequence of absolute paths to those files.
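
The sketch below illustrates one possible reading of this behaviour using pathlib. It assumes that a "list of audio files" is a text file with one path per line (resolved relative to that file's directory), that a fixed set of suffixes identifies audio files, and that results are sorted. None of this is confirmed by the description above, and get_audio_paths_sketch is a hypothetical name.

```python
from pathlib import Path
from typing import List

# Assumed set of audio suffixes; the real function may accept others.
AUDIO_SUFFIXES = {".wav", ".flac", ".mp3"}


def get_audio_paths_sketch(path, absolute: bool = True) -> List[Path]:
    """Collect audio paths from a directory or from a text file list."""
    path = Path(path)
    if path.is_dir():
        # Directory: take the audio files directly inside it.
        found = [p for p in path.iterdir() if p.suffix.lower() in AUDIO_SUFFIXES]
    else:
        # File list: one path per non-empty line, relative to the list's folder.
        base = path.parent
        found = [
            base / line.strip()
            for line in path.read_text().splitlines()
            if line.strip()
        ]
    if absolute:
        found = [p.resolve() for p in found]
    return sorted(found)
```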

resample_audio(paths, dir[, sr, n_jobs])

Resample the given audio clips to 16 kHz 16-bit WAV and place them in the directory given by dir.

resample_rename_clips(mapping[, sr, n_jobs])

Resample given audio clips to 16 kHz 16-bit WAV.

write_filelist(paths, path)

Write sorted file list.
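
As a small illustration, a sorted file list could be written as one path per line. The output format is an assumption, and write_filelist_sketch is a hypothetical helper rather than the module's function.

```python
from pathlib import Path
from typing import Iterable


def write_filelist_sketch(paths: Iterable, out_path) -> None:
    """Write the given paths, sorted as strings, one per line."""
    lines = sorted(str(p) for p in paths)
    Path(out_path).write_text("".join(f"{line}\n" for line in lines))
```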