ertk.dataset.CombinedDataset

class ertk.dataset.CombinedDataset(*datasets: Dataset)

Bases: Dataset

A dataset that joins one or more individual datasets together and merges annotations.

__init__(*datasets: Dataset)
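As a rough illustration of what joining datasets and merging annotations can mean, the sketch below uses a hypothetical `ToyDataset` stand-in rather than the real ertk `Dataset` class. Prefixing instance names with the corpus name and filling missing annotations with `None` are assumptions made for the example, not confirmed behaviour of `CombinedDataset`.

```python
from typing import Dict, List, Optional


class ToyDataset:
    """Hypothetical stand-in for a dataset (NOT the ertk Dataset class)."""

    def __init__(self, corpus: str, names: List[str],
                 annotations: Dict[str, Dict[str, str]]):
        self.corpus = corpus
        self.names = names
        # annotation name -> {instance name -> value}
        self.annotations = annotations


def combine(*datasets: ToyDataset):
    """Join toy datasets and merge their annotations (illustrative only)."""
    corpus_names = [d.corpus for d in datasets]
    # Union of annotation names across all datasets, in first-seen order.
    annot_names: List[str] = []
    for d in datasets:
        for annot in d.annotations:
            if annot not in annot_names:
                annot_names.append(annot)

    names: List[str] = []
    corpus_indices: List[int] = []
    merged: Dict[str, List[Optional[str]]] = {a: [] for a in annot_names}
    for idx, d in enumerate(datasets):
        for name in d.names:
            # Assumption: prefix with the corpus name to keep names unique.
            names.append(f"{d.corpus}_{name}")
            corpus_indices.append(idx)
            for annot in annot_names:
                # Assumption: an annotation absent from a corpus becomes None.
                merged[annot].append(d.annotations.get(annot, {}).get(name))
    return names, corpus_names, corpus_indices, merged


a = ToyDataset("corpusA", ["u1", "u2"],
               {"label": {"u1": "happy", "u2": "sad"}})
b = ToyDataset("corpusB", ["u1"],
               {"label": {"u1": "angry"}, "speaker": {"u1": "s01"}})
names, corpus_names, corpus_indices, merged = combine(a, b)
print(names)
print(corpus_indices)
print(merged)
```

With the real class, the equivalent call would follow the documented signature, `CombinedDataset(*datasets)`, passing already-loaded `Dataset` objects.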

Methods

__init__(*datasets)

Inherited Methods

annotation_type(annot_name)

clip_arrays(length)

Clips each array to the specified maximum length.

clone()

copy()

frame_arrays(frame_size, frame_shift[, ...])

Create a sequence of frames from the raw signal.

get_annotations(annot_name)

Get a list of annotations, one for each instance currently in the dataset.

get_audio_paths()

get_group_counts(annot_name)

Get group counts for a partition.

get_group_indices(annot_name)

Gets the group indices (i.e.

get_group_names(annot_name)

Get the names of groups in a partition.

get_idx_for_names(names)

Gets indices of instances corresponding to names.

get_idx_for_split(split[, return_complement])

Gets indices of instances corresponding to the selection given by split.

get_ratings(name[, rating_set])

Get per-annotator ratings of a specified column, for this dataset.

init_corpus_info(path)

Initialise corpus metadata from YAML.

map_and_select(map, select, remove)

Convenience function for mapping one or more partitions and then selecting one or more groups.

map_classes(mapping)

Modifies classes based on the mapping in map.

map_groups(part_name, mapping)

Map group names in a partition.

normalise(partition[, normaliser])

Transforms the data matrix of this dataset in-place using some (offline) normalisation method.

pad_arrays([pad])

Pads each array to the nearest multiple of pad greater than the array size.

remove_annotation(annot_name)

Removes a set of annotations from the dataset.

remove_classes(*[, drop, keep])

Remove instances with labels not in keep.

remove_groups(part_name, *[, drop, keep])

Remove instances corresponding to groups from the given partition.

remove_instances(*[, drop, keep])

Remove instances from dataset.

remove_ratings(rating_set)

Delete a set of ratings for this dataset.

rename_annotation(old_name, new_name)

Renames an annotation.

transpose_time()

Transpose the time and feature axis of each instance.

update_annotation(annot_name, annotations[, ...])

Add or update an annotation.

update_features(features, **read_kwargs)

Update the features matrix and feature names for this dataset.

update_labels(labels)

update_ratings(rating_set, ratings)

Update a set of ratings.

use_subset([subset])

Use a different subset of the instances.
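The array-shaping methods listed above (clip_arrays, pad_arrays, frame_arrays) can be pictured with a small pure-Python sketch on a single sequence. The helper names `clip`, `pad_to_multiple` and `frame` are hypothetical and are not part of ertk. Two behaviours are assumed here: padding goes up to the smallest multiple of pad that is at least the sequence length, and trailing samples that do not fill a whole frame are dropped. The library may handle either case differently.

```python
from typing import List


def clip(seq: List[float], length: int) -> List[float]:
    """Keep at most `length` leading samples."""
    return seq[:length]


def pad_to_multiple(seq: List[float], pad: int, value: float = 0.0) -> List[float]:
    """Pad with `value` up to a multiple of `pad` (assumed: smallest multiple >= len)."""
    target = -(-len(seq) // pad) * pad  # ceiling division, then scale back up
    return seq + [value] * (target - len(seq))


def frame(seq: List[float], frame_size: int, frame_shift: int) -> List[List[float]]:
    """Cut `seq` into overlapping frames; incomplete trailing frames are dropped."""
    last_start = len(seq) - frame_size
    return [seq[s:s + frame_size] for s in range(0, last_start + 1, frame_shift)]


signal = [float(i) for i in range(10)]
clipped = clip(signal, 7)             # 7 samples remain
padded = pad_to_multiple(clipped, 4)  # padded with zeros to 8 samples
frames = frame(padded, 4, 2)          # frames start at samples 0, 2 and 4
print(len(clipped), len(padded), len(frames))
```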

Attributes

annotations

Full annotation matrix for dataset.

class_counts

Number of instances for each class.

classes

List of unique class labels.

corpus

The corpus this dataset represents.

corpus_indices

Indices into corpora list of corresponding corpus for each instance.

corpus_names

List of corpora in this CombinedDataset.

description

The descriptive name of this corpus.

feature_names

List of feature names.

label_annot

The annotation used as target label.

labels

List of labels for instances.

n_classes

Number of unique classes.

n_features

Number of features.

n_instances

Number of instances in this dataset.

n_speakers

names

List of instance names.

partitions

Partitions in this dataset.

ratings

Full ratings for dataset.

speaker_counts

Number of instances for each speaker.

speaker_indices

Indices into speakers array of corresponding speaker for each instance.

speaker_names

List of unique speakers in this dataset.

speakers

subset

Name of clip subset used.

subsets

Dict from subset name to set of clip names.

x

The data matrix.

y

The class label array; one label per instance.
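The speaker attributes above are related in the usual way for index-encoded groups. The sketch below shows that relationship in plain Python, assuming that `speakers` holds one speaker name per instance and that unique names are sorted. Both are assumptions for illustration, since the listing does not describe `speakers` or the ordering.

```python
from collections import Counter

# Assumed: one speaker name per instance, in instance order.
speakers = ["s02", "s01", "s02", "s03", "s01", "s02"]

# Unique speakers (sorted here for a deterministic order; an assumption).
speaker_names = sorted(set(speakers))
n_speakers = len(speaker_names)

# For each instance, the position of its speaker within speaker_names.
name_to_idx = {name: i for i, name in enumerate(speaker_names)}
speaker_indices = [name_to_idx[s] for s in speakers]

# Number of instances for each speaker.
speaker_counts = dict(Counter(speakers))

# The indices can always be mapped back to the per-instance names.
recovered = [speaker_names[i] for i in speaker_indices]
print(speaker_names, speaker_indices, speaker_counts)
```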

property corpus_indices: ndarray

Indices into corpora list of corresponding corpus for each instance.

property corpus_names: List[str]

List of corpora in this CombinedDataset.
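One common use of these two properties is a leave-one-corpus-out split, where each corpus in turn is held out for testing. The sketch below is a minimal illustration with hard-coded values in place of a real `CombinedDataset`. The function `leave_one_corpus_out` is hypothetical, and plain lists are used where the property would return an ndarray.

```python
from typing import Dict, Iterator, List, Tuple

# Stand-in values for dataset.corpus_names and dataset.corpus_indices.
corpus_names = ["corpusA", "corpusB", "corpusC"]
corpus_indices = [0, 0, 1, 2, 1, 0]


def leave_one_corpus_out(
    corpus_names: List[str], corpus_indices: List[int]
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held-out corpus, train instance indices, test instance indices)."""
    for i, name in enumerate(corpus_names):
        test = [j for j, c in enumerate(corpus_indices) if c == i]
        train = [j for j, c in enumerate(corpus_indices) if c != i]
        yield name, train, test


splits: Dict[str, Tuple[List[int], List[int]]] = {
    name: (train, test)
    for name, train, test in leave_one_corpus_out(corpus_names, corpus_indices)
}
for name, (train, test) in splits.items():
    print(name, "train:", train, "test:", test)
```

The resulting index lists could then be used to slice per-instance arrays such as x and y.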