ertk.dataset.CombinedDataset

class ertk.dataset.CombinedDataset(*datasets: Dataset)

Bases: Dataset

A dataset that joins one or more individual datasets together and merges annotations.

__init__(*datasets: Dataset)

Methods

__init__(*datasets)

Inherited Methods

`annotation_type`(annot_name)
`clip_arrays`(length)	Clips each array to the specified maximum length.
`clone`()
`copy`()
`frame_arrays`(frame_size, frame_shift[, ...])	Create a sequence of frames from the raw signal.
`get_annotations`(annot_name)	Get a list of annotations, one for each instance currently in the dataset.
`get_audio_paths`()
`get_group_counts`(annot_name)	Get group counts for a partition.
`get_group_indices`(annot_name)	Gets the group indices.
`get_group_names`(annot_name)	Get the names of groups in a partition.
`get_idx_for_names`(names)	Gets indices of instances corresponding to `names`.
`get_idx_for_split`(split[, return_complement])	Gets indices of instances corresponding to the selection given by `split`.
`get_ratings`(name[, rating_set])	Get per-annotator ratings of a specified column, for this dataset.
`init_corpus_info`(path)	Initialise corpus metadata from YAML.
`map_and_select`(map, select, remove)	Convenience function for mapping one or more partitions and then selecting one or more groups.
`map_classes`(mapping)	Modifies classes based on the given `mapping`.
`map_groups`(part_name, mapping)	Map group names in a partition.
`normalise`(partition[, normaliser])	Transforms the data matrix of this dataset in-place using some (offline) normalisation method.
`pad_arrays`([pad])	Pads each array to the nearest multiple of `pad` greater than the array size.
`remove_annotation`(annot_name)	Removes a set of annotations from the dataset.
`remove_classes`(*[, drop, keep])	Remove instances with labels not in `keep`.
`remove_groups`(part_name, *[, drop, keep])	Remove instances corresponding to groups from the given partition.
`remove_instances`(*[, drop, keep])	Remove instances from dataset.
`remove_ratings`(rating_set)	Delete a set of ratings for this dataset.
`rename_annotation`(old_name, new_name)	Renames an annotation.
`transpose_time`()	Transpose the time and feature axis of each instance.
`update_annotation`(annot_name, annotations[, ...])	Add or update an annotation.
`update_features`(features, **read_kwargs)	Update the features matrix and feature names for this dataset.
`update_labels`(labels)
`update_ratings`(rating_set, ratings)	Update a set of ratings.
`use_subset`([subset])	Use a different subset of the instances.

Attributes

`annotations`	Full annotation matrix for dataset.
`class_counts`	Number of instances for each class.
`classes`	List of unique class labels.
`corpus`	The corpus this dataset represents.
`corpus_indices`	Indices into corpora list of corresponding corpus for each instance.
`corpus_names`	List of corpora in this CombinedDataset.
`description`	The descriptive name of this corpus.
`feature_names`	List of feature names.
`label_annot`	The annotation used as target label.
`labels`	List of labels for instances.
`n_classes`	Number of unique classes.
`n_features`	Number of features.
`n_instances`	Number of instances in this dataset.
`n_speakers`
`names`	List of instance names.
`partitions`	Partitions in this dataset.
`ratings`	Full ratings for dataset.
`speaker_counts`	Number of instances for each speaker.
`speaker_indices`	Indices into speakers array of corresponding speaker for each instance.
`speaker_names`	List of unique speakers in this dataset.
`speakers`
`subset`	Name of clip subset used.
`subsets`	Dict from subset name to set of clip names.
`x`	The data matrix.
`y`	The class label array; one label per instance.

property corpus_indices: ndarray

Indices into corpora list of corresponding corpus for each instance.

property corpus_names: List[str]

List of corpora in this CombinedDataset.
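
The relationship between `corpus_names` and `corpus_indices` can be sketched in plain Python. This is a minimal illustration, not ertk code: the toy `datasets` dict, the instance names, and the assumption that instances are concatenated in the order the datasets are given are all hypothetical, and plain lists stand in for the `ndarray` that `corpus_indices` actually returns.

```python
from collections import Counter

# Hypothetical stand-ins for two datasets: corpus name -> instance names.
# These values are illustrative only and do not come from ertk.
datasets = {
    "corpus_a": ["a_001", "a_002", "a_003"],
    "corpus_b": ["b_001", "b_002"],
}

# Assumed behaviour: corpora are listed in the order they were combined.
corpus_names = list(datasets)

# Build one combined list of instance names, recording for each instance
# the position of its corpus in corpus_names.
names = []
corpus_indices = []
for corpus_idx, corpus in enumerate(corpus_names):
    for name in datasets[corpus]:
        names.append(name)
        corpus_indices.append(corpus_idx)

# Look up the corpus name of every instance via its index.
instance_corpora = [corpus_names[i] for i in corpus_indices]

# Count instances per corpus.
corpus_counts = Counter(instance_corpora)

# Positions of all instances that belong to one corpus.
idx_b = [pos for pos, i in enumerate(corpus_indices)
         if corpus_names[i] == "corpus_b"]
```

With the real properties, the same lookup would presumably be done with NumPy indexing on the returned array rather than with list comprehensions.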