ertk.classification.dataset_cross_validation
- ertk.classification.dataset_cross_validation(clf, dataset: Dataset, clf_lib: str, partition: str | None = None, label: str = 'label', cv: BaseCrossValidator | int = 10, verbose: int = 0, n_jobs: int = 1, scoring=None, fit_params: Dict[str, Any] = {}) → ExperimentResult
Cross-validates a Classifier instance on a single dataset.
- Parameters:
- clf: class that implements fit() and predict()
The classifier to test.
- dataset: Dataset
The dataset for within-corpus cross-validation.
- clf_lib: str
One of {“sk”, “tf”, “pt”} to select which library-specific cross-validation method to use, since they’re not all quite compatible.
- partition: str, optional
The name of the partition to cross-validate over. If None, then don’t use group cross-validation.
- label: str
The annotations to use as class labels.
- cv: int or BaseCrossValidator
A splitter used for cross-validation. The default of 10 corresponds to KFold(10), i.e. 10-fold cross-validation.
- verbose: int
Passed to cross_validate().
- n_jobs: int
Passed to cross_validate().
- scoring: str, list, dict, optional
Scoring metric(s) to use. Can be anything accepted by scikit-learn’s cross_val* methods (i.e. str, list or dict).
- fit_params: dict
Additional parameters passed to the model’s fit() method. This should be used to pass any more specific parameters not covered here.
- Returns:
- df: pandas.DataFrame
A dataframe holding the results from all runs with this model.
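The cv, scoring, verbose and n_jobs parameters follow scikit-learn's cross_validate() conventions, and a non-None partition implies group cross-validation. The sketch below illustrates those conventions using scikit-learn directly. It is not ertk's implementation: the synthetic data, the "speaker" grouping, and the metric names "uar" and "acc" are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_validate

# Synthetic stand-in for a dataset: 120 instances from 6 hypothetical speakers.
X, y = make_classification(n_samples=120, n_features=8, random_state=0)
groups = np.repeat(np.arange(6), 20)  # hypothetical "speaker" partition

# Any object implementing fit() and predict() can be cross-validated.
clf = LogisticRegression(max_iter=1000)

# Group cross-validation: no speaker appears in both the train and test split.
cv = GroupKFold(n_splits=3)

# scoring may be a str, a list of str, or a dict mapping a name to a scorer.
scoring = {"uar": "balanced_accuracy", "acc": "accuracy"}

scores = cross_validate(
    clf, X, y, groups=groups, cv=cv, scoring=scoring, n_jobs=1, verbose=0
)

# Collect the per-fold test scores into a dataframe, one row per fold.
df = pd.DataFrame({k: v for k, v in scores.items() if k.startswith("test_")})
print(df)
```

Passing an int instead of a splitter object (for example cv=10) asks scikit-learn for that many folds without grouping, which corresponds to the partition=None case described above.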