cca_zoo.model_selection.cross_validate

class cca_zoo.model_selection.cross_validate(estimator, views, y=None, *, groups=None, scoring=None, cv=None, n_jobs=None, verbose=0, fit_params=None, pre_dispatch='2*n_jobs', return_train_score=False, return_estimator=False, error_score=nan)

Evaluate metric(s) by cross-validation and also record fit/score times.

Read more in the User Guide.

Parameters:
  • estimator (object) – Estimator object implementing ‘fit’. The object to use to fit the data.

  • views (list or tuple of array-like) – List or tuple of numpy arrays or array-likes with the same number of rows (samples).

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), optional, default=None) – The target variable to try to predict in the case of supervised learning.

  • groups (array-like of shape (n_samples,), optional, default=None) – Group labels for the samples used while splitting the dataset into train/test set. Only used in conjunction with a “Group” cv instance (e.g., GroupKFold).

  • scoring (str, callable, list, tuple, or dict, optional, default=None) – Strategy to evaluate the performance of the cross-validated model on the test set. See notes below for more detail.

  • cv (int, cross-validation generator or an iterable, optional, default=None) – Determines the cross-validation splitting strategy. See notes below for more detail.

  • n_jobs (int, optional, default=None) – Number of jobs to run in parallel.

  • verbose (int, default=0) – The verbosity level.

  • fit_params (dict, optional, default=None) – Parameters to pass to the fit method of the estimator.

  • pre_dispatch (int or str, default='2*n_jobs') – Controls the number of jobs that get dispatched during parallel execution. See notes below for more detail.

Notes

For scoring: If scoring represents a single score, one can use a single string or a callable that returns a single value.

If scoring represents multiple scores, one can use:
  • a list or tuple of unique strings;

  • a callable returning a dictionary where the keys are the metric names and the values are the metric scores;

  • a dictionary with metric names as keys and callables as values.

See Specifying multiple metrics for evaluation for an example.
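As a rough illustration of the multi-metric forms listed above, the sketch below builds the dictionary form and the single-callable form from the same two scorers. It assumes the scikit-learn scorer convention scorer(estimator, X, y) applies here. ConstantModel, neg_mean_abs_error and neg_max_abs_error are hypothetical names invented for this example and are not part of cca_zoo.

```python
class ConstantModel:
    """Hypothetical stand-in estimator that always predicts one value."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return [self.value for _ in X]


def neg_mean_abs_error(estimator, X, y):
    # Negated so that a larger score means a better fit.
    preds = estimator.predict(X)
    return -sum(abs(p - t) for p, t in zip(preds, y)) / len(y)


def neg_max_abs_error(estimator, X, y):
    preds = estimator.predict(X)
    return -max(abs(p - t) for p, t in zip(preds, y))


# Form 1: a dictionary with metric names as keys and callables as values.
scoring_as_dict = {
    "neg_mae": neg_mean_abs_error,
    "neg_max_err": neg_max_abs_error,
}


# Form 2: one callable returning a dictionary of metric name -> score.
def scoring_as_callable(estimator, X, y):
    return {name: fn(estimator, X, y) for name, fn in scoring_as_dict.items()}


model = ConstantModel(1.0)
X = [[0.0], [0.0], [0.0]]
y = [1.0, 2.0, 4.0]
scores = scoring_as_callable(model, X, y)
```

Both forms carry the same information. The single callable can be convenient when several metrics share an expensive intermediate step, such as one call to predict.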

For cv: Possible inputs for cv are:

  • None, to use the default 5-fold cross validation,

  • int, to specify the number of folds in a (Stratified)KFold,

  • CV splitter,

  • An iterable yielding (train, test) splits as arrays of indices.

For int/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. These splitters are instantiated with shuffle=False, so the splits will be the same across calls. Refer to the User Guide for the various cross-validation strategies that can be used here.
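To make the last input type and the shuffle=False remark concrete, here is a minimal standard-library sketch of unshuffled, contiguous fold generation. contiguous_folds is a hypothetical helper written for this example, not a cca_zoo or scikit-learn function. It yields plain lists of indices, which may need converting to arrays before being passed as cv.

```python
def contiguous_folds(n_samples, n_splits=5):
    """Yield (train, test) index lists for unshuffled, contiguous folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds take one additional sample.
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


# No randomness is involved, so repeated calls give identical splits.
splits = list(contiguous_folds(10, n_splits=3))
splits_again = list(contiguous_folds(10, n_splits=3))
```

Because the folds depend only on n_samples and n_splits, scores from two runs are computed on exactly the same partitions, which is the property the note above describes.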

For pre_dispatch: This parameter can be:

  • None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs.

  • An int, giving the exact number of total jobs that are spawned.

  • A str, giving an expression as a function of n_jobs, as in ‘2*n_jobs’.
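The three accepted forms can be pictured with the small sketch below. resolve_pre_dispatch and its n_tasks argument are hypothetical and exist only for illustration. This is not how the underlying parallel backend is actually implemented, only one way to read the rules above.

```python
def resolve_pre_dispatch(pre_dispatch, n_jobs, n_tasks):
    """Return how many tasks would be dispatched up front (illustrative only)."""
    if pre_dispatch is None:
        # Everything is created and spawned immediately.
        return n_tasks
    if isinstance(pre_dispatch, int):
        # An exact number of jobs.
        return pre_dispatch
    if isinstance(pre_dispatch, str):
        # An expression in n_jobs, evaluated with n_jobs as the only name.
        value = eval(pre_dispatch, {"__builtins__": {}}, {"n_jobs": n_jobs})
        return int(value)
    raise TypeError("pre_dispatch must be None, an int, or a str")


default_case = resolve_pre_dispatch("2*n_jobs", n_jobs=4, n_tasks=100)
eager_case = resolve_pre_dispatch(None, n_jobs=4, n_tasks=100)
fixed_case = resolve_pre_dispatch(3, n_jobs=4, n_tasks=100)
```

Under these assumptions, the default '2*n_jobs' with four workers keeps eight tasks queued, while None queues all of them at once.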