Hyperparameter Selection
This script shows how to perform hyperparameter selection for a cca_zoo model, using grid search and randomized search with cross-validation.
import numpy as np
import pandas as pd
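# loguniform is provided by scipy.stats (SciPy >= 1.4); the sklearn.utils.fixes shim is not available in recent scikit-learn releases
from scipy.stats import loguniform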
from cca_zoo.data import generate_covariance_data
from cca_zoo.model_selection import GridSearchCV, RandomizedSearchCV
from cca_zoo.models import KCCA
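# Fix the random seed so the generated data are reproducible
np.random.seed(42)
n = 200  # number of samples
p = 100  # number of features in the first view
q = 100  # number of features in the second view
latent_dims = 1
cv = 3  # number of cross-validation folds
# Generate two views with one latent dimension and correlation 0.9
(X, Y), (tx, ty) = generate_covariance_data(
    n, view_features=[p, q], latent_dims=latent_dims, correlation=[0.9]
)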
Grid Search
Hyperparameter selection works much as it does in scikit-learn; the main difference is in how we enter the parameter grid. We form a parameter grid that gives the search space for each parameter, for each view. The search space for a parameter must be entered as a list, whose entries can be any of:
- a single value (as in “kernel”), which is used for every view
- a list of candidate values for each view
- a mixture of a single value for one view and a distribution or list for the other
param_grid = {"kernel": ["poly"], "c": [[1e-1], [1e-1, 2e-1]], "degree": [[2], [2, 3]]}
kernel_reg = GridSearchCV(
    KCCA(latent_dims=latent_dims), param_grid=param_grid, cv=cv, verbose=True
).fit([X, Y])
print(pd.DataFrame(kernel_reg.cv_results_))
Out:
Fitting 3 folds for each of 4 candidates, totalling 12 fits
mean_fit_time std_fit_time ... std_test_score rank_test_score
0 0.019541 0.001504 ... 0.072324 4
1 0.021130 0.000043 ... 0.068810 2
2 0.018554 0.000090 ... 0.060830 3
3 0.021025 0.000193 ... 0.061616 1
[4 rows x 14 columns]
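The count of 4 candidates reported above follows from combining the per-view options of each parameter. The sketch below reproduces that count in plain Python. `expand_per_view_grid` is a hypothetical helper written for illustration only; it is not part of cca_zoo, and the library's internal expansion may differ in detail.

```python
from itertools import product


def expand_per_view_grid(param_grid):
    """Hypothetical helper: expand a per-view grid into candidate settings."""
    names = list(param_grid)
    per_param = []
    for name in names:
        # Wrap bare values so that every entry is a list of options
        view_options = [
            opt if isinstance(opt, list) else [opt] for opt in param_grid[name]
        ]
        # Every combination of one option per entry is a setting for this parameter
        per_param.append([list(combo) for combo in product(*view_options)])
    # Every combination of per-parameter settings is one candidate
    return [dict(zip(names, combo)) for combo in product(*per_param)]


param_grid = {"kernel": ["poly"], "c": [[1e-1], [1e-1, 2e-1]], "degree": [[2], [2, 3]]}
candidates = expand_per_view_grid(param_grid)
print(len(candidates))
for candidate in candidates:
    print(candidate)
```

Here "c" contributes two settings (one option for the first view times two for the second), "degree" contributes two, and "kernel" contributes one, which gives the four candidates that are each fitted on 3 folds.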
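Randomized Search
With randomized search we can additionally use distributions, such as loguniform, to define the parameter search space for a view. No number of iterations is specified here, so the search draws 10 candidates, as the output shows.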
param_grid = {
    "kernel": ["poly"],
    "c": [loguniform(1e-1, 2e-1), [1e-1]],
    "degree": [[2], [2, 3]],
}
kernel_reg = RandomizedSearchCV(
    KCCA(latent_dims=latent_dims), param_distributions=param_grid, cv=cv, verbose=True
).fit([X, Y])
print(pd.DataFrame(kernel_reg.cv_results_))
Out:
Fitting 3 folds for each of 10 candidates, totalling 30 fits
mean_fit_time std_fit_time ... std_test_score rank_test_score
0 0.019662 0.000104 ... 0.061057 9
1 0.020962 0.000117 ... 0.064763 2
2 0.019322 0.001202 ... 0.067161 5
3 0.020776 0.000173 ... 0.063162 3
4 0.020895 0.000220 ... 0.059467 4
5 0.021833 0.003942 ... 0.061797 8
6 0.018183 0.000095 ... 0.063509 7
7 0.020684 0.000175 ... 0.068683 1
8 0.018370 0.000150 ... 0.060346 10
9 0.018471 0.000213 ... 0.064719 6
[10 rows x 14 columns]
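To make the mixture of distributions and lists concrete, the following sketch draws 10 candidates from a search space shaped like the one above, using only the standard library. `LogUniform` and `sample_candidate` are hypothetical names introduced for illustration: `LogUniform` is a simplified stand-in for a distribution object with an `rvs` method, and the sampling logic is an assumption about how such a search space could be drawn from, not cca_zoo's actual implementation.

```python
import math
import random


class LogUniform:
    """Hypothetical stand-in for a log-uniform distribution on [low, high]."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def rvs(self, rng):
        # Sample uniformly in log space, then map back
        return math.exp(rng.uniform(math.log(self.low), math.log(self.high)))


def sample_candidate(param_distributions, rng):
    """Hypothetical helper: draw one value per entry for every parameter."""
    candidate = {}
    for name, options in param_distributions.items():
        values = []
        for opt in options:
            if hasattr(opt, "rvs"):
                values.append(opt.rvs(rng))  # distribution: draw a sample
            elif isinstance(opt, list):
                values.append(rng.choice(opt))  # list: pick one option
            else:
                values.append(opt)  # single value: use as is
        candidate[name] = values
    return candidate


rng = random.Random(42)
param_distributions = {
    "kernel": ["poly"],
    "c": [LogUniform(1e-1, 2e-1), [1e-1]],
    "degree": [[2], [2, 3]],
}
samples = [sample_candidate(param_distributions, rng) for _ in range(10)]
for sample in samples:
    print(sample)
```

In every sample the first view's "c" is a fresh draw between 0.1 and 0.2, while the second view's "c" is always 0.1 because its list holds a single option.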
Total running time of the script: (0 minutes 1.223 seconds)