User Guide
This guide shows how to use cca-zoo for multiview data analysis with Canonical Correlation Analysis (CCA) and its extensions.
Model Fitting
Preparing Your Data
Ensure your data is appropriately preprocessed before analysis. In this example, we create two synthetic views, each with 100 samples and 10 features.
import numpy as np
# Create synthetic data for two views
train_view_1 = np.random.normal(size=(100, 10))
train_view_2 = np.random.normal(size=(100, 10))
# Center each feature by subtracting its mean
train_view_1 -= train_view_1.mean(axis=0)
train_view_2 -= train_view_2.mean(axis=0)
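Because the two views above are drawn independently, they share no true correlation, so any correlation a model finds in them is due to chance. To check that a model recovers real structure, you might instead generate both views from a shared latent signal. The sketch below does this with NumPy only. The generator, the noise level of 0.5, and the names `n_shared`, `mixing_1` and `mixing_2` are illustrative assumptions and are not part of cca-zoo.

```python
import numpy as np

# Hypothetical generator: both views are noisy linear mixtures of one shared latent signal
rng = np.random.default_rng(0)
n_samples, n_features, n_shared = 100, 10, 3

shared = rng.normal(size=(n_samples, n_shared))     # latent signal common to both views
mixing_1 = rng.normal(size=(n_shared, n_features))  # random loadings for view 1
mixing_2 = rng.normal(size=(n_shared, n_features))  # random loadings for view 2

# Add independent Gaussian noise to each view
train_view_1 = shared @ mixing_1 + 0.5 * rng.normal(size=(n_samples, n_features))
train_view_2 = shared @ mixing_2 + 0.5 * rng.normal(size=(n_samples, n_features))

# Center each feature, as in the example above
train_view_1 -= train_view_1.mean(axis=0)
train_view_2 -= train_view_2.mean(axis=0)
```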
Initializing and Fitting Your Model
To begin, instantiate the CCA model and specify the desired number of latent dimensions.
from cca_zoo.models import CCA
latent_dims = 3
linear_cca = CCA(latent_dims=latent_dims)
# Fit the model
linear_cca.fit([train_view_1, train_view_2])
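Conceptually, linear CCA finds a weight matrix for each view so that matching columns of the projected views are maximally correlated. A standard way to compute these weights is to whiten each view's covariance and then take the singular value decomposition of the whitened cross-covariance matrix. The singular values are the canonical correlations. The NumPy sketch below illustrates this textbook formulation. It is not cca-zoo's internal implementation, and the helper names `inv_sqrt` and `linear_cca` are hypothetical.

```python
import numpy as np

def inv_sqrt(cov):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

def linear_cca(x, y, latent_dims):
    """Return weights (wx, wy) and canonical correlations for centered views x, y."""
    n = x.shape[0]
    cxx = x.T @ x / (n - 1)
    cyy = y.T @ y / (n - 1)
    cxy = x.T @ y / (n - 1)
    cxx_is, cyy_is = inv_sqrt(cxx), inv_sqrt(cyy)
    # SVD of the whitened cross-covariance; singular values are the canonical correlations
    u, s, vt = np.linalg.svd(cxx_is @ cxy @ cyy_is)
    wx = cxx_is @ u[:, :latent_dims]
    wy = cyy_is @ vt.T[:, :latent_dims]
    return wx, wy, s[:latent_dims]

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 10))
y = rng.normal(size=(100, 10))
x -= x.mean(axis=0)
y -= y.mean(axis=0)

wx, wy, corrs = linear_cca(x, y, latent_dims=3)
zx, zy = x @ wx, y @ wy
# Each pair of projected columns has a correlation equal to the matching singular value
empirical = np.array([np.corrcoef(zx[:, k], zy[:, k])[0, 1] for k in range(3)])
```

Whitening with the inverse square roots gives each projected column unit variance. As a result, the covariance between the k-th pair of projections equals the k-th singular value, which is why `corrs` and `empirical` agree.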
Hyperparameter Tuning
Manual vs Data-Driven Approaches
Hyperparameters can either be set manually when the model is initialized or tuned in a data-driven manner with the GridSearchCV class from cca_zoo.model_selection.
from cca_zoo.models import rCCA
from cca_zoo.model_selection import GridSearchCV
# Custom scoring function: mean correlation across latent dimensions
def scorer(estimator, X):
    dim_corrs = estimator.score(X)
    return dim_corrs.mean()
# Define grid of potential regularization parameters
c1 = [0.1, 0.3, 0.7, 0.9]
c2 = [0.1, 0.3, 0.7, 0.9]
param_grid = {'c': [c1, c2]}
cv = 5 # Number of folds in cross-validation
# Conduct grid search
ridge = GridSearchCV(rCCA(latent_dims=latent_dims), param_grid=param_grid,
                     cv=cv, verbose=True, scoring=scorer).fit([train_view_1, train_view_2]).best_estimator_
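The grid above gives four candidate values of `c` for each view. If GridSearchCV combines the per-view lists as a Cartesian product, that yields 4 × 4 = 16 candidate `(c1, c2)` pairs, each scored by 5-fold cross-validation. The NumPy sketch below spells out that procedure by hand under two assumptions: the product expansion, and that rCCA's `c` shrinks each view's covariance towards the identity as `(1 - c) * Sigma + c * I`, so that `c = 0` recovers plain CCA. Check both against your cca-zoo version. The helpers `ridge_cca`, `mean_test_corr` and `grid_search` are hypothetical and are not cca-zoo code.

```python
import itertools
import numpy as np

def inv_sqrt(cov):
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

def ridge_cca(x, y, latent_dims, c1, c2):
    """Regularized CCA: shrink each covariance towards the identity (assumed convention)."""
    n = x.shape[0]
    cxx = (1 - c1) * (x.T @ x / (n - 1)) + c1 * np.eye(x.shape[1])
    cyy = (1 - c2) * (y.T @ y / (n - 1)) + c2 * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    cxx_is, cyy_is = inv_sqrt(cxx), inv_sqrt(cyy)
    u, _, vt = np.linalg.svd(cxx_is @ cxy @ cyy_is)
    return cxx_is @ u[:, :latent_dims], cyy_is @ vt.T[:, :latent_dims]

def mean_test_corr(x, y, wx, wy):
    """Mean correlation across latent dimensions on held-out data."""
    zx, zy = x @ wx, y @ wy
    return np.mean([np.corrcoef(zx[:, k], zy[:, k])[0, 1] for k in range(wx.shape[1])])

def grid_search(x, y, latent_dims, c1_values, c2_values, n_folds=5, seed=0):
    folds = np.array_split(np.random.default_rng(seed).permutation(len(x)), n_folds)
    results = {}
    # One candidate per (c1, c2) pair: the Cartesian product of the per-view grids
    for c1, c2 in itertools.product(c1_values, c2_values):
        scores = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([f for j, f in enumerate(folds) if j != k])
            # Center with training-fold means only, so no test information leaks in
            mx, my = x[train].mean(axis=0), y[train].mean(axis=0)
            wx, wy = ridge_cca(x[train] - mx, y[train] - my, latent_dims, c1, c2)
            scores.append(mean_test_corr(x[test] - mx, y[test] - my, wx, wy))
        results[(c1, c2)] = float(np.mean(scores))
    best = max(results, key=results.get)
    return best, results

# Usage on synthetic views that share a 3-dimensional latent signal
rng = np.random.default_rng(1)
shared = rng.normal(size=(100, 3))
x = shared @ rng.normal(size=(3, 10)) + rng.normal(size=(100, 10))
y = shared @ rng.normal(size=(3, 10)) + rng.normal(size=(100, 10))
grid = [0.1, 0.3, 0.7, 0.9]
best, results = grid_search(x, y, latent_dims=3, c1_values=grid, c2_values=grid)
```

Scoring on held-out folds matters because correlations measured on the training data tend to be optimistic, especially with little regularization.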
Model Transformations
Transform your data post-fitting to obtain latent projections for each view.
projections = ridge.transform([train_view_1, train_view_2])
Alternatively, use fit_transform() to fit the model and transform the data in a single call. Note that this refits ridge on the data passed in.
projections = ridge.fit_transform([train_view_1, train_view_2])
Model Evaluation
Assess the performance of your model by evaluating the correlations in the latent space.
correlation = ridge.score([train_view_1, train_view_2])
For tensor-based CCA models, this score instead represents the higher-order correlation across all views in each latent dimension.
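For two views, a natural per-dimension score is the Pearson correlation between matching columns of the two projections. Averaging these gives a single number, as the custom scorer above does. For three or more views, one common definition of higher-order correlation is the mean product of the standardized projections across views. The NumPy sketch below computes both from projection arrays of shape `(n_samples, latent_dims)`. It is illustrative only and is not the exact `score()` implementation in cca-zoo. The helper names are hypothetical.

```python
import numpy as np

def pairwise_dim_corrs(z1, z2):
    """Pearson correlation between matching columns of two projections."""
    return np.array([np.corrcoef(z1[:, k], z2[:, k])[0, 1] for k in range(z1.shape[1])])

def higher_order_corrs(*zs):
    """Mean product of standardized projections across all views, per latent dimension."""
    standardized = [(z - z.mean(axis=0)) / z.std(axis=0) for z in zs]
    # Stack to (n_views, n_samples, latent_dims), multiply across views, average over samples
    return np.mean(np.prod(np.stack(standardized), axis=0), axis=0)

rng = np.random.default_rng(0)
z1 = rng.normal(size=(100, 3))
z2 = z1 + 0.5 * rng.normal(size=(100, 3))  # strongly correlated with z1
z3 = z1 + 0.5 * rng.normal(size=(100, 3))

dim_corrs = pairwise_dim_corrs(z1, z2)  # one correlation per latent dimension
hoc = higher_order_corrs(z1, z2, z3)    # three-view analogue
```

With exactly two views, `higher_order_corrs(z1, z2)` reduces to the Pearson correlation, so the higher-order score is a direct generalization of the pairwise one.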
Extracting Model Weights
Some applications require direct access to the linear transformation (weight matrix) that the model learned for each view.
view_1_weights = ridge.weights[0]
view_2_weights = ridge.weights[1]
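For a linear model, each view's weights can be read as a matrix of shape `(n_features, latent_dims)`. A latent projection is then the centered view multiplied by that matrix. The sketch below uses a random stand-in for `ridge.weights[0]` to show the shapes involved and one way to rank features by their contribution to a latent dimension. The assumption that `transform()` centers each view before multiplying by its weights is ours; check it against your cca-zoo version.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, latent_dims = 100, 10, 3

view = rng.normal(size=(n_samples, n_features))
# Hypothetical stand-in for ridge.weights[0]: one column per latent dimension
weights = rng.normal(size=(n_features, latent_dims))

# Project the centered view onto the latent space
projection = (view - view.mean(axis=0)) @ weights

# Rank features by the magnitude of their weight in the first latent dimension
top_features = np.argsort(-np.abs(weights[:, 0]))[:3]
```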
Deep Models in CCA-Zoo
Deep models in cca-zoo use neural networks as view encoders, which lets them capture nonlinear relationships between views.
Constructing Encoder Architectures
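Here, we define two encoders as multi-layer perceptrons (MLPs), each expecting 784 input features (for example, flattened 28 × 28 images). Set feature_size to the number of features in each of your views.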
from cca_zoo.deepmodels import architectures
encoder_1 = architectures.Encoder(latent_dims=latent_dims, feature_size=784)
encoder_2 = architectures.Encoder(latent_dims=latent_dims, feature_size=784)
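An MLP encoder is a stack of fully connected layers that maps each sample's input features to `latent_dims` outputs. The NumPy sketch below shows only the forward pass of such a network for 784 inputs. The hidden-layer width of 128, the ReLU activation and the He-style initialization are illustrative choices, not the defaults of `architectures.Encoder`, which is a PyTorch module. The helpers `init_mlp` and `mlp_forward` are hypothetical.

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """Random weights (He-style scaling) and zero biases for each pair of consecutive layers."""
    return [(rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def mlp_forward(params, x):
    """Apply ReLU after each hidden layer and leave the output layer linear."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b

rng = np.random.default_rng(0)
latent_dims = 3
# Hypothetical layer sizes: 784 inputs, one hidden layer of 128 units, latent_dims outputs
encoder_params = init_mlp([784, 128, latent_dims], rng)
batch = rng.normal(size=(32, 784))  # a batch of 32 flattened inputs
latent = mlp_forward(encoder_params, batch)
```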
Deep CCA Model Initialization
Initialize a Deep CCA model using the encoder architectures.
from cca_zoo.deepmodels import DCCA
dcca_model = DCCA(latent_dims=latent_dims, encoders=[encoder_1, encoder_2])
The resulting object is a torch.nn.Module, so it can be trained or updated further in a custom training loop.
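In a custom training loop, the original Deep CCA objective trains both encoders by minimizing the negative sum of canonical correlations between their outputs on each batch. The NumPy sketch below computes only that loss value, without gradients. It adds a small ridge term `eps` so that the covariance estimates stay invertible on small batches. The `eps` default is an assumption, and cca-zoo's DCCA may use a different but related objective. The helper `dcca_loss` is hypothetical.

```python
import numpy as np

def inv_sqrt(cov):
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

def dcca_loss(h1, h2, eps=1e-3):
    """Negative total canonical correlation between two batches of encoder outputs."""
    n = h1.shape[0]
    h1 = h1 - h1.mean(axis=0)
    h2 = h2 - h2.mean(axis=0)
    # Small ridge terms keep the covariance estimates invertible on small batches
    s11 = h1.T @ h1 / (n - 1) + eps * np.eye(h1.shape[1])
    s22 = h2.T @ h2 / (n - 1) + eps * np.eye(h2.shape[1])
    s12 = h1.T @ h2 / (n - 1)
    t = inv_sqrt(s11) @ s12 @ inv_sqrt(s22)
    # Sum of singular values of T is the sum of all canonical correlations
    return -np.linalg.svd(t, compute_uv=False).sum()

rng = np.random.default_rng(0)
h1 = rng.normal(size=(64, 3))  # stand-in for encoder_1 outputs on a batch
h2 = rng.normal(size=(64, 3))  # stand-in for encoder_2 outputs, independent of h1
loss_same = dcca_loss(h1, h1)  # close to -latent_dims for identical outputs
loss_indep = dcca_loss(h1, h2)
```

In a real PyTorch loop, the same computation would be written with torch tensors so that gradients flow back through both encoders.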