User Guide

This guide shows how to use cca-zoo to analyze multiview data with Canonical Correlation Analysis (CCA) and its variants.

Model Fitting

Preparing Your Data

First, ensure your data is appropriately preprocessed. In this example, we generate two views of synthetic data, each with 100 samples and 10 features:

import numpy as np

# Create synthetic data with two views
train_view_1 = np.random.normal(size=(100, 10))
train_view_2 = np.random.normal(size=(100, 10))

# Center the data by removing the mean of each feature
train_view_1 -= train_view_1.mean(axis=0)
train_view_2 -= train_view_2.mean(axis=0)

Instantiating and Fitting Your Model

To start, instantiate your CCA model, specifying the desired number of latent dimensions:

from cca_zoo.models import CCA

latent_dims = 3
linear_cca = CCA(latent_dims=latent_dims)

# Fit the model
linear_cca.fit([train_view_1, train_view_2])

Hyperparameter Tuning

Manual vs Data-Driven Approaches

Hyperparameters can be set manually during model instantiation, as sketched below. Alternatively, the GridSearchCV class from cca_zoo.model_selection offers a systematic, data-driven tuning approach.
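
As a minimal sketch of the manual approach (the regularization values below are purely illustrative), parameters such as c for rCCA can be passed directly at instantiation:

from cca_zoo.models import rCCA

# Manually chosen regularization strength for each view
manual_rcca = rCCA(latent_dims=latent_dims, c=[0.1, 0.1])
manual_rcca.fit([train_view_1, train_view_2])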

For the data-driven approach, consider the following example using regularized CCA (rCCA):

from cca_zoo.models import rCCA
from cca_zoo.model_selection import GridSearchCV

# Custom scoring function returning mean correlation in latent space
def scorer(estimator, X):
    dim_corrs = estimator.score(X)
    return dim_corrs.mean()

# Define grid of potential regularization parameters for each view
c1 = [0.1, 0.3, 0.7, 0.9]
c2 = [0.1, 0.3, 0.7, 0.9]
param_grid = {'c': [c1, c2]}

cv = 5  # Specify number of folds for cross-validation

# Grid search with rCCA
ridge = GridSearchCV(
    rCCA(latent_dims=latent_dims),
    param_grid=param_grid,
    cv=cv,
    verbose=True,
    scoring=scorer,
).fit([train_view_1, train_view_2]).best_estimator_
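
Since cca-zoo estimators follow the scikit-learn estimator API, the hyperparameters selected by the grid search can be read back from the best estimator, for example:

# Inspect the regularization values chosen by the grid search
print(ridge.get_params()['c'])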

Model Transformations

Following model fitting, transform your data to obtain latent projections for each view:

projections = ridge.transform([train_view_1, train_view_2])
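
The result is a list with one array of projections per view; with the data above, each has shape (n_samples, latent_dims):

# One projection per view, each with one column per latent dimension
print(projections[0].shape)  # expected: (100, 3)
print(projections[1].shape)  # expected: (100, 3)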

Or use fit_transform() to fit and transform in a single step:

projections = ridge.fit_transform([train_view_1, train_view_2])

Model Evaluation

Evaluate your model by measuring the correlation it captures in the latent space:

correlation = ridge.score([train_view_1, train_view_2])

For tensor CCA models, this represents higher-order correlations within each dimension.
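
As in the scorer defined earlier, score here returns one correlation per latent dimension, so a single summary figure can be obtained by averaging:

# Average the per-dimension correlations into a single number
print(correlation.mean())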

Extracting Model Weights

For many CCA applications, it is useful to access the model weights, i.e. the linear transformations mapping each view to the latent space:

view_1_weights = ridge.weights[0]
view_2_weights = ridge.weights[1]
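
As a quick sanity check (a sketch that assumes transform applies the weights as a plain linear map with no additional scaling; recall the training views were already centered), projecting a view manually should approximately reproduce the transformed output:

# Manual projection: (n_samples, n_features) @ (n_features, latent_dims)
manual_projection = train_view_1 @ view_1_weights

# Approximately equal to the output of transform under the assumptions above
print(np.allclose(manual_projection, projections[0]))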

Deep Models

Deep models in cca-zoo use neural networks as view encoders, allowing them to capture non-linear relationships between views.

Constructing Encoder Architectures

Define the encoder network architectures, for example the following multi-layer perceptron (MLP) encoders:

from cca_zoo.deepmodels import architectures

encoder_1 = architectures.Encoder(latent_dims=latent_dims, feature_size=784)
encoder_2 = architectures.Encoder(latent_dims=latent_dims, feature_size=784)

Instantiating a Deep CCA Model

Instantiate a deep CCA model using the encoders:

from cca_zoo.deepmodels import DCCA

dcca_model = DCCA(latent_dims=latent_dims, encoders=[encoder_1, encoder_2])

The resulting model is a torch.nn.Module object, so it can be updated in a custom training loop. Furthermore, the provided LightningModule wrapper (built on pytorch-lightning) simplifies training these models.
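
As a minimal sketch of such a custom loop (the toy data, batch size, and learning rate are illustrative, and the assumption that the model exposes a loss method taking the views directly may not hold in every cca-zoo version):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data matching the encoders' feature_size of 784
view_1 = torch.randn(100, 784)
view_2 = torch.randn(100, 784)
loader = DataLoader(TensorDataset(view_1, view_2), batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(dcca_model.parameters(), lr=1e-3)

for epoch in range(10):
    for batch_view_1, batch_view_2 in loader:
        optimizer.zero_grad()
        # Assumed interface: loss takes the views and returns a scalar tensor
        loss = dcca_model.loss(batch_view_1, batch_view_2)
        loss.backward()
        optimizer.step()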