User Guide

cca-zoo implements Canonical Correlation Analysis (CCA) and many of its extensions for multiview data analysis, that is, for relating several sets of features measured on the same samples.

Model Fitting

Preparing Your Data

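Preprocess your data before fitting a model. Each view is a NumPy array of shape (n_samples, n_features), and all views must contain the same samples in the same row order. In this example, we create two synthetic views, each with 100 samples and 10 features.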

import numpy as np

# Create synthetic data for two views
train_view_1 = np.random.normal(size=(100, 10))
train_view_2 = np.random.normal(size=(100, 10))

# Center the data by subtracting each feature's mean
train_view_1 -= train_view_1.mean(axis=0)
train_view_2 -= train_view_2.mean(axis=0)
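The two views generated above are independent noise, so any correlation a model finds between them is due to chance. To experiment with data that actually shares structure, you can plant a common latent signal in both views. The generator below is a hypothetical NumPy-only sketch; the names n_shared, loadings_1 and loadings_2 are illustrative and not part of cca-zoo.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_shared = 100, 10, 2

# Hypothetical generator: a shared latent signal drives both views,
# so there is genuine cross-view correlation for CCA to recover
shared = rng.normal(size=(n_samples, n_shared))
loadings_1 = rng.normal(size=(n_shared, n_features))
loadings_2 = rng.normal(size=(n_shared, n_features))
view_1 = shared @ loadings_1 + 0.5 * rng.normal(size=(n_samples, n_features))
view_2 = shared @ loadings_2 + 0.5 * rng.normal(size=(n_samples, n_features))

# Center the data by subtracting each feature's mean
view_1 -= view_1.mean(axis=0)
view_2 -= view_2.mean(axis=0)
```

Fitting a model on [view_1, view_2] instead of the independent views should typically give a leading canonical correlation well above what noise alone produces.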

Initializing and Fitting Your Model

Instantiate the CCA model, specifying the number of latent dimensions to learn, then fit it to a list of views.

from cca_zoo.models import CCA

latent_dims = 3
linear_cca = CCA(latent_dims=latent_dims)

# Fit the model
linear_cca.fit([train_view_1, train_view_2])
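Conceptually, linear CCA finds one weight matrix per view such that the projected views are maximally correlated, dimension by dimension. A classic way to compute this is to whiten each view's covariance and take an SVD of the whitened cross-covariance; the singular values are the canonical correlations. The NumPy sketch below illustrates that textbook solution under the assumption of centered inputs; it is not necessarily how cca-zoo computes its solution internally, and linear_cca and inv_sqrt are illustrative helpers.

```python
import numpy as np

def inv_sqrt(cov):
    # Inverse matrix square root of a symmetric positive-definite matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T

def linear_cca(x1, x2, latent_dims):
    """Illustrative two-view linear CCA on centered data (not cca-zoo's code)."""
    n = x1.shape[0]
    c11 = x1.T @ x1 / (n - 1)
    c22 = x2.T @ x2 / (n - 1)
    c12 = x1.T @ x2 / (n - 1)
    k1, k2 = inv_sqrt(c11), inv_sqrt(c22)
    # SVD of the whitened cross-covariance: singular values are the canonical correlations
    u, s, vt = np.linalg.svd(k1 @ c12 @ k2)
    w1 = k1 @ u[:, :latent_dims]
    w2 = k2 @ vt.T[:, :latent_dims]
    return [w1, w2], s[:latent_dims]

rng = np.random.default_rng(0)
x1 = rng.normal(size=(100, 10))
x2 = rng.normal(size=(100, 10))
x1 -= x1.mean(axis=0)
x2 -= x2.mean(axis=0)

weights, corrs = linear_cca(x1, x2, latent_dims=3)
# Projections of each view onto its canonical directions
z1, z2 = x1 @ weights[0], x2 @ weights[1]
```

Each pair of projected columns (z1[:, k], z2[:, k]) has unit variance and a correlation equal to corrs[k], and the correlations come out in descending order.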

Hyperparameter Tuning

Manual vs Data-Driven Approaches

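Hyperparameters can either be set manually when the model is initialized or tuned from the data by cross-validation with GridSearchCV from cca_zoo.model_selection. Below, the regularization parameter c of regularized CCA (rCCA) is tuned with a separate list of candidate values for each view.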

from cca_zoo.models import rCCA
from cca_zoo.model_selection import GridSearchCV

# Custom scoring function: reduce the per-dimension correlations to a single number
def scorer(estimator, X):
    dim_corrs = estimator.score(X)
    return dim_corrs.mean()

# Define grid of potential regularization parameters
c1 = [0.1, 0.3, 0.7, 0.9]
c2 = [0.1, 0.3, 0.7, 0.9]
param_grid = {'c': [c1, c2]}

cv = 5  # Number of folds in cross-validation

# Conduct grid search
ridge = GridSearchCV(rCCA(latent_dims=latent_dims), param_grid=param_grid,
                     cv=cv, verbose=True, scoring=scorer).fit([train_view_1, train_view_2]).best_estimator_
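To make the tuning procedure concrete, here is a NumPy-only sketch of what such a grid search does: it pairs every value in c1 with every value in c2 (16 candidate pairs, each evaluated on 5 folds), fits a regularized CCA on each training split, and scores the mean held-out correlation. It assumes the common rCCA convention that c shrinks each view's covariance toward the identity, (1 - c) * Cov + c * I, so c = 0 gives CCA and c = 1 gives PLS. The helpers fit_rcca, mean_corr and grid_search are hypothetical, not cca-zoo functions.

```python
import numpy as np
from itertools import product

def inv_sqrt(m):
    # Inverse square root of a symmetric positive-definite matrix
    vals, vecs = np.linalg.eigh(m)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def fit_rcca(x1, x2, c, latent_dims):
    # Shrink each view's covariance toward the identity (assumed: c=0 is CCA, c=1 is PLS)
    n = x1.shape[0]
    c11 = (1 - c[0]) * x1.T @ x1 / (n - 1) + c[0] * np.eye(x1.shape[1])
    c22 = (1 - c[1]) * x2.T @ x2 / (n - 1) + c[1] * np.eye(x2.shape[1])
    c12 = x1.T @ x2 / (n - 1)
    k1, k2 = inv_sqrt(c11), inv_sqrt(c22)
    u, _, vt = np.linalg.svd(k1 @ c12 @ k2)
    return k1 @ u[:, :latent_dims], k2 @ vt.T[:, :latent_dims]

def mean_corr(x1, x2, w1, w2):
    # Mean over latent dimensions of the correlation between paired projections
    z1, z2 = x1 @ w1, x2 @ w2
    return np.mean([np.corrcoef(z1[:, k], z2[:, k])[0, 1] for k in range(w1.shape[1])])

def grid_search(x1, x2, candidates, latent_dims=3, n_folds=5):
    idx = np.arange(x1.shape[0])
    folds = np.array_split(idx, n_folds)  # contiguous folds, no shuffling, for simplicity
    best_c, best_score = None, -np.inf
    for c in candidates:
        scores = []
        for test_idx in folds:
            train_idx = np.setdiff1d(idx, test_idx)
            # Center with training-fold means only, so test statistics do not leak
            m1, m2 = x1[train_idx].mean(axis=0), x2[train_idx].mean(axis=0)
            w1, w2 = fit_rcca(x1[train_idx] - m1, x2[train_idx] - m2, c, latent_dims)
            scores.append(mean_corr(x1[test_idx] - m1, x2[test_idx] - m2, w1, w2))
        if np.mean(scores) > best_score:
            best_c, best_score = c, float(np.mean(scores))
    return best_c, best_score

c1 = [0.1, 0.3, 0.7, 0.9]
c2 = [0.1, 0.3, 0.7, 0.9]
# Cartesian product of per-view candidates: one (c_view1, c_view2) pair per fit setting
candidates = [list(pair) for pair in product(c1, c2)]

rng = np.random.default_rng(0)
x1 = rng.normal(size=(100, 10))
x2 = rng.normal(size=(100, 10))
best_c, best_score = grid_search(x1, x2, candidates)
```

With independent noise views like these, the held-out score should hover around zero, so the selected c mostly reflects chance; data with real shared structure gives a more meaningful comparison.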

Model Transformations

After fitting, transform your data to obtain the latent projections of each view: one array of shape (n_samples, latent_dims) per view.

projections = ridge.transform([train_view_1, train_view_2])

Alternatively, fit_transform fits the model and returns the projections in a single call. Note that it refits the model on the data you pass in.

projections = ridge.fit_transform([train_view_1, train_view_2])
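Assuming cca-zoo follows the scikit-learn estimator convention, fit_transform(views) is shorthand for fit(views).transform(views), while transform alone projects new data with weights that have already been learned. The hypothetical TinyLinearCCA class below, written with NumPy only, makes that contract explicit. Centering new data with the training means before projecting is an assumption of this sketch, not a documented cca-zoo behavior.

```python
import numpy as np

class TinyLinearCCA:
    """Hypothetical minimal estimator illustrating fit / transform / fit_transform."""

    def __init__(self, latent_dims):
        self.latent_dims = latent_dims

    def fit(self, views):
        x1, x2 = views
        # Store training means so that new data can be centered consistently
        self.means_ = [x1.mean(axis=0), x2.mean(axis=0)]
        x1, x2 = x1 - self.means_[0], x2 - self.means_[1]
        n = x1.shape[0]
        k1 = self._inv_sqrt(x1.T @ x1 / (n - 1))
        k2 = self._inv_sqrt(x2.T @ x2 / (n - 1))
        u, _, vt = np.linalg.svd(k1 @ (x1.T @ x2 / (n - 1)) @ k2)
        self.weights = [k1 @ u[:, :self.latent_dims], k2 @ vt.T[:, :self.latent_dims]]
        return self

    def transform(self, views):
        # Project each centered view with its own weight matrix
        return [(x - m) @ w for x, m, w in zip(views, self.means_, self.weights)]

    def fit_transform(self, views):
        return self.fit(views).transform(views)

    @staticmethod
    def _inv_sqrt(m):
        vals, vecs = np.linalg.eigh(m)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

rng = np.random.default_rng(0)
train = [rng.normal(size=(100, 10)), rng.normal(size=(100, 10))]
model = TinyLinearCCA(latent_dims=3)
z_a = model.fit_transform(train)
z_b = model.fit(train).transform(train)
# Projecting unseen samples reuses the learned weights without refitting
z_new = model.transform([rng.normal(size=(20, 10)), rng.normal(size=(20, 10))])
```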

Model Evaluation

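Evaluate the model by measuring the correlations between the views' projections in the latent space. Scoring held-out data gives a less optimistic estimate than scoring the training data, as is done here.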

correlation = ridge.score([train_view_1, train_view_2])

For tensor-based CCA models, the score instead reflects the higher-order correlation among the views in each latent dimension.
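In the two-view case, the per-dimension score is the Pearson correlation between matching columns of the two projections, and the custom scorer defined earlier averages these values. The hypothetical helper below computes them with NumPy, which can be handy for checking scores by hand on held-out projections. It does not reproduce the higher-order correlations reported by tensor-based models.

```python
import numpy as np

def per_dimension_correlation(z1, z2):
    """Pearson correlation between matching columns of two projection matrices."""
    # Standardize each column (population std), then average the elementwise product
    z1 = (z1 - z1.mean(axis=0)) / z1.std(axis=0)
    z2 = (z2 - z2.mean(axis=0)) / z2.std(axis=0)
    return (z1 * z2).mean(axis=0)

rng = np.random.default_rng(0)
z1 = rng.normal(size=(50, 3))
# Column 0: perfectly correlated; column 1: perfectly anti-correlated; column 2: unrelated
z2 = np.column_stack([2 * z1[:, 0] + 1, -z1[:, 1], rng.normal(size=50)])
corrs = per_dimension_correlation(z1, z2)
```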

Extracting Model Weights

Some applications, such as interpreting which features drive each latent dimension, require direct access to the learned linear transformation (weight matrix) of each view.

view_1_weights = ridge.weights[0]
view_2_weights = ridge.weights[1]
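Raw weights can be hard to interpret when features within a view are correlated, so a common complement is the canonical loadings: the correlation between each original feature and each latent projection. The hypothetical helper canonical_loadings below computes them with NumPy, assuming each weight matrix has shape (n_features, latent_dims) so that a view's projection is its centered data multiplied by its weights.

```python
import numpy as np

def canonical_loadings(view, weights):
    """Correlation between every feature (rows) and every latent dimension (columns)."""
    x = view - view.mean(axis=0)
    z = x @ weights  # latent projection of the view (assumed weight shape: n_features x latent_dims)
    xs = x / x.std(axis=0)
    zs = (z - z.mean(axis=0)) / z.std(axis=0)
    return xs.T @ zs / x.shape[0]

rng = np.random.default_rng(0)
view = rng.normal(size=(100, 10))
# Hypothetical weights: latent dimension k simply copies feature k
weights = np.eye(10)[:, :3]
loadings = canonical_loadings(view, weights)
```

Because each latent dimension here copies one feature, that feature's loading on it is exactly 1; with real weights, the loadings show how strongly each feature tracks each latent dimension.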

Deep Models in cca-zoo

Deep models in cca-zoo use neural networks as view encoders, which lets them capture nonlinear relationships between views.

Constructing Encoder Architectures

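Here, we define one multi-layer perceptron (MLP) encoder per view. The feature_size argument must match the number of input features in that view (784 corresponds, for example, to flattened 28x28 MNIST images; for the 10-feature synthetic views above it would be 10), and latent_dims sets the encoder's output dimension.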

from cca_zoo.deepmodels import architectures

encoder_1 = architectures.Encoder(latent_dims=latent_dims, feature_size=784)
encoder_2 = architectures.Encoder(latent_dims=latent_dims, feature_size=784)
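Conceptually, each MLP encoder maps a batch of shape (batch_size, feature_size) to one of shape (batch_size, latent_dims). The NumPy forward pass below sketches that mapping with one hidden layer of 128 ReLU units; the hidden size, activation, and He-style initialization are illustrative assumptions, not cca-zoo's defaults, and init_mlp and mlp_forward are hypothetical helpers.

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    # One (weight, bias) pair per layer, e.g. 784 -> 128 -> 3
    return [
        (rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def mlp_forward(params, x):
    # ReLU on hidden layers; the final layer is linear and outputs the latent representation
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

rng = np.random.default_rng(0)
encoder_params = init_mlp([784, 128, 3], rng)
batch = rng.normal(size=(32, 784))  # e.g. 32 flattened 28x28 images
latent = mlp_forward(encoder_params, batch)
```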

Deep CCA Model Initialization

Initialize a Deep CCA (DCCA) model with the two encoders.

from cca_zoo.deepmodels import DCCA

dcca_model = DCCA(latent_dims=latent_dims, encoders=[encoder_1, encoder_2])

The resulting object is a torch.nn.Module, so you can also train it in your own custom PyTorch training loop.
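In a custom training loop, each step passes a mini-batch of both views through the encoders and minimizes a correlation-based loss. The sketch below computes the objective from the original Deep CCA formulation (Andrew et al., 2013) on two batches of encoder outputs: the negative sum of canonical correlations, with a small ridge term eps for numerical stability. It uses NumPy only and computes no gradients (in practice the same computation would be written in PyTorch so autograd can backpropagate through it); dcca_loss is a hypothetical helper, and cca-zoo's own loss may differ in regularization details.

```python
import numpy as np

def inv_sqrt(m):
    # Inverse square root of a symmetric positive-definite matrix
    vals, vecs = np.linalg.eigh(m)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def dcca_loss(h1, h2, eps=1e-4):
    """Negative total canonical correlation between two batches of encoder outputs."""
    n = h1.shape[0]
    h1 = h1 - h1.mean(axis=0)
    h2 = h2 - h2.mean(axis=0)
    # The ridge term eps keeps the covariance estimates invertible
    s11 = h1.T @ h1 / (n - 1) + eps * np.eye(h1.shape[1])
    s22 = h2.T @ h2 / (n - 1) + eps * np.eye(h2.shape[1])
    s12 = h1.T @ h2 / (n - 1)
    t = inv_sqrt(s11) @ s12 @ inv_sqrt(s22)
    # Sum of singular values of T (its trace norm) = total canonical correlation
    return -np.linalg.svd(t, compute_uv=False).sum()

rng = np.random.default_rng(0)
h1 = rng.normal(size=(64, 3))
loss_identical = dcca_loss(h1, h1.copy())               # near -3: perfectly correlated outputs
loss_random = dcca_loss(h1, rng.normal(size=(64, 3)))   # much closer to 0
```

A lower loss means more correlated encoder outputs, so a training loop minimizing it pushes the two encoders toward shared latent structure.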