User Guide#
This guide walks through multiview data analysis with cca-zoo using Canonical Correlation Analysis (CCA) and its variants, from fitting and tuning linear models to building deep models.
Model Fitting#
Preparing Your Data#
Preprocess your data before analysis; at a minimum, center each feature. In this example, we create two synthetic views, each with 100 samples and 10 features, and subtract the per-feature mean.
import numpy as np
# Create synthetic data for two views
train_view_1 = np.random.normal(size=(100, 10))
train_view_2 = np.random.normal(size=(100, 10))
# Center the data by removing the per-feature mean
train_view_1 -= train_view_1.mean(axis=0)
train_view_2 -= train_view_2.mean(axis=0)
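The two views above are drawn independently, so any correlation a model finds between them is sampling noise. As a minimal sketch of an alternative, the snippet below builds two views that share a low-dimensional latent signal and standardizes them. The names `n_shared`, `latent`, and `standardize` are illustrative assumptions and are not part of cca-zoo.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features, n_shared = 100, 10, 2

# Shared latent signal that both views observe through different random mixings
latent = rng.normal(size=(n_samples, n_shared))
view_1 = latent @ rng.normal(size=(n_shared, n_features))
view_2 = latent @ rng.normal(size=(n_shared, n_features))

# Add independent noise to each view
view_1 += 0.5 * rng.normal(size=(n_samples, n_features))
view_2 += 0.5 * rng.normal(size=(n_samples, n_features))


def standardize(X):
    """Center each feature and scale it to unit variance (hypothetical helper)."""
    X = X - X.mean(axis=0)
    return X / X.std(axis=0)


view_1 = standardize(view_1)
view_2 = standardize(view_2)
```

With a shared signal of dimension 2, at most two latent dimensions should carry correlation that generalizes beyond the training sample.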
Instantiating and Fitting Your Model#
To begin, instantiate the CCA model with the desired number of latent dimensions, then call fit with a tuple containing one array per view.
from cca_zoo.models import CCA
latent_dimensions = 3
linear_cca = CCA(latent_dimensions=latent_dimensions)
# Fit the model
linear_cca.fit((train_view_1, train_view_2))
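For intuition about what fitting a linear CCA model involves, here is a minimal NumPy sketch of classical two-view CCA via whitening and a singular value decomposition. It is not cca-zoo's implementation; the function name `cca_via_svd` and the small `eps` stabilizer are assumptions made for illustration.

```python
import numpy as np


def cca_via_svd(X, Y, latent_dimensions, eps=1e-8):
    """Classical CCA for two centered views (illustrative sketch)."""
    n = X.shape[0]
    # Sample covariances, with a tiny ridge for numerical stability
    Sxx = X.T @ X / (n - 1) + eps * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + eps * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive definite matrix
        vals, vecs = np.linalg.eigh(S)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Sxx_is, Syy_is = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations
    U, s, Vt = np.linalg.svd(Sxx_is @ Sxy @ Syy_is)
    Wx = Sxx_is @ U[:, :latent_dimensions]
    Wy = Syy_is @ Vt.T[:, :latent_dimensions]
    return Wx, Wy, s[:latent_dimensions]


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
Y = rng.normal(size=(100, 10))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

Wx, Wy, corrs = cca_via_svd(X, Y, latent_dimensions=3)
```

Here `corrs` holds the training-set canonical correlations in descending order, and `X @ Wx` and `Y @ Wy` are the corresponding latent projections.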
Hyperparameter Tuning#
Manual vs Data-Driven Approaches#
Hyperparameters can either be set manually when the model is initialized or tuned in a data-driven manner using GridSearchCV from cca_zoo.model_selection.
from cca_zoo.models import rCCA
from cca_zoo.model_selection import GridSearchCV
# Custom scoring function
def scorer(estimator, X):
    dim_corrs = estimator.score(X)
    return dim_corrs.mean()
# Define grid of potential regularization parameters
c1 = [0.1, 0.3, 0.7, 0.9]
c2 = [0.1, 0.3, 0.7, 0.9]
param_grid = {'c': [c1, c2]}
cv = 5 # Number of folds in cross-validation
# Conduct grid search and keep the best estimator
grid_search = GridSearchCV(
    rCCA(latent_dimensions=latent_dimensions),
    param_grid=param_grid,
    cv=cv,
    verbose=True,
    scoring=scorer,
).fit((train_view_1, train_view_2))
ridge = grid_search.best_estimator_
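To gauge the cost of a search like this one, it helps to enumerate the candidate settings. The sketch below assumes that the per-view lists under 'c' are combined as a Cartesian product, one value per view; that is an assumption about the search behaviour rather than something this guide states, and `candidates` and `n_fits` are illustrative names.

```python
from itertools import product

c1 = [0.1, 0.3, 0.7, 0.9]
c2 = [0.1, 0.3, 0.7, 0.9]
cv = 5

# One candidate per (view 1, view 2) pair of regularization values
# (assumed expansion of the grid; check the cca-zoo documentation)
candidates = [{"c": [a, b]} for a, b in product(c1, c2)]

# Each candidate is fitted once per cross-validation fold
n_fits = len(candidates) * cv

for candidate in candidates[:3]:
    print(candidate)
print(f"{len(candidates)} candidates, {n_fits} cross-validation fits")
```

Under that assumption, adding values to either list grows the number of fits multiplicatively, so coarse grids are a sensible starting point.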
Model Transformations#
Transform your data post-fitting to obtain latent projections for each view.
projections = ridge.transform((train_view_1, train_view_2))
Alternatively, use fit_transform to fit the model and transform the same views in a single call.
projections = ridge.fit_transform((train_view_1, train_view_2))
Model Evaluation#
Assess the performance of your model by evaluating the correlations in the latent space.
correlation = ridge.score((train_view_1, train_view_2))
For tensor-based CCA models, this score represents higher-order correlations in each dimension.
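For two views, a per-dimension correlation score can be reproduced by hand from the latent projections. The sketch below assumes that score returns one Pearson correlation per latent dimension, as the custom scorer earlier in this guide suggests by averaging the result. The projections `z1` and `z2` are random stand-ins, and `per_dimension_correlation` is a hypothetical helper.

```python
import numpy as np


def per_dimension_correlation(z1, z2):
    """Pearson correlation between matching columns of two projection matrices."""
    z1 = z1 - z1.mean(axis=0)
    z2 = z2 - z2.mean(axis=0)
    numerator = (z1 * z2).sum(axis=0)
    denominator = np.sqrt((z1 ** 2).sum(axis=0) * (z2 ** 2).sum(axis=0))
    return numerator / denominator


rng = np.random.default_rng(0)
# Stand-in projections: the second view is a noisy copy of the first
z1 = rng.normal(size=(100, 3))
z2 = z1 + 0.5 * rng.normal(size=(100, 3))

corrs = per_dimension_correlation(z1, z2)
mean_corr = corrs.mean()
```

Scores computed on the training views are typically optimistic; projecting held-out views gives a fairer estimate.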
Extracting Model Weights#
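Some applications, such as inspecting which features contribute to each latent dimension, require access to the model’s linear transformation for each view. These are stored per view in the weights_ attribute.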
view_1_weights = ridge.weights_[0]
view_2_weights = ridge.weights_[1]
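As a sketch of how such weights might be used, the snippet below projects a centered view by hand and ranks features by the magnitude of their weight in the first latent dimension. It assumes each weight matrix has shape (n_features, latent_dimensions); the weights here are random stand-ins for ridge.weights_[0], not the output of a fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Centered view with 100 samples and 10 features
view_1 = rng.normal(size=(100, 10))
view_1 -= view_1.mean(axis=0)

# Hypothetical stand-in for ridge.weights_[0], assumed shape (n_features, latent_dimensions)
view_1_weights = rng.normal(size=(10, 3))

# A linear projection is a matrix product of the view with its weights
projection_1 = view_1 @ view_1_weights

# Indices of the three features with the largest absolute weight in dimension 0
top_features = np.argsort(-np.abs(view_1_weights[:, 0]))[:3]
```

For a linear model, a projection computed this way should match the output of transform on the same centered data, up to any preprocessing the model applies internally.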
Deep Models in CCA-Zoo#
Deep models in cca-zoo utilize neural networks as view encoders, capturing complex relationships between different views.
Constructing Encoder Architectures#
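Here, we define one multi-layer perceptron (MLP) encoder per view. Each encoder's feature_size must match the number of input features of its view; the value 784 used here differs from the 10-feature synthetic views created earlier in this guide.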
from cca_zoo.deepmodels import architectures
encoder_1 = architectures.Encoder(latent_dimensions=latent_dimensions, feature_size=784)
encoder_2 = architectures.Encoder(latent_dimensions=latent_dimensions, feature_size=784)
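To make the role of such an encoder concrete, here is a minimal plain-PyTorch sketch of an MLP that maps a batch of 784-feature inputs to 3 latent dimensions. The class `MLPEncoder` and its `hidden_size` parameter are hypothetical; this is not cca-zoo's Encoder implementation, whose layers and defaults may differ.

```python
import torch
from torch import nn


class MLPEncoder(nn.Module):
    """Hypothetical stand-in for an MLP view encoder."""

    def __init__(self, feature_size, latent_dimensions, hidden_size=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, latent_dimensions),
        )

    def forward(self, x):
        return self.net(x)


encoder = MLPEncoder(feature_size=784, latent_dimensions=3)

# A batch of 16 samples, each with 784 features
batch = torch.randn(16, 784)
latent = encoder(batch)  # one latent vector per sample
```

The input width is fixed when the encoder is built, which is why the feature size has to agree with the data passed in later.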
Deep CCA Model Initialization#
Initialize a Deep CCA model using the encoder architectures.
from cca_zoo.deepmodels import DCCA
dcca_model = DCCA(latent_dimensions=latent_dimensions, encoders=[encoder_1, encoder_2])
The resulting object is a torch.nn.Module, so it can be trained and updated in a custom PyTorch training loop.
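A custom training loop over a two-view module might look like the sketch below. To stay self-contained it uses a hypothetical `TwoViewEncoder` module in place of DCCA and a simplified surrogate loss (the negative mean per-dimension correlation of the two encodings), which is not the Deep CCA objective used by cca-zoo.

```python
import torch
from torch import nn

torch.manual_seed(0)


class TwoViewEncoder(nn.Module):
    """Hypothetical stand-in for a two-view deep model; not cca-zoo's DCCA."""

    def __init__(self, feature_sizes, latent_dimensions):
        super().__init__()
        self.encoders = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(f, 32), nn.ReLU(), nn.Linear(32, latent_dimensions))
                for f in feature_sizes
            ]
        )

    def forward(self, views):
        return [encoder(view) for encoder, view in zip(self.encoders, views)]


def neg_mean_correlation(z1, z2, eps=1e-8):
    """Simplified surrogate loss: negative mean per-dimension Pearson correlation."""
    z1 = z1 - z1.mean(dim=0)
    z2 = z2 - z2.mean(dim=0)
    numerator = (z1 * z2).sum(dim=0)
    denominator = ((z1 ** 2).sum(dim=0) * (z2 ** 2).sum(dim=0)).sqrt() + eps
    return -(numerator / denominator).mean()


# Synthetic views that share two informative features
shared = torch.randn(256, 2)
view_1 = torch.cat([shared, torch.randn(256, 8)], dim=1)
view_2 = torch.cat([shared, torch.randn(256, 8)], dim=1)

model = TwoViewEncoder([10, 10], latent_dimensions=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for epoch in range(50):
    optimizer.zero_grad()
    z1, z2 = model((view_1, view_2))
    loss = neg_mean_correlation(z1, z2)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

The same loop structure (zero gradients, forward pass, loss, backward pass, optimizer step) applies to any torch.nn.Module; only the model and the loss would change when using a real Deep CCA objective.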