Models

class cca_zoo.models.GCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None, view_weights: Optional[Iterable[float]] = None)[source]

Bases: rCCA

A class used to fit a GCCA model. For more than two views, GCCA optimizes the sum of correlations between each view and a shared auxiliary vector T:

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ \sum_iw_i^TX_i^TT \}\\\end{split}\\\text{subject to:}\\T^TT=1\end{aligned}\end{align} \]
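For intuition, the MAXVAR objective above has a closed-form solution. For a fixed T, each w_i is a (regularised) least-squares fit of T on X_i. Substituting that back leaves an eigenvalue problem in T. The NumPy sketch below illustrates this under several assumptions. The helper `gcca_maxvar` is hypothetical, `c` is treated as a plain ridge term added to X_i^T X_i, and `view_weights` is omitted. It is not cca_zoo's implementation.

```python
import numpy as np


def gcca_maxvar(views, latent_dims=1, c=0.0):
    """Hypothetical MAXVAR GCCA sketch (not cca_zoo's implementation).

    Q = sum_i X_i (X_i^T X_i + c I)^{-1} X_i^T; the leading eigenvectors of Q
    give the shared auxiliary variable T with orthonormal columns.
    """
    views = [X - X.mean(axis=0) for X in views]  # centre each view
    n = views[0].shape[0]
    Q = np.zeros((n, n))
    for X in views:
        # Regularised projection onto the column space of X (ridge term c is an assumption)
        Q += X @ np.linalg.solve(X.T @ X + c * np.eye(X.shape[1]), X.T)
    _, eigvecs = np.linalg.eigh(Q)  # eigenvalues come back in ascending order
    T = eigvecs[:, ::-1][:, :latent_dims]  # keep the leading eigenvectors
    # For fixed T, each w_i is the ridge regression of T on X_i
    weights = [
        np.linalg.solve(X.T @ X + c * np.eye(X.shape[1]), X.T @ T) for X in views
    ]
    return T, weights


rng = np.random.RandomState(0)
X1, X2, X3 = (rng.random((10, 5)) for _ in range(3))
T, weights = gcca_maxvar((X1, X2, X3), latent_dims=2, c=0.1)
print(T.shape, [w.shape for w in weights])  # (10, 2) [(5, 2), (5, 2), (5, 2)]
```

Solving for T first and recovering the weights afterwards avoids an iterative loop. The price is that the sketch forms an n × n matrix, so it suits small sample sizes.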

Parameters

latent_dims (int, optional) – Number of latent dimensions to use, by default 1
scale (bool, optional) – Whether to scale the data, by default True
centre (bool, optional) – Whether to centre the data, by default True
copy_data (bool, optional) – Whether to copy the data, by default True
random_state (int, optional) – Random state, by default None
c (Union[Iterable[float], float], optional) – Regularization parameter, by default None
view_weights (Iterable[float], optional) – Weights for each view, by default None

References

Tenenhaus, Arthur, and Michel Tenenhaus. “Regularized generalized canonical correlation analysis.” Psychometrika 76.2 (2011): 257.

Examples

>>> from cca_zoo.models import GCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = GCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([0.97229856])

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays
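One common way to define factor (structure) loadings is as the correlation between each original feature and each latent score. The sketch below computes these loadings for a single view. The helper name `factor_loadings` is illustrative. It also assumes that `normalize=True` means correlations and `normalize=False` means unscaled cross-products of centred data. That is a guess about the semantics, not a description of cca_zoo's internals.

```python
import numpy as np


def factor_loadings(X, Z, normalize=True):
    """Hypothetical loadings between one view X (n x p) and its scores Z (n x k).

    normalize=True  -> Pearson correlations (structure coefficients), shape (p, k).
    normalize=False -> cross-products of the centred data (assumed meaning).
    """
    Xc = X - X.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    loadings = Xc.T @ Zc  # (p, k) cross-products
    if normalize:
        # Divide by the norms of each centred feature and each centred score
        loadings = loadings / np.outer(
            np.linalg.norm(Xc, axis=0), np.linalg.norm(Zc, axis=0)
        )
    return loadings


rng = np.random.RandomState(0)
X = rng.random((10, 5))
Z = X @ rng.random((5, 2))  # stand-in for the transformed scores of this view
L = factor_loadings(X, Z)
print(L.shape)  # (5, 2)
```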

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)
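The relationship between pairwise_correlations and score can be sketched in NumPy. The sketch assumes the views have already been transformed into arrays of shape (n_samples, latent_dims). It takes the average pairwise correlation to be the mean of the off-diagonal entries of each per-dimension correlation matrix, which gives one value per latent dimension. The helper names are hypothetical, and cca_zoo's implementation may differ in detail.

```python
import numpy as np


def pairwise_correlations(transformed_views):
    """Correlation between every pair of views in each latent dimension.

    transformed_views: list of arrays, each of shape (n_samples, latent_dims).
    Returns an array of shape (n_views, n_views, latent_dims).
    """
    Z = np.stack(transformed_views)  # (n_views, n_samples, latent_dims)
    n_views, _, latent_dims = Z.shape
    corrs = np.empty((n_views, n_views, latent_dims))
    for k in range(latent_dims):
        # np.corrcoef treats each row (here: one view's scores) as a variable
        corrs[:, :, k] = np.corrcoef(Z[:, :, k])
    return corrs


def average_pairwise_correlation(transformed_views):
    """Mean of the off-diagonal correlations, one value per latent dimension."""
    corrs = pairwise_correlations(transformed_views)
    n_views = corrs.shape[0]
    off_diagonal_sum = corrs.sum(axis=(0, 1)) - n_views  # drop the unit diagonal
    return off_diagonal_sum / (n_views * (n_views - 1))


z = np.array([[1.0], [2.0], [3.0], [4.0]])
print(average_pairwise_correlation([z, 2 * z, -z]))  # [-0.33333333]
```

In the usage line, one pair of views is perfectly correlated (+1) and two pairs are perfectly anti-correlated (-1), so the average is (1 - 1 - 1) / 3 = -1/3.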

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Projects the given views onto the latent dimensions of the fitted model

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays
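When `centre` or `scale` is enabled, new data presumably has to be preprocessed with statistics learned at fit time before the weights are applied. The class below is a hypothetical stand-in that shows this pattern for a linear model such as GCCA. `LinearProjector` and `fit_preprocessing` are illustrative names, not cca_zoo API.

```python
import numpy as np


class LinearProjector:
    """Hypothetical linear multiview projector (illustration only)."""

    def __init__(self, weights, centre=True, scale=True):
        self.weights = weights  # one (n_features_i, latent_dims) array per view
        self.centre = centre
        self.scale = scale

    def fit_preprocessing(self, views):
        # Store per-view statistics computed on the training data only
        self.means_ = [
            X.mean(axis=0) if self.centre else np.zeros(X.shape[1]) for X in views
        ]
        self.stds_ = [
            X.std(axis=0) if self.scale else np.ones(X.shape[1]) for X in views
        ]
        return self

    def transform(self, views):
        # Reuse the training statistics, then project each view onto its weights
        return [
            ((X - m) / s) @ W
            for X, m, s, W in zip(views, self.means_, self.stds_, self.weights)
        ]


rng = np.random.RandomState(0)
train = [rng.random((10, 5)), rng.random((10, 4))]
test_views = [rng.random((3, 5)), rng.random((3, 4))]
proj = LinearProjector([rng.random((5, 2)), rng.random((4, 2))]).fit_preprocessing(train)
scores = proj.transform(test_views)
print([s.shape for s in scores])  # [(3, 2), (3, 2)]
```

Reusing training statistics keeps new samples in the same coordinate system the weights were learned in. Recomputing means on a small test batch would shift its scores.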

class cca_zoo.models.KGCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None, eps=0.001, kernel: Optional[Iterable[Union[float, callable]]] = None, gamma: Optional[Iterable[float]] = None, degree: Optional[Iterable[float]] = None, coef0: Optional[Iterable[float]] = None, kernel_params: Optional[Iterable[dict]] = None)[source]

Bases: GCCA

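A class used to fit a KGCCA model. For more than two views, KGCCA optimizes the sum of correlations with a shared auxiliary vector T, with each data matrix X_i replaced by a kernel matrix K_i and each weight vector w_i replaced by dual coefficients α_i: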

\[ \begin{align}\begin{aligned}\begin{split}\alpha_{opt}=\underset{\alpha}{\mathrm{argmax}}\{ \sum_i\alpha_i^TK_i^TT \}\\\end{split}\\\text{subject to:}\\T^TT=1\end{aligned}\end{align} \]

References

Tenenhaus, Arthur, Cathy Philippe, and Vincent Frouin. “Kernel generalized canonical correlation analysis.” Computational Statistics & Data Analysis 90 (2015): 114-131.
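Because KGCCA works with n × n kernel matrices and dual coefficients, the same eigenvalue approach used for linear GCCA can be written entirely in terms of kernels. The sketch below rests on several assumptions: an RBF kernel (with `gamma` playing the role of the signature parameter), double-centred kernels, and a ridge-style regulariser `c`. `kgcca_maxvar` and its helpers are hypothetical, not the library's code.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gaussian kernel K[a, b] = exp(-gamma * ||x_a - x_b||^2)."""
    sq = (X**2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative round-off


def centre_kernel(K):
    """Double-centre K, the kernel analogue of mean-centring the features."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kgcca_maxvar(views, latent_dims=1, c=0.1, gamma=1.0):
    """Hypothetical kernel MAXVAR GCCA sketch (not cca_zoo's implementation).

    Uses K_i (K_i + c I)^{-1} in place of the linear projection onto X_i.
    """
    kernels = [centre_kernel(rbf_kernel(X, gamma)) for X in views]
    n = kernels[0].shape[0]
    Q = sum(K @ np.linalg.inv(K + c * np.eye(n)) for K in kernels)
    Q = (Q + Q.T) / 2  # symmetrise against round-off
    _, eigvecs = np.linalg.eigh(Q)
    T = eigvecs[:, ::-1][:, :latent_dims]  # leading eigenvectors
    # Dual coefficients: (K_i + c I) alpha_i = T, so K_i alpha_i = K_i (K_i + c I)^{-1} T
    alphas = [np.linalg.solve(K + c * np.eye(n), T) for K in kernels]
    return T, alphas


rng = np.random.RandomState(0)
views = [rng.random((10, 5)) for _ in range(3)]
T, alphas = kgcca_maxvar(views, latent_dims=2)
print(T.shape, alphas[0].shape)  # (10, 2) (10, 2)
```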

Examples

>>> from cca_zoo.models import KGCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = KGCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([0.97019284])

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

transform(views: ndarray, y=None, **kwargs)[source]

Projects the given views onto the latent dimensions of the fitted model

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

class cca_zoo.models.PLS_ALS(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, max_iter: int = 100, initialization: Union[str, callable] = 'random', tol: float = 0.001, deflation='pls', verbose=0)[source]

Bases: _BaseIterative

A class used to fit a PLS model to two or more views of data.

Fits a partial least squares model by the NIPALS algorithm, using PLS deflation by default (deflation='pls').

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} w_i^TX_i^TX_jw_j\}\\\end{split}\\\text{subject to:}\\w_i^Tw_i=1\end{aligned}\end{align} \]

Parameters

latent_dims (int, optional) – Number of latent dimensions to use, by default 1
scale (bool, optional) – Whether to scale the data, by default True
centre (bool, optional) – Whether to centre the data, by default True
copy_data (bool, optional) – Whether to copy the data, by default True
random_state (int, optional) – Random state for reproducibility, by default None
max_iter (int, optional) – Maximum number of iterations, by default 100
initialization (Union[str, callable], optional) – Initialization method, by default “random”
tol (float, optional) – Tolerance for convergence, by default 1e-3
deflation (str, optional) – Deflation method, by default “pls”
verbose (int, optional) – Verbosity level, by default 0
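For two views, the alternating scheme amounts to power iteration on the cross-covariance X_1^T X_2. Each weight vector is updated from the other view's scores and renormalised to unit length. When the leading singular value is distinct, this converges to the leading singular vector pair. The sketch below assumes centred data, random initialisation and no deflation. `pls_als` is a hypothetical helper, and its `tol` is chosen for the demonstration rather than taken from the library default.

```python
import numpy as np


def pls_als(X1, X2, max_iter=100, tol=1e-9, seed=0):
    """Hypothetical two-view PLS by alternating updates (power iteration).

    Maximises w1^T X1^T X2 w2 subject to ||w1|| = ||w2|| = 1.
    """
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    C = X1.T @ X2  # cross-covariance up to a factor 1/n
    rng = np.random.RandomState(seed)
    w2 = rng.standard_normal(X2.shape[1])
    w2 /= np.linalg.norm(w2)  # random unit-norm initialisation
    w1 = np.zeros(X1.shape[1])
    for _ in range(max_iter):
        w1_new = C @ w2
        w1_new /= np.linalg.norm(w1_new)
        w2_new = C.T @ w1_new
        w2_new /= np.linalg.norm(w2_new)
        converged = (
            np.linalg.norm(w1_new - w1) < tol and np.linalg.norm(w2_new - w2) < tol
        )
        w1, w2 = w1_new, w2_new
        if converged:
            break
    return w1, w2


rng = np.random.RandomState(0)
X1, X2 = rng.random((10, 5)), rng.random((10, 5))
w1, w2 = pls_als(X1, X2, max_iter=10000)
# Compare with the leading singular vectors of the centred cross-covariance
U, s, Vt = np.linalg.svd((X1 - X1.mean(0)).T @ (X2 - X2.mean(0)))
print(abs(w1 @ U[:, 0]), abs(w2 @ Vt[0]))  # both close to 1
```

Comparing against `np.linalg.svd` is a convenient check that the alternating updates have reached the two-view PLS solution, up to sign.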

Examples

>>> from cca_zoo.models import PLS_ALS
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = PLS_ALS(random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.81796854])

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Projects the given views onto the latent dimensions of the fitted model

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

class cca_zoo.models.SCCA_PMD(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, deflation='cca', tau: Optional[Union[Iterable[float], float]] = None, max_iter: int = 100, initialization: Union[str, callable] = 'pls', tol: float = 0.001, positive: Optional[Union[Iterable[bool], bool]] = None, verbose=0)[source]

Bases: PLS_ALS

Fits a Sparse CCA (Penalized Matrix Decomposition) model for 2 or more views.

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \}\\\end{split}\\\text{subject to:}\\w_i^Tw_i=1\\\|w_i\|_1\leq c_i\end{aligned}\end{align} \]

Parameters

latent_dims (int, default=1) – Number of latent dimensions to use in the model.
scale (bool, default=True) – Whether to scale the data to unit variance.
centre (bool, default=True) – Whether to centre the data to have zero mean.
copy_data (bool, default=True) – Whether to copy the data or overwrite it.
random_state (int, default=None) – Random seed for initialisation.
deflation (str, default="cca") – Deflation method to use. Options are “cca” and “pmd”.
tau (float or list of floats, default=None) – Regularisation parameter. If a single float is given, the same value is used for all views. If a list of floats is given, the values are used for each view respectively. If None, the value is set to 1.
max_iter (int, default=100) – Maximum number of iterations to run.
initialization (str or callable, default="pls") – Method to use for initialisation. Options are “pls” and “random”.
tol (float, default=1e-3) – Tolerance for convergence.
positive (bool or list of bools, default=None) – Whether to constrain the weights to be positive.
verbose (int, default=0) – Verbosity level. 0 is silent, 1 prints progress.
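The PMD updates of Witten et al. (2009) alternate between the two weight vectors. Each update soft-thresholds the cross-covariance direction, chooses the threshold by binary search so that the L1 constraint holds, and renormalises to unit L2 norm. The sketch below implements that rank-one scheme with raw L1 bounds `c1` and `c2`, assumed to lie in [1, sqrt(p)]. How cca_zoo maps its `tau` parameter onto these bounds is not shown, and the helper names are hypothetical.

```python
import numpy as np


def soft_threshold(x, delta):
    """Elementwise S(x, delta) = sign(x) * max(|x| - delta, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - delta, 0.0)


def l1_unit_update(a, c, n_steps=60):
    """Maximise u^T a subject to ||u||_2 = 1 and ||u||_1 <= c (assumes c >= 1)."""
    u = a / np.linalg.norm(a)
    if np.abs(u).sum() <= c:
        return u  # constraint already met: no thresholding needed
    lo, hi = 0.0, np.abs(a).max()
    for _ in range(n_steps):  # binary search on the threshold delta
        delta = (lo + hi) / 2
        s = soft_threshold(a, delta)
        if np.abs(s).sum() / np.linalg.norm(s) > c:
            lo = delta  # still too dense: raise the threshold
        else:
            hi = delta  # feasible: try a smaller threshold
    s = soft_threshold(a, hi)
    return s / np.linalg.norm(s)


def pmd_cca(X1, X2, c1, c2, max_iter=100, tol=1e-9):
    """Hypothetical rank-one PMD sparse CCA sketch: maximise w1^T X1^T X2 w2."""
    C = (X1 - X1.mean(axis=0)).T @ (X2 - X2.mean(axis=0))
    w2 = np.linalg.svd(C)[2][0]  # initialise from the leading right singular vector
    w1 = np.zeros(C.shape[0])
    for _ in range(max_iter):
        w1_new = l1_unit_update(C @ w2, c1)
        w2_new = l1_unit_update(C.T @ w1_new, c2)
        done = np.linalg.norm(w1_new - w1) < tol and np.linalg.norm(w2_new - w2) < tol
        w1, w2 = w1_new, w2_new
        if done:
            break
    return w1, w2


rng = np.random.RandomState(0)
X1, X2 = rng.random((10, 8)), rng.random((10, 8))
w1, w2 = pmd_cca(X1, X2, c1=1.5, c2=1.5)
print(np.count_nonzero(w1), np.count_nonzero(w2))  # number of selected features
```

Smaller bounds give sparser weights. A bound of sqrt(p) leaves the L1 constraint inactive, so the sketch reduces to the two-view PLS power iteration.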

References

Witten, Daniela M., Robert Tibshirani, and Trevor Hastie. “A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.” Biostatistics 10.3 (2009): 515-534.

Examples

>>> from cca_zoo.models import SCCA_PMD
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SCCA_PMD(tau=[1,1],random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.81796873])

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Projects the given views onto the latent dimensions of the fitted model

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

class cca_zoo.models.ElasticCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, deflation='cca', max_iter: int = 100, initialization: Union[str, callable] = 'pls', tol: float = 0.001, c: Optional[Union[Iterable[float], float]] = None, l1_ratio: Optional[Union[Iterable[float], float]] = None, maxvar: bool = True, stochastic=False, positive: Optional[Union[Iterable[bool], bool]] = None, verbose=0)[source]

Bases: _BaseIterative

Fits an elastic CCA model to two or more views of data by iterating elastic net regressions.

By default (maxvar=True), ElasticCCA uses CCA with an auxiliary target variable t, i.e. the MAXVAR configuration:

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}, t_{opt}=\underset{w,t}{\mathrm{argmin}}\{\sum_i \|X_iw_i-t\|^2 + c\|w_i\|^2_2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\t^Tt=n\end{aligned}\end{align} \]

Setting maxvar=False instead uses the SUMCOR form, which approximates a solution to the problem:

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmin}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + c\|w_i\|^2_2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=n\end{aligned}\end{align} \]

Parameters

latent_dims (int, default=1) – Number of latent dimensions to use
scale (bool, default=True) – Whether to scale the data to unit variance
centre (bool, default=True) – Whether to centre the data to zero mean
copy_data (bool, default=True) – Whether to copy the data or overwrite it
random_state (int, default=None) – Random seed for initialization
deflation (str, default="cca") – Whether to use CCA or PLS deflation
max_iter (int, default=100) – Maximum number of iterations to run
initialization (str or callable, default="pls") – How to initialize the weights. Can be “pls” or “random” or a callable
tol (float, default=1e-3) – Tolerance for convergence
c (float or list of floats, default=None) – Regularisation parameter for the L2 penalty. If None, defaults to 1.0
l1_ratio (float or list of floats, default=None) – Regularisation parameter for the L1 penalty. If None, defaults to 0.0
maxvar (bool, default=True) – Whether to use MAXVAR or SUMCOR configuration
stochastic (bool, default=False) – Whether to use stochastic gradient descent
positive (bool or list of bools, default=None) – Whether to use non-negative constraints
verbose (int, default=0) – Verbosity level
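A MAXVAR iteration alternates between two steps. First, the shared target t is regressed onto each view with an elastic net. Then t is rebuilt from the resulting scores and rescaled so that t^T t = n. The sketch below assumes scikit-learn-style elastic net scaling, (1/2n)||t - Xw||^2 + alpha * l1_ratio * ||w||_1 + (alpha * (1 - l1_ratio) / 2) * ||w||^2, solved by plain coordinate descent. This `alpha` is a stand-in and need not correspond to how ElasticCCA maps `c` and `l1_ratio`. The helper names are hypothetical.

```python
import numpy as np


def elastic_net_cd(X, y, alpha, l1_ratio, n_sweeps=200):
    """Coordinate descent for the (assumed) scikit-learn-style elastic net
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + (alpha*(1 - l1_ratio)/2)*||w||^2.
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X**2).sum(axis=0) / n
    r = y.astype(float).copy()  # residual y - Xw, with w starting at zero
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * w[j]  # remove feature j from the fit
            rho = X[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - alpha * l1_ratio, 0.0) / (
                col_sq[j] + alpha * (1 - l1_ratio)
            )
            r -= X[:, j] * w[j]  # add it back with the updated coefficient
    return w


def elastic_cca_maxvar(views, alpha=0.1, l1_ratio=0.5, max_iter=50, seed=0):
    """Hypothetical MAXVAR sketch: alternate elastic-net fits onto a shared target t."""
    views = [(X - X.mean(axis=0)) / X.std(axis=0) for X in views]  # centre and scale
    n = views[0].shape[0]
    rng = np.random.RandomState(seed)
    t = rng.standard_normal(n)
    t -= t.mean()
    t *= np.sqrt(n) / np.linalg.norm(t)  # enforce t^T t = n
    for _ in range(max_iter):
        # Regress the current target onto every view
        weights = [elastic_net_cd(X, t, alpha, l1_ratio) for X in views]
        # Rebuild the target from the summed scores and rescale it
        t = sum(X @ w for X, w in zip(views, weights))
        t *= np.sqrt(n) / np.linalg.norm(t)
    return t, weights


rng = np.random.RandomState(0)
X1, X2 = rng.random((10, 5)), rng.random((10, 5))
t, weights = elastic_cca_maxvar((X1, X2), alpha=0.1, l1_ratio=0.5)
print(t @ t)  # approximately 10, i.e. t^T t = n
```

Setting l1_ratio=0 turns each inner solve into ridge regression. That gives an easy correctness check of the coordinate descent against the closed-form ridge solution.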

Examples

>>> from cca_zoo.models import ElasticCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = ElasticCCA(c=[1e-1,1e-1],l1_ratio=[0.5,0.5], random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.9316638])

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Projects the given views onto the latent dimensions of the fitted model

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

class cca_zoo.models.SCCA_Parkhomenko(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, deflation='cca', tau: Optional[Union[Iterable[float], float]] = None, max_iter: int = 100, initialization: Union[str, callable] = 'pls', tol: float = 0.001, verbose=0)[source]

Bases: _BaseIterative

Fits a sparse CCA (penalized CCA) model for 2 or more views.

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 - \sum_i c_i\|w_i\|_1 \}\\\end{split}\\\text{subject to:}\\w_i^Tw_i=1\end{aligned}\end{align} \]

References

Parkhomenko, Elena, David Tritchler, and Joseph Beyene. “Sparse canonical correlation analysis with application to genomic data integration.” Statistical Applications in Genetics and Molecular Biology 8.1 (2009).

Examples

>>> from cca_zoo.models import SCCA_Parkhomenko
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SCCA_Parkhomenko(tau=[0.001,0.001],random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.81803527])
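Penalised updates of this kind can be computed by soft-thresholding. Each weight vector is the normalised cross-covariance direction with small entries shrunk to zero. The sketch below is a hypothetical illustration that treats `tau1` and `tau2` as thresholds applied to unit-normalised updates, which is an assumption about how `tau` is parameterised. Keeping each threshold below 1/sqrt(p) guarantees that at least one entry survives, because a unit vector in p dimensions always has an entry of magnitude at least 1/sqrt(p).

```python
import numpy as np


def soft_threshold(x, delta):
    """Elementwise S(x, delta) = sign(x) * max(|x| - delta, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - delta, 0.0)


def sparse_unit_update(a, tau):
    """Normalise a, soft-threshold it by tau, then renormalise to unit length."""
    a = a / np.linalg.norm(a)
    s = soft_threshold(a, tau)  # assumes tau < 1/sqrt(len(a)) so s is non-zero
    return s / np.linalg.norm(s)


def scca_soft_threshold(X1, X2, tau1, tau2, max_iter=100, tol=1e-9):
    """Hypothetical penalised (soft-thresholded) two-view sparse CCA sketch."""
    C = (X1 - X1.mean(axis=0)).T @ (X2 - X2.mean(axis=0))
    w2 = np.linalg.svd(C)[2][0]  # PLS-style initialisation
    w1 = np.zeros(C.shape[0])
    for _ in range(max_iter):
        w1_new = sparse_unit_update(C @ w2, tau1)
        w2_new = sparse_unit_update(C.T @ w1_new, tau2)
        done = np.linalg.norm(w1_new - w1) < tol and np.linalg.norm(w2_new - w2) < tol
        w1, w2 = w1_new, w2_new
        if done:
            break
    return w1, w2


rng = np.random.RandomState(0)
X1, X2 = rng.random((10, 8)), rng.random((10, 8))
w1, w2 = scca_soft_threshold(X1, X2, tau1=0.2, tau2=0.2)
print(np.count_nonzero(w1), np.count_nonzero(w2))  # number of selected features
```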

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Projects the given views onto the latent dimensions of the fitted model

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

class cca_zoo.models.SCCA_IPLS(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, deflation='cca', tau: Optional[Union[Iterable[float], float]] = None, max_iter: int = 100, maxvar: bool = False, initialization: Union[str, callable] = 'pls', tol: float = 0.001, stochastic=False, positive: Optional[Union[Iterable[bool], bool]] = None, verbose=0)[source]

Bases: ElasticCCA

Fits a sparse CCA model by iterative rescaled lasso regression. Implemented by ElasticCCA with l1_ratio=1.

With the default maxvar=False, the optimisation is given by:

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmin}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=n\end{aligned}\end{align} \]

Citation

Mai, Qing, and Xin Zhang. “An iterative penalized least squares approach to sparse canonical correlation analysis.” Biometrics 75.3 (2019): 734-744.

For maxvar=True, the optimisation is given by the ElasticCCA problem with no l2 regularisation:

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}, t_{opt}=\underset{w,t}{\mathrm{argmin}}\{\sum_i \|X_iw_i-t\|^2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\t^Tt=n\end{aligned}\end{align} \]

Citation

Fu, Xiao, et al. “Scalable and flexible multiview MAX-VAR canonical correlation analysis.” IEEE Transactions on Signal Processing 65.16 (2017): 4150-4165.

Example

>>> from cca_zoo.models import SCCA_IPLS
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SCCA_IPLS(tau=[0.001,0.001], random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.99998761])
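The alternating scheme behind the maxvar=False objective can be sketched in plain NumPy. Each view's weights are refit by a lasso regression onto the other view's current scores and then rescaled so that w_i^T X_i^T X_i w_i = n. This is a minimal illustration under stated assumptions, not the SCCA_IPLS implementation: the helper names (soft_threshold, lasso_cd, ipls_sparse_cca), the coordinate-descent lasso solver, the per-view penalty alpha and the random initialisation are all hypothetical choices made here.

```python
import numpy as np


def soft_threshold(x, t):
    """Soft-thresholding: the proximal operator of t * |x|."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def lasso_cd(X, y, alpha, w0, n_sweeps=50):
    """Coordinate descent for 0.5 * ||y - X w||^2 + alpha * ||w||_1, warm-started at w0."""
    w = w0.copy()
    col_sq = (X ** 2).sum(axis=0)
    residual = y - X @ w
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            if col_sq[j] == 0:
                continue
            residual += X[:, j] * w[j]  # remove feature j from the current fit
            w[j] = soft_threshold(X[:, j] @ residual, alpha) / col_sq[j]
            residual -= X[:, j] * w[j]  # add the refitted feature back
    return w


def ipls_sparse_cca(X1, X2, alpha=(1.0, 1.0), n_iter=20, seed=0):
    """Hypothetical two-view IPLS sketch: alternate lasso fits, rescale to ||X_i w_i||^2 = n."""
    n = X1.shape[0]
    rng = np.random.default_rng(seed)
    views = [X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)]
    weights = [rng.standard_normal(X.shape[1]) for X in views]
    for i, X in enumerate(views):
        weights[i] *= np.sqrt(n) / np.linalg.norm(X @ weights[i])
    for _ in range(n_iter):
        for i in (0, 1):
            X, target = views[i], views[1 - i] @ weights[1 - i]
            w = lasso_cd(X, target, alpha[i], weights[i])
            norm = np.linalg.norm(X @ w)
            # rescale to satisfy the constraint; if the penalty zeroed everything, keep zeros
            weights[i] = w * np.sqrt(n) / norm if norm > 0 else w
    return weights
```

Warm-starting each lasso at the previous weights keeps the inner solves cheap. A very large alpha shrinks every weight to zero, which mirrors how an overly strong l1 penalty empties the model.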

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)
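The quantities described by pairwise_correlations and score can be reproduced from transformed views with NumPy. The sketch below assumes that the [i, j, k] entry is the Pearson correlation between views i and j in latent dimension k, and that score averages the off-diagonal entries, giving one value per latent dimension (consistent with the one-element arrays printed in the examples). The function names are hypothetical and this is not the library's code.

```python
import numpy as np


def pairwise_correlations(scores):
    """scores: list of (n_samples, latent_dims) arrays, one per view.

    Returns an (n_views, n_views, latent_dims) array of Pearson correlations.
    """
    n_views, latent_dims = len(scores), scores[0].shape[1]
    corrs = np.empty((n_views, n_views, latent_dims))
    for k in range(latent_dims):
        # np.corrcoef treats each row as a variable, so stack dimension k of every view
        corrs[:, :, k] = np.corrcoef(np.stack([s[:, k] for s in scores]))
    return corrs


def average_pairwise_correlation(scores):
    """Mean of the off-diagonal correlations, one value per latent dimension."""
    corrs = pairwise_correlations(scores)
    off_diag = ~np.eye(corrs.shape[0], dtype=bool)
    return corrs[off_diag].mean(axis=0)  # shape (latent_dims,)
```

Excluding the diagonal matters: each view correlates perfectly with itself, so including it would inflate the average.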

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

class cca_zoo.models.SCCA_ADMM(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, deflation='cca', tau: Optional[Union[Iterable[float], float]] = None, mu: Optional[Union[Iterable[float], float]] = None, lam: Optional[Union[Iterable[float], float]] = None, eta: Optional[Union[Iterable[float], float]] = None, max_iter: int = 100, initialization: Union[str, callable] = 'pls', tol: float = 0.001, verbose=0)[source]

Bases: _BaseIterative

Fits a sparse CCA model by alternating ADMM for two or more views.

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmin}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + \|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=1\end{aligned}\end{align} \]

Parameters

latent_dims (int, default=1) – Number of latent dimensions to use in the model.
scale (bool, default=True) – Whether to scale the data to unit variance.
centre (bool, default=True) – Whether to centre the data to have zero mean.
copy_data (bool, default=True) – Whether to copy the data or overwrite it.
random_state (int, default=None) – Random seed for initialisation.
deflation (str, default="cca") – Deflation method to use. Options are “cca” and “pls”.
tau (float or list of floats, default=None) – Regularisation parameter. If a single float is given, the same value is used for all views. If a list of floats is given, the values are used for each view.
mu (float or list of floats, default=None) – Regularisation parameter. If a single float is given, the same value is used for all views. If a list of floats is given, the values are used for each view.
lam (float or list of floats, default=None) – Regularisation parameter. If a single float is given, the same value is used for all views. If a list of floats is given, the values are used for each view.
eta (float or list of floats, default=None) – Regularisation parameter. If a single float is given, the same value is used for all views. If a list of floats is given, the values are used for each view.
max_iter (int, default=100) – Maximum number of iterations to run.
initialization (str or callable, default="pls") – Method to use for initialisation. Options are “pls” and “random”.
tol (float, default=1e-3) – Tolerance for convergence.
verbose (int, default=0) – Verbosity level. If 0, no output is printed. If 1, output is printed every 10 iterations.

References

Suo, Xiaotong, et al. “Sparse canonical correlation analysis.” arXiv preprint arXiv:1705.10865 (2017).

Examples

>>> from cca_zoo.models import SCCA_ADMM
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SCCA_ADMM(random_state=0,tau=[1e-1,1e-1])
>>> model.fit((X1,X2)).score((X1,X2))
array([0.84348183])
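With the other view held fixed and ||X_i w_i|| = 1, minimising the squared distance above is equivalent to maximising the cross term. Each per-view subproblem therefore becomes the convex problem: minimise -u^T c + tau ||u||_1 subject to ||X u||_2 <= 1. The sketch below solves it with the linearized ADMM described by Parikh and Boyd ("Proximal Algorithms", 2014) and alternates between the two views. It is an illustration under assumptions, not SCCA_ADMM's source. The function names are hypothetical, and reading mu and lam as the two ADMM step sizes and eta as the scaled dual variable is only a guess based on the parameter names.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def linearized_admm(X, c, tau, lam=1.0, n_iter=500):
    """Solve min_u -c^T u + tau * ||u||_1  s.t.  ||X u||_2 <= 1 by linearized ADMM."""
    mu = 0.99 * lam / np.linalg.norm(X, 2) ** 2  # convergence needs 0 < mu <= lam / ||X||_2^2
    u = np.zeros(X.shape[1])
    z = np.zeros(X.shape[0])
    eta = np.zeros(X.shape[0])  # scaled dual variable for the split z = X u
    for _ in range(n_iter):
        # u-step: proximal step on the linear + l1 term at a linearized point
        v = u - (mu / lam) * (X.T @ (X @ u - z + eta))
        u = soft_threshold(v + mu * c, mu * tau)
        # z-step: project X u + eta onto the unit Euclidean ball
        Xu = X @ u
        z = (Xu + eta) / max(1.0, np.linalg.norm(Xu + eta))
        # dual ascent on the constraint z = X u
        eta = eta + Xu - z
    return u


def unit_scores(X, w):
    """Rescale w so that w^T X^T X w = 1, as in the constraint above."""
    norm = np.linalg.norm(X @ w)
    if norm == 0:
        raise ValueError("tau is too large: every weight was shrunk to zero")
    return w / norm


def sparse_cca_admm(X1, X2, tau=(0.1, 0.1), n_outer=10, seed=0):
    """Alternate the two per-view ADMM subproblems (hypothetical two-view sketch)."""
    X1, X2 = X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)
    rng = np.random.default_rng(seed)
    v = unit_scores(X2, rng.standard_normal(X2.shape[1]))
    for _ in range(n_outer):
        u = unit_scores(X1, linearized_admm(X1, X1.T @ (X2 @ v), tau[0]))
        v = unit_scores(X2, linearized_admm(X2, X2.T @ (X1 @ u), tau[1]))
    return u, v
```

Linearizing the quadratic coupling term is what lets the u-step reduce to a single soft-thresholding, so no inner lasso solver is needed.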

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

class cca_zoo.models.SCCA_Span(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, max_iter: int = 100, initialization: str = 'uniform', tol: float = 0.001, regularisation='l0', tau: Optional[Union[Iterable[Union[float, int]], float, int]] = None, rank=1, positive: Optional[Union[Iterable[bool], bool]] = None, random_state=None, deflation='cca', verbose=0)[source]

Bases: _BaseIterative

Fits a Sparse CCA model using SpanCCA.

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmin}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=1\end{aligned}\end{align} \]

References

Asteris, Megasthenis, et al. “A simple and provable algorithm for sparse diagonal CCA.” International Conference on Machine Learning. PMLR, 2016.

Examples

>>> from cca_zoo.models import SCCA_Span
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SCCA_Span(regularisation="l0", tau=[2, 2])
>>> model.fit((X1,X2)).score((X1,X2))
array([0.84556666])
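For intuition, the span-search idea of Asteris et al. can be sketched for two views with an l0 (cardinality) constraint. This assumes that tau=[2, 2] with regularisation="l0" in the example above means at most two non-zero weights per view. The procedure takes a rank-r approximation of the cross-covariance X_1^T X_2, samples directions in the span of its leading left singular vectors, hard-thresholds each sample to k entries, and then picks the best k-sparse partner for the other view. This simplified variant ignores the within-view covariances ("diagonal" CCA) and is not cca_zoo's code. All function names and the sampling budget n_samples are hypothetical.

```python
import numpy as np


def hard_threshold(a, k):
    """Keep the k largest-magnitude entries of a, zero the rest, then normalise."""
    out = np.zeros_like(a)
    idx = np.argsort(np.abs(a))[-k:]
    out[idx] = a[idx]
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out


def span_cca(X1, X2, k=(2, 2), rank=1, n_samples=200, seed=0):
    """Randomised span search for max x^T A y with unit-norm, k-sparse x and y."""
    X1, X2 = X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)
    A = X1.T @ X2 / X1.shape[0]  # cross-covariance
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    U_r, s_r = U[:, :rank], s[:rank]  # leading rank-r left singular subspace
    rng = np.random.default_rng(seed)
    best_val, best = -np.inf, None
    for _ in range(n_samples):
        c = rng.standard_normal(rank)
        c /= np.linalg.norm(c)  # random point on the unit sphere in R^rank
        x = hard_threshold(U_r @ (s_r * c), k[0])  # sparse candidate from the span
        y = hard_threshold(A.T @ x, k[1])  # optimal k-sparse unit y for this x
        val = x @ A @ y
        if val > best_val:
            best_val, best = val, (x, y)
    return best[0], best[1], best_val
```

For a fixed x, the best unit-norm k-sparse y is exactly the normalised top-k truncation of A^T x. Only the x side therefore needs to be searched over the low-rank span.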

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

class cca_zoo.models.SWCCA(scale: bool = True, centre=True, copy_data=True, random_state=None, max_iter: int = 500, initialization: str = 'random', tol: float = 0.001, regularisation='l0', tau: Optional[Union[Iterable[Union[float, int]], float, int]] = None, sample_support=None, positive=False, verbose=0)[source]

Bases: _BaseIterative

A class used to fit a sparse weighted CCA (SWCCA) model.

Examples

>>> from cca_zoo.models import SWCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SWCCA(regularisation='l0',tau=[2, 2], sample_support=5, random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.61620969])
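SWCCA additionally learns a sparse weight for each sample. A minimal sketch, assuming the following formulation: maximise u^T X_1^T diag(w) X_2 v with unit-norm u, v and w, at most k_1 and k_2 non-zero feature weights, and at most sample_support non-zero, non-negative sample weights. The problem is solved here by alternating hard-thresholded updates. The exact formulation, the non-negativity of w and the function names are assumptions made for illustration; this is not the class's implementation.

```python
import numpy as np


def top_k(a, k):
    """Keep the k largest-magnitude entries of a, zero the rest, then normalise."""
    out = np.zeros_like(a)
    idx = np.argsort(np.abs(a))[-k:]
    out[idx] = a[idx]
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out


def swcca_sketch(X1, X2, k=(2, 2), sample_support=5, n_iter=50, seed=0):
    """Alternate over u, v and a sparse non-negative sample-weight vector w."""
    rng = np.random.default_rng(seed)
    X1, X2 = X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)
    v = top_k(rng.standard_normal(X2.shape[1]), k[1])
    w = top_k(rng.random(X1.shape[0]), sample_support)  # random positive start
    for _ in range(n_iter):
        # each step is the exact maximiser over one block with the other two fixed
        u = top_k(X1.T @ (w * (X2 @ v)), k[0])
        v = top_k(X2.T @ (w * (X1 @ u)), k[1])
        w = top_k(np.maximum((X1 @ u) * (X2 @ v), 0.0), sample_support)
    return u, v, w
```

Because every block update is an exact maximiser, the objective cannot decrease from one step to the next. The samples kept by w are the ones whose paired scores agree most strongly.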

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

class cca_zoo.models.MCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None, eps=1e-09)[source]

Bases: rCCA

A class used to fit an MCCA model. For more than 2 views, MCCA optimizes the sum of pairwise correlations.

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} w_i^TX_i^TX_jw_j \}\\\end{split}\\\text{subject to:}\\(1-c_i)w_i^TX_i^TX_iw_i+c_iw_i^Tw_i=1\end{aligned}\end{align} \]

Parameters

latent_dims (int, optional) – Number of latent dimensions to use, by default 1
scale (bool, optional) – Whether to scale the data, by default True
centre (bool, optional) – Whether to centre the data, by default True
copy_data (bool, optional) – Whether to copy the data, by default True
random_state (int, optional) – Random state, by default None
c (Union[Iterable[float], float], optional) – Regularisation parameter, by default None
eps (float, optional) – Small value to add to the diagonal of the regularisation matrix, by default 1e-9

References

Kettenring, Jon R. “Canonical analysis of several sets of variables.” Biometrika 58.3 (1971): 433-451.

Examples

>>> from cca_zoo.models import MCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = MCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([0.97200847])
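The MCCA objective above can be solved as a generalised eigenvalue problem by relaxing the per-view constraints into a single summed constraint. The Lagrangian condition then reads C w = rho D w, where C holds the cross-view blocks X_i^T X_j (j != i) and D is block-diagonal with blocks (1 - c_i) X_i^T X_i + c_i I. The NumPy sketch below solves it through a Cholesky factor of D. The function name mcca_eig and the relaxation are assumptions, so treat this as an illustration rather than the MCCA solver itself.

```python
import numpy as np


def mcca_eig(views, latent_dims=1, c=None):
    """Hypothetical MCCA sketch: solve C w = rho D w for stacked weights w."""
    views = [V - V.mean(axis=0) for V in views]
    c = [0.0] * len(views) if c is None else list(c)
    dims = [V.shape[1] for V in views]
    starts = np.concatenate([[0], np.cumsum(dims)[:-1]])
    X = np.hstack(views)
    C = X.T @ X  # all blocks X_i^T X_j
    D = np.zeros_like(C)
    for V, ci, s, p in zip(views, c, starts, dims):
        C[s:s + p, s:s + p] = 0.0  # keep only the cross-view terms j != i
        D[s:s + p, s:s + p] = (1 - ci) * V.T @ V + ci * np.eye(p)
    # with D = L L^T, substitute y = L^T w to get a symmetric eigenproblem
    L_inv = np.linalg.inv(np.linalg.cholesky(D))
    evals, evecs = np.linalg.eigh(L_inv @ C @ L_inv.T)
    order = np.argsort(evals)[::-1][:latent_dims]
    W = L_inv.T @ evecs[:, order]
    return np.split(W, starts[1:]), evals[order]
```

For two views with c_i = 0 the leading eigenvalue equals the first canonical correlation, which gives a quick sanity check on the sketch.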

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

class cca_zoo.models.KCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None, eps=0.001, kernel: Optional[Iterable[Union[float, callable]]] = None, gamma: Optional[Iterable[float]] = None, degree: Optional[Iterable[float]] = None, coef0: Optional[Iterable[float]] = None, kernel_params: Optional[Iterable[dict]] = None)[source]

Bases: MCCA

A class used to fit a KCCA model.

\[ \begin{align}\begin{aligned}\begin{split}\alpha_{opt}=\underset{\alpha}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \alpha_i^TK_i^TK_j\alpha_j \}\\\end{split}\\\text{subject to:}\\c_i\alpha_i^TK_i\alpha_i + (1-c_i)\alpha_i^TK_i^TK_i\alpha_i=1\end{aligned}\end{align} \]

Examples

>>> from cca_zoo.models import KCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = KCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([0.96893666])
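The kernel problem has the same structure as MCCA with each X_i replaced by an n x n Gram matrix K_i, so the dual weights alpha_i live in sample space. The sketch below uses a centred RBF kernel and adds eps * I to each constraint block so that the Cholesky factorisation succeeds, since a centred kernel is singular. The kernel choice, the fixed gamma, the function names and the way eps enters are assumptions, not a description of KCCA's internals.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) Gram matrix exp(-gamma * ||x_a - x_b||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)


def centre_kernel(K):
    """Double-centre a Gram matrix, the kernel analogue of centring the data."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kcca_eig(views, c=0.5, gamma=1.0, eps=1e-3, latent_dims=1):
    """Hypothetical KCCA sketch: solve A alpha = rho B alpha over stacked dual weights."""
    Ks = [centre_kernel(rbf_kernel(V, gamma)) for V in views]
    n, m = Ks[0].shape[0], len(Ks)
    A = np.zeros((m * n, m * n))
    B = np.zeros((m * n, m * n))
    for i, Ki in enumerate(Ks):
        rows = slice(i * n, (i + 1) * n)
        for j, Kj in enumerate(Ks):
            if i != j:
                A[rows, j * n:(j + 1) * n] = Ki.T @ Kj  # cross-view term
        # constraint block c K_i + (1 - c) K_i^T K_i, plus eps * I so that B is invertible
        B[rows, rows] = c * Ki + (1 - c) * Ki.T @ Ki + eps * np.eye(n)
    L_inv = np.linalg.inv(np.linalg.cholesky(B))
    evals, evecs = np.linalg.eigh(L_inv @ A @ L_inv.T)
    order = np.argsort(evals)[::-1][:latent_dims]
    alphas = L_inv.T @ evecs[:, order]
    return np.split(alphas, m), evals[order]
```

The eigenproblem has size (n_views * n_samples), so this dense formulation is only practical for small sample sizes.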

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

transform(views: ndarray, **kwargs)[source]

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

class cca_zoo.models.NCCA(latent_dims: int = 1, scale=True, centre=True, copy_data=True, accept_sparse=False, random_state: Optional[Union[int, RandomState]] = None, nearest_neighbors=None, gamma: Optional[Iterable[float]] = None)[source]

Bases: _BaseCCA

A class used to fit a nonparametric CCA (NCCA) model.

References

Michaeli, Tomer, Weiran Wang, and Karen Livescu. “Nonparametric canonical correlation analysis.” International conference on machine learning. PMLR, 2016.

Example

>>> from cca_zoo.models import NCCA
>>> import numpy as np
>>> X1 = np.random.rand(10,5)
>>> X2 = np.random.rand(10,5)
>>> model = NCCA()
>>> model.fit((X1,X2)).score((X1,X2))
array([1.])
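Nonparametric CCA replaces linear projections with functions estimated from kernel density estimates. The sketch below captures the general shape of the approach in simplified form. Gaussian affinities within each view are row-normalised and multiplied into a non-negative cross-view matrix P. That matrix is then normalised as D_r^{-1/2} P D_c^{-1/2} and decomposed by SVD. The leading singular value of the normalised matrix is always 1, with a trivial singular pair, so the projections come from the next latent_dims pairs. The dense kernels (instead of nearest-neighbour truncation), the bandwidths, the normalisation and the function names are assumptions; this is not the NCCA class's code.

```python
import numpy as np


def gaussian_affinity(X, sigma):
    """Dense Gaussian affinities exp(-||x_a - x_b||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    return np.exp(-d2 / (2 * sigma ** 2))


def ncca_sketch(X1, X2, latent_dims=1, sigma=(1.0, 1.0)):
    """Return per-sample projections f, g for each view and all singular values."""
    W1 = gaussian_affinity(X1, sigma[0])
    W2 = gaussian_affinity(X2, sigma[1])
    W1 /= W1.sum(axis=1, keepdims=True)  # row i: density weights around sample i
    W2 /= W2.sum(axis=1, keepdims=True)
    P = W1 @ W2.T  # non-negative cross-view matrix
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = P / np.sqrt(np.outer(r, c))  # D_r^{-1/2} P D_c^{-1/2}
    U, s, Vt = np.linalg.svd(S)
    # skip the trivial leading pair (singular value 1)
    f = U[:, 1:latent_dims + 1] / np.sqrt(r)[:, None]
    g = Vt[1:latent_dims + 1].T / np.sqrt(c)[:, None]
    return f, g, s
```

Because the projections are defined only at the training samples, mapping new samples requires evaluating the affinities against the training data.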

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

fit(views: Iterable[ndarray], y=None, **kwargs)[source]

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

transform(views: Iterable[ndarray], **kwargs)[source]

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

class cca_zoo.models.PartialCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None)[source]

Bases: MCCA

A class used to fit a partial CCA model. The key difference from vanilla CCA or MCCA is that the canonical score vectors must be orthogonal to the supplied confounding variables.

Parameters

latent_dims (int, optional) – The number of latent dimensions to use, by default 1
scale (bool, optional) – Whether to scale the data, by default True
centre (bool, optional) – Whether to centre the data, by default True
copy_data (bool, optional) – Whether to copy the data, by default True
random_state (int, optional) – The random state to use, by default None
c (Union[Iterable[float], float], optional) – The regularization parameter, by default None

References

Rao, B. Raja. “Partial canonical correlations.” Trabajos de estadística y de investigación operativa 20.2-3 (1969): 211-219.

Example

>>> from cca_zoo.models import PartialCCA
>>> import numpy as np
>>> X1 = np.random.rand(10,5)
>>> X2 = np.random.rand(10,5)
>>> partials = np.random.rand(10,3)
>>> model = PartialCCA()
>>> model.fit((X1,X2),partials=partials).score((X1,X2))
array([0.99993046])
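A common way to obtain canonical scores that are orthogonal to confounds is to regress the confounds out of each view before running CCA. This is a minimal numpy sketch under that assumption; whether `PartialCCA` does exactly this internally is not stated here. `residualise` and `cca_first_pair` are hypothetical helpers. The CCA step uses QR decompositions followed by an SVD.

```python
import numpy as np

def residualise(X, Z):
    """Remove the part of X that is linearly explained by Z (least squares)."""
    beta, *_ = np.linalg.lstsq(Z, X, rcond=None)
    return X - Z @ beta

def cca_first_pair(X1, X2):
    """Top canonical weight pair and correlation via QR + SVD (no regularisation)."""
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    Q1, R1 = np.linalg.qr(X1)
    Q2, R2 = np.linalg.qr(X2)
    U, s, Vt = np.linalg.svd(Q1.T @ Q2)
    # Map the singular vectors back from the orthonormal bases to feature weights
    w1 = np.linalg.solve(R1, U[:, 0])
    w2 = np.linalg.solve(R2, Vt[0])
    return w1, w2, s[0]

rng = np.random.RandomState(0)
n = 100
partials = rng.standard_normal((n, 3))
Z = np.column_stack([np.ones(n), partials])  # intercept plus confounds
X1 = rng.standard_normal((n, 5)) + partials @ rng.standard_normal((3, 5))
X2 = rng.standard_normal((n, 5)) + partials @ rng.standard_normal((3, 5))

X1r, X2r = residualise(X1, Z), residualise(X2, Z)
w1, w2, rho = cca_first_pair(X1r, X2r)
print(np.abs(Z.T @ (X1r @ w1)).max())  # close to zero: scores orthogonal to confounds
```

Because every residualised view is orthogonal to the columns of `Z`, any linear combination of its features is orthogonal too. This is why the constraint is satisfied automatically in this formulation.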

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

fit(views: Iterable[ndarray], y=None, partials=None, **kwargs)[source]

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
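y (None) –
partials (numpy array or array like with the same number of rows (samples) as the views) – Confounding variables that the canonical scores are constrained to be orthogonal to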
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

transform(views: Iterable[ndarray], partials=None, **kwargs)[source]

Parameters

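views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
partials (numpy array or array like with the same number of rows (samples) as the views, optional) – Confounding variables corresponding to the given views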
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays
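Factor loadings are often computed as the correlations between the original features and the latent scores. This sketch assumes that interpretation, and also assumes that `normalize=False` returns covariances instead; neither is confirmed by this reference. `factor_loadings` is a hypothetical helper.

```python
import numpy as np

def factor_loadings(X, scores, normalize=True):
    """Loading of each feature on each latent dimension.

    Assumption for illustration: loadings are correlations between the
    features of X and the latent scores (covariances if normalize=False).
    Returns an array of shape (n_features, latent_dims).
    """
    Xc = X - X.mean(axis=0)
    Sc = scores - scores.mean(axis=0)
    cov = Xc.T @ Sc / (X.shape[0] - 1)
    if not normalize:
        return cov
    # Divide by the standard deviations to turn covariances into correlations
    return cov / np.outer(Xc.std(axis=0, ddof=1), Sc.std(axis=0, ddof=1))

rng = np.random.RandomState(0)
X = rng.random((50, 4))
scores = X @ rng.standard_normal((4, 2))  # stand-in for transformed scores
loadings = factor_loadings(X, scores)
print(loadings.shape)  # (4, 2)
```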

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

class cca_zoo.models.rCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None, accept_sparse=None)[source]

Bases: _BaseCCA

A class used to fit a Regularised CCA (canonical ridge) model. It uses PCA to perform the optimization efficiently for high-dimensional data.

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \}\\\end{split}\\\text{subject to:}\\(1-c_1)w_1^TX_1^TX_1w_1+c_1w_1^Tw_1=n\\(1-c_2)w_2^TX_2^TX_2w_2+c_2w_2^Tw_2=n\end{aligned}\end{align} \]
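The constraints above can be written as $w_i^T B_i w_i = n$ with $B_i = (1-c_i)X_i^TX_i + c_iI$. Substituting $u_i = B_i^{1/2} w_i$ reduces the problem to a singular value decomposition. This is a direct eigendecomposition-based sketch for small problems; it is not the PCA-based route the class uses. `inv_sqrt` and `rcca_first_pair` are hypothetical names.

```python
import numpy as np

def inv_sqrt(B):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(B)
    return (vecs / np.sqrt(vals)) @ vecs.T

def rcca_first_pair(X1, X2, c=(0.1, 0.1)):
    """Top canonical ridge weight pair for centred X1, X2."""
    n = X1.shape[0]
    # B_i = (1 - c_i) X_i^T X_i + c_i I, so the constraint reads w_i^T B_i w_i = n
    B1 = (1 - c[0]) * X1.T @ X1 + c[0] * np.eye(X1.shape[1])
    B2 = (1 - c[1]) * X2.T @ X2 + c[1] * np.eye(X2.shape[1])
    B1_is, B2_is = inv_sqrt(B1), inv_sqrt(B2)
    # In whitened coordinates the objective is maximised by the top singular pair
    U, s, Vt = np.linalg.svd(B1_is @ X1.T @ X2 @ B2_is)
    w1 = np.sqrt(n) * B1_is @ U[:, 0]
    w2 = np.sqrt(n) * B2_is @ Vt[0]
    return w1, w2

rng = np.random.RandomState(0)
X1 = rng.random((10, 5))
X2 = rng.random((10, 5))
X1 -= X1.mean(axis=0)
X2 -= X2.mean(axis=0)
w1, w2 = rcca_first_pair(X1, X2, c=(0.1, 0.1))
```

With $c_i = 0$ this is plain CCA, which requires $X_i^TX_i$ to be invertible. With $c_i = 1$, $B_i = I$ and the weights are the leading singular vectors of $X_1^TX_2$, which is the PLS solution up to scale.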

Parameters

latent_dims (int, optional) – Number of latent dimensions to use, by default 1
scale (bool, optional) – Whether to scale the data, by default True
centre (bool, optional) – Whether to centre the data, by default True
copy_data (bool, optional) – Whether to copy the data, by default True
random_state (int, optional) – Random state, by default None
c (Union[Iterable[float], float], optional) – Regularisation parameter, by default None
accept_sparse (Union[bool, str], optional) – Whether to accept sparse data, by default None

References

Vinod, Hrishikesh D. “Canonical ridge and econometrics of joint production.” Journal of econometrics 4.2 (1976): 147-166.

Example

>>> from cca_zoo.models import rCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = rCCA(c=[0.1,0.1])
>>> model.fit((X1,X2)).score((X1,X2))
array([0.95222128])

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

fit(views: Iterable[ndarray], y=None, **kwargs)[source]

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays
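A sketch of what `transform` plausibly does for a linear model. It assumes the centring and scaling statistics learned during `fit` (see the `centre` and `scale` parameters) are applied to the new views before projecting them onto the weights. The exact statistics used (for example, population vs. sample standard deviation) are an assumption. `fit_preprocessing`, `transform`, and the weights are hypothetical.

```python
import numpy as np

def fit_preprocessing(views, scale=True):
    """Record per-feature training means and (optionally) standard deviations."""
    means = [X.mean(axis=0) for X in views]
    stds = [X.std(axis=0) if scale else np.ones(X.shape[1]) for X in views]
    return means, stds

def transform(views, weights, means, stds):
    """Project each view onto its weights after centring/scaling with training statistics."""
    return [((X - m) / s) @ W for X, W, m, s in zip(views, weights, means, stds)]

rng = np.random.RandomState(0)
train_views = [rng.random((10, 5)), rng.random((10, 4))]
new_views = [rng.random((3, 5)), rng.random((3, 4))]
weights = [rng.standard_normal((5, 2)), rng.standard_normal((4, 2))]  # hypothetical fitted weights
means, stds = fit_preprocessing(train_views)
train_scores = transform(train_views, weights, means, stds)
new_scores = transform(new_views, weights, means, stds)
```

Reusing the training statistics ensures that new samples are mapped into the same latent space as the data the weights were fitted on.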

class cca_zoo.models.CCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None)[source]

Bases: rCCA

A class used to fit a simple CCA model.

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \}\\\end{split}\\\text{subject to:}\\w_1^TX_1^TX_1w_1=n\\w_2^TX_2^TX_2w_2=n\end{aligned}\end{align} \]

Parameters

latent_dims (int, optional) – Number of latent dimensions to use, by default 1
scale (bool, optional) – Whether to scale the data, by default True
centre (bool, optional) – Whether to centre the data, by default True
copy_data (bool, optional) – Whether to copy the data, by default True
random_state (int, optional) – Random state, by default None
accept_sparse (Union[bool, str], optional) – Whether to accept sparse data, by default None

References

Hotelling, Harold. “Relations between two sets of variates.” Breakthroughs in statistics. Springer, New York, NY, 1992. 162-190.

Example

>>> from cca_zoo.models import CCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = CCA()
>>> model.fit((X1,X2)).score((X1,X2))
array([1.])
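The score of exactly 1 in this example is expected and does not indicate a real relationship. With 10 samples, centring leaves a 9-dimensional sample space. Two 5-dimensional column spaces in that space must share at least one direction, because 5 + 5 > 9. The sketch below checks this with numpy. It computes the canonical correlations as the singular values of $Q_1^TQ_2$, taken from QR decompositions of the centred views (the principal-angle method); `CCA` may compute them differently. `canonical_correlations` is a hypothetical helper.

```python
import numpy as np

def canonical_correlations(X1, X2):
    """All canonical correlations between two views (principal-angle method)."""
    Q1, _ = np.linalg.qr(X1 - X1.mean(axis=0))
    Q2, _ = np.linalg.qr(X2 - X2.mean(axis=0))
    # Singular values of Q1^T Q2 are the cosines of the principal angles
    return np.linalg.svd(Q1.T @ Q2, compute_uv=False)

rng = np.random.RandomState(0)
X1 = rng.random((10, 5))
X2 = rng.random((10, 5))
print(canonical_correlations(X1, X2)[0])  # 1.0 up to floating-point error
```

With many more samples than features, independent random views no longer produce a perfect top correlation. On small datasets, regularised variants such as `rCCA` are the usual remedy.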

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

class cca_zoo.models.PLS(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None)[source]

Bases: rCCA

A class used to fit a simple PLS model.

Implements PLS by inheriting from regularised CCA with maximal regularisation (c=1 in the rCCA constraints).

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \}\\\end{split}\\\text{subject to:}\\w_1^Tw_1=1\\w_2^Tw_2=1\end{aligned}\end{align} \]
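Under the unit-norm constraints above, the first PLS pair is the leading pair of singular vectors of the cross-product matrix $X_1^TX_2$. This minimal numpy sketch assumes centred views; it is not the class's own code path. `pls_first_pair` is a hypothetical name.

```python
import numpy as np

def pls_first_pair(X1, X2):
    """Top PLS weight pair: leading singular vectors of the cross-product matrix."""
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    U, s, Vt = np.linalg.svd(X1.T @ X2)
    # Singular vectors have unit norm, so w_i^T w_i = 1 holds automatically
    return U[:, 0], Vt[0], s[0]

rng = np.random.RandomState(0)
X1 = rng.random((10, 5))
X2 = rng.random((10, 5))
w1, w2, obj = pls_first_pair(X1, X2)
```

The returned `obj` is the maximum of $w_1^TX_1^TX_2w_2$ over unit vectors. Up to the constant $n$, this matches setting $c_1 = c_2 = 1$ in the rCCA constraints.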

Parameters

latent_dims (int, optional) – Number of latent dimensions to use, by default 1
scale (bool, optional) – Whether to scale the data, by default True
centre (bool, optional) – Whether to centre the data, by default True
copy_data (bool, optional) – Whether to copy the data, by default True
random_state (int, optional) – Random state, by default None
accept_sparse (Union[bool, str], optional) – Whether to accept sparse data, by default None

Example

>>> from cca_zoo.models import PLS
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = PLS()
>>> model.fit((X1,X2)).score((X1,X2))
array([0.81796873])

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

class cca_zoo.models.TCCA(latent_dims: int = 1, scale=True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None)[source]

Bases: _BaseCCA

Fits a Tensor CCA model. Tensor CCA maximises higher order correlations between the views.

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{\sum_n\prod_i (X_iw_i)_n \}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=1\end{aligned}\end{align} \]
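For three views, the higher-order correlation can be computed as the mean over samples of the product of the three unit-variance scores. One common way to fit it is in three steps: whiten each view, build the third-order covariance tensor, and find its best rank-one approximation. The sketch below uses the higher-order power method for the last step. Whether `TCCA` uses exactly this procedure or scaling is not stated here. `whiten` and `tcca_first_triplet` are hypothetical names.

```python
import numpy as np

def whiten(X):
    """Centre X and transform it so its covariance (divided by n) is the identity."""
    X = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(X.T @ X / X.shape[0])
    # Multiply by the inverse square root of the covariance matrix
    return X @ (vecs / np.sqrt(vals)) @ vecs.T

def tcca_first_triplet(X1, X2, X3, n_iter=100, seed=0):
    """Best rank-one fit of the whitened third-order covariance tensor,
    found with the higher-order power method."""
    Z1, Z2, Z3 = whiten(X1), whiten(X2), whiten(X3)
    # T[a, b, c] = mean over samples of Z1[:, a] * Z2[:, b] * Z3[:, c]
    T = np.einsum("na,nb,nc->abc", Z1, Z2, Z3) / Z1.shape[0]
    rng = np.random.RandomState(seed)
    u2 = rng.standard_normal(T.shape[1])
    u3 = rng.standard_normal(T.shape[2])
    u2 /= np.linalg.norm(u2)
    u3 /= np.linalg.norm(u3)
    for _ in range(n_iter):
        # Update each direction with the other two fixed, then renormalise
        u1 = np.einsum("abc,b,c->a", T, u2, u3)
        u1 /= np.linalg.norm(u1)
        u2 = np.einsum("abc,a,c->b", T, u1, u3)
        u2 /= np.linalg.norm(u2)
        u3 = np.einsum("abc,a,b->c", T, u1, u2)
        u3 /= np.linalg.norm(u3)
    scores = (Z1 @ u1, Z2 @ u2, Z3 @ u3)
    # Higher-order correlation: mean over samples of the product of the scores
    return scores, np.mean(scores[0] * scores[1] * scores[2])

rng = np.random.RandomState(0)
X1, X2, X3 = (rng.random((50, 4)) for _ in range(3))
scores, hoc = tcca_first_triplet(X1, X2, X3)
```

Because each view is whitened and each direction has unit norm, every score has zero mean and unit variance. The returned value therefore plays the role of a correlation across all three views at once.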

Parameters

latent_dims (int, default=1) – The number of latent dimensions to use
scale (bool, default=True) – Whether to scale the data to unit variance
centre (bool, default=True) – Whether to centre the data
copy_data (bool, default=True) – Whether to copy the data or not
random_state (int, default=None) – The random state to use
c (float or list of floats, default=None) – The regularisation parameter for each view. If None, defaults to 0 for each view.

References

Kim, Tae-Kyun, Shu-Fai Wong, and Roberto Cipolla. “Tensor canonical correlation analysis for action classification.” 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2007. https://github.com/rciszek/mdr_tcca

Examples

>>> from cca_zoo.models import TCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = TCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([1.14595755])

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

fit(views: Iterable[ndarray], y=None, **kwargs)[source]

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

correlations(views: Iterable[ndarray], **kwargs)[source]

Predicts the correlation for the given data using the fitted model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

score(views: Iterable[ndarray], **kwargs)[source]

Returns the higher order correlations in each dimension

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

class cca_zoo.models.KTCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, eps=0.001, c: Optional[Union[Iterable[float], float]] = None, kernel: Optional[Iterable[Union[float, callable]]] = None, gamma: Optional[Iterable[float]] = None, degree: Optional[Iterable[float]] = None, coef0: Optional[Iterable[float]] = None, kernel_params: Optional[Iterable[dict]] = None)[source]

Bases: TCCA

Fits a Kernel Tensor CCA model. Tensor CCA maximises higher order correlations between the views.

\[ \begin{align}\begin{aligned}\begin{split}\alpha_{opt}=\underset{\alpha}{\mathrm{argmax}}\{\sum_n\prod_i (K_i\alpha_i)_n \}\\\end{split}\\\text{subject to:}\\\alpha_i^TK_i^TK_i\alpha_i=1\end{aligned}\end{align} \]
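In the kernel formulation, each view enters through an n × n kernel matrix $K_i$. The kernel matrix is usually centred in feature space before use. This sketch builds a centred RBF kernel. The kernel choice and `gamma` value are illustrative and are not confirmed to match the class's `kernel` and `gamma` arguments. `rbf_kernel` and `centre_kernel` are hypothetical helpers.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    # Clip tiny negative distances caused by floating-point error
    return np.exp(-gamma * np.maximum(d2, 0))

def centre_kernel(K):
    """Centre a kernel matrix in feature space: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

rng = np.random.RandomState(0)
X = rng.random((10, 5))
K = rbf_kernel(X, gamma=0.5)
Kc = centre_kernel(K)
```

The scores for view $i$ are then $K_i\alpha_i$, so the dual weights $\alpha_i$ have one entry per sample rather than one per feature.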

References

Kim, Tae-Kyun, Shu-Fai Wong, and Roberto Cipolla. “Tensor canonical correlation analysis for action classification.” 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2007

Examples

>>> from cca_zoo.models import KTCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = KTCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([1.69896269])

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

transform(views: ndarray, **kwargs)[source]

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlation for the given data using the fitted model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model to the given data

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

self

Return type

object

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], **kwargs)

Returns the higher order correlations in each dimension

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

class cca_zoo.models.PRCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, eps=0.001, c=0)[source]

Bases: MCCA

Partially Regularized Canonical Correlation Analysis

Parameters

latent_dims (int, optional) – Number of latent dimensions to use, by default 1
scale (bool, optional) – Whether to scale the data, by default True
centre (bool, optional) – Whether to centre the data, by default True
copy_data (bool, optional) – Whether to copy the data, by default True
random_state (int, optional) – Random state for reproducibility, by default None
eps (float, optional) – Tolerance for convergence, by default 1e-3
c (Union[Iterable[float], float], optional) – Regularisation parameter, by default 0

References

Tuzhilina, Elena, Leonardo Tozzi, and Trevor Hastie. “Canonical correlation analysis in high dimensions with structured regularization.” Statistical Modelling (2021): 1471082X211041033.

Parameters

c (Union[Iterable[float], float], optional) – Regularisation parameter, by default 0
eps (float, optional) – Tolerance for convergence, by default 1e-3

fit(views: Iterable[ndarray], y=None, idxs=None, **kwargs)[source]

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
idxs (list/tuple of integers indicating which features from each view are the partially regularised features) –
kwargs (any additional keyword arguments required by the given model) –

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –

Returns

factor_loadings

Return type

list of numpy arrays

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Parameters

views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –

Returns

transformed_views

Return type

list of numpy arrays

class cca_zoo.models.GRCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, eps=0.001, c: float = 0, mu: float = 0)[source]

Bases: MCCA

Grouped Regularized Canonical Correlation Analysis

Parameters

latent_dims (int, default=1) – Number of latent dimensions to use
scale (bool, default=True) – Whether to scale the data to unit variance
centre (bool, default=True) – Whether to centre the data
copy_data (bool, default=True) – Whether to copy the data
random_state (int, default=None) – Random state for initialisation
eps (float, default=1e-3) – Tolerance for convergence
c (float, default=0) – Regularization parameter for the group means
mu (float, default=0) – Regularization parameter for the group sizes

References

Tuzhilina, Elena, Leonardo Tozzi, and Trevor Hastie. “Canonical correlation analysis in high dimensions with structured regularization.” Statistical Modelling (2021): 1471082X211041033.

Parameters

latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.

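fit(views: Iterable[ndarray], y=None, feature_groups=None, **kwargs)[source]

Fits the model to the given views

Parameters

views (list/tuple of numpy arrays or array-likes) – The views, each with the same number of rows (samples).
y (None) – Not used; present for API consistency.
feature_groups (list/tuple of integer numpy arrays or array-likes) – One array per view assigning an integer group label to each feature; each array has shape (n_features,), matching the number of columns of its view.
kwargs – Any additional keyword arguments required by the given model.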
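The sketch below shows one way to build `feature_groups` for two views: one integer label per column of each view. The validation helper `check_feature_groups` is hypothetical and only checks the shape described above. The commented `fit` call follows the documented signature, and its `c` and `mu` values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples = 30
X1 = rng.random((n_samples, 6))  # view 1: 6 features
X2 = rng.random((n_samples, 4))  # view 2: 4 features

# View 1: features 0-2 form group 0, features 3-5 form group 1
g1 = np.array([0, 0, 0, 1, 1, 1])
# View 2: two groups of two features each -> [0, 0, 1, 1]
g2 = np.repeat([0, 1], 2)
feature_groups = [g1, g2]


def check_feature_groups(views, feature_groups):
    """Hypothetical check: each label array is 1-D, integer, one entry per column."""
    for X, groups in zip(views, feature_groups):
        groups = np.asarray(groups)
        if groups.ndim != 1 or groups.shape[0] != X.shape[1]:
            raise ValueError("each feature_groups entry must have shape (n_features,)")
        if not np.issubdtype(groups.dtype, np.integer):
            raise ValueError("feature_groups entries must contain integers")


check_feature_groups((X1, X2), feature_groups)
# With cca_zoo installed, the labels would then be passed to fit (illustrative values):
# GRCCA(c=1.0, mu=1.0).fit((X1, X2), feature_groups=feature_groups)
```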

fit_transform(views: Iterable[ndarray], **kwargs)

Fits the model to the given data and returns the transformed views

Parameters

views (list/tuple of numpy arrays or array-likes) – The views, each with the same number of rows (samples).
kwargs – Any additional keyword arguments required by the given model.

Returns

transformed_views

Return type

list of numpy arrays

get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)

Returns the factor loadings for each view

Parameters

views (list/tuple of numpy arrays or array-likes) – The views, each with the same number of rows (samples).
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs – Any additional keyword arguments required by the given model.

Returns

factor_loadings

Return type

list of numpy arrays
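Factor loadings are commonly defined as the correlations between each original feature and each latent score. The sketch below computes one `(n_features_i, latent_dims)` array per view under that assumption for `normalize=True`, and uses raw cross-products `X_i^T Z_i` for `normalize=False`. Both definitions are assumptions rather than confirmed cca_zoo behaviour, and `factor_loadings_sketch` is a hypothetical helper.

```python
import numpy as np


def factor_loadings_sketch(views, transformed_views, normalize=True):
    """Hypothetical helper returning one (n_features_i, latent_dims) array per view.

    normalize=True: Pearson correlation of each feature with each latent score.
    normalize=False: raw cross-products X_i^T Z_i (assumed unnormalized form).
    """
    loadings = []
    for X, Z in zip(views, transformed_views):
        p = X.shape[1]
        if normalize:
            # Joint correlation matrix of [features | scores]; keep the cross block
            C = np.corrcoef(np.hstack([X, Z]), rowvar=False)
            loadings.append(C[:p, p:])
        else:
            loadings.append(X.T @ Z)
    return loadings


rng = np.random.default_rng(3)
X = rng.standard_normal((40, 5))
W = rng.standard_normal((5, 2))
Z = X @ W  # stand-in for the transformed scores of this view
(loadings_x,) = factor_loadings_sketch([X], [Z])
print(loadings_x.shape)  # (5, 2)
```

With normalization every entry lies in [-1, 1], so loadings can be compared across features measured on different scales.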

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Returns the pairwise correlations between the views in each dimension

Parameters

views (list/tuple of numpy arrays or array-likes) – The views, each with the same number of rows (samples).
kwargs – Any additional keyword arguments required by the given model.

Returns

pairwise_correlations

Return type

numpy array of shape (n_views, n_views, latent_dims)

score(views: Iterable[ndarray], y=None, **kwargs)

Returns the average pairwise correlation between the views for each latent dimension

Parameters

views (list/tuple of numpy arrays or array-likes) – The views, each with the same number of rows (samples).
y (None) – Not used; present for API consistency.
kwargs – Any additional keyword arguments required by the given model.

Returns

score

Return type

numpy array of shape (latent_dims,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Transforms the views into the latent space of the fitted model

Parameters

views (list/tuple of numpy arrays or array-likes) – The views, each with the same number of rows (samples).
kwargs – Any additional keyword arguments required by the given model.

Returns

transformed_views

Return type

list of numpy arrays