Models

class cca_zoo.models.GCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None, view_weights: Optional[Iterable[float]] = None, eps=1e-09)

Bases: rCCA

A class used to fit a GCCA model. For more than 2 views, GCCA optimizes the sum of correlations with a shared auxiliary vector.

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ \sum_iw_i^TX_i^TT \}\\\end{split}\\\text{subject to:}\\T^TT=1\end{aligned}\end{align} \]

Citation

Tenenhaus, Arthur, and Michel Tenenhaus. “Regularized generalized canonical correlation analysis.” Psychometrika 76.2 (2011): 257.

Example

>>> from cca_zoo.models import GCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = GCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([0.97229856])
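The shared auxiliary vector in the objective above can be illustrated without the library. The following is a minimal numpy sketch, assuming each w_i is the (lightly ridge-stabilised) least-squares regression of T on X_i, in which case T is the top eigenvector of the sum of the views' projection matrices. The function name `gcca_shared_vector` is hypothetical and this is not the library's implementation.

```python
import numpy as np


def gcca_shared_vector(views, eps=1e-9):
    """Illustrative only: unit-norm shared vector T and per-view weights."""
    n = views[0].shape[0]
    # Demean each view by column.
    centred = [X - X.mean(axis=0) for X in views]
    # Sum of (ridge-stabilised) projection matrices onto each view's columns.
    M = np.zeros((n, n))
    for X in centred:
        M += X @ np.linalg.solve(X.T @ X + eps * np.eye(X.shape[1]), X.T)
    # eigh returns eigenvalues in ascending order, so the last eigenvector
    # maximises T^T M T subject to T^T T = 1.
    _, eigvecs = np.linalg.eigh(M)
    T = eigvecs[:, -1]
    # Assumed form of the weights: regress T onto each view.
    weights = [
        np.linalg.solve(X.T @ X + eps * np.eye(X.shape[1]), X.T @ T)
        for X in centred
    ]
    return T, weights


rng = np.random.RandomState(0)
views = [rng.random((10, 5)) for _ in range(3)]
T, weights = gcca_shared_vector(views)
```

The small `eps` term plays the same stabilising role as the `eps` constructor argument, although the exact way the library applies it is not shown in this section.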

Constructor for GCCA

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; otherwise, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
c – regularisation between 0 (CCA) and 1 (PLS)
view_weights – list of weights of each view
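As a rough illustration of what `centre` and `scale` describe, the sketch below demeans and rescales a training view and then reuses the training statistics on out-of-sample data. The use of `ddof=1` for the standard deviation is an assumption; the section does not state which estimator the library uses.

```python
import numpy as np

rng = np.random.RandomState(0)
X_train = rng.random((10, 5))
X_test = rng.random((4, 5))

# Statistics are estimated on the training view only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0, ddof=1)  # ddof=1 is an assumption

# centre=True: subtract the column means; scale=True: divide by column std.
X_train_prepared = (X_train - mean) / std
# Out-of-sample data is transformed with the training statistics.
X_test_prepared = (X_test - mean) / std
```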

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits a regularised CCA (canonical ridge) model

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores
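The effect of `normalize` can be sketched in numpy: when both the features and the scores are centred and scaled to unit norm, the cross-product is the Pearson correlation between each feature and each latent score. The helper name `loadings_sketch` and the unnormalised form (a plain cross-product of centred data) are assumptions made for illustration.

```python
import numpy as np


def loadings_sketch(X, scores, normalize=False):
    """Illustrative loadings for one view: features x latent dimensions."""
    Xc = X - X.mean(axis=0)
    Zc = scores - scores.mean(axis=0)
    if normalize:
        # Unit-norm columns turn the cross-product into correlations.
        Xc = Xc / np.linalg.norm(Xc, axis=0)
        Zc = Zc / np.linalg.norm(Zc, axis=0)
    return Xc.T @ Zc


rng = np.random.RandomState(0)
X = rng.random((10, 5))
scores = rng.random((10, 2))
loadings = loadings_sketch(X, scores, normalize=True)
```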

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
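The (k, k, latent_dims) layout of the returned array can be reproduced with a short numpy sketch. It assumes the per-view scores are already available (for example from `transform`); the function name `pairwise_correlations_sketch` is hypothetical.

```python
import numpy as np


def pairwise_correlations_sketch(scores):
    """scores: list of k arrays, each of shape (n_samples, latent_dims)."""
    k = len(scores)
    latent_dims = scores[0].shape[1]
    all_corrs = np.zeros((k, k, latent_dims))
    for i in range(k):
        for j in range(k):
            for d in range(latent_dims):
                # Correlation between view i and view j in dimension d.
                all_corrs[i, j, d] = np.corrcoef(
                    scores[i][:, d], scores[j][:, d]
                )[0, 1]
    return all_corrs


rng = np.random.RandomState(0)
scores = [rng.random((10, 2)) for _ in range(3)]
all_corrs = pairwise_correlations_sketch(scores)
```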

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn
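One way to read "averages over all pairs" is as the mean of the off-diagonal entries of the pairwise correlation array, taken separately for each latent dimension. The sketch below makes that assumption explicit; excluding the diagonal (each view with itself) is an assumption about the intended behaviour, and the function name is hypothetical.

```python
import numpy as np


def average_pairwise_correlation(all_corrs):
    """all_corrs: array of shape (k, k, latent_dims)."""
    k = all_corrs.shape[0]
    # Boolean mask selecting every pair of distinct views.
    off_diagonal = ~np.eye(k, dtype=bool)
    # Masking the first two axes gives shape (k * (k - 1), latent_dims).
    return all_corrs[off_diagonal].mean(axis=0)


# Toy input: perfect self-correlation, 0.5 between distinct views.
all_corrs = np.full((3, 3, 2), 0.5)
all_corrs[np.arange(3), np.arange(3), :] = 1.0
average = average_pairwise_correlation(all_corrs)
```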

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance
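The `<component>__<parameter>` convention can be illustrated with a small pure-Python sketch that separates an estimator's own parameters from those addressed to a nested component. The helper `split_nested_params` and the component name `model` are hypothetical; this mirrors the naming convention only and is not the scikit-learn implementation.

```python
def split_nested_params(params):
    """Split {'a': 1, 'comp__b': 2} into own and nested parameters."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore only.
        name, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params({"latent_dims": 2, "model__c": 0.1})
```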

transform(views: Iterable[ndarray], **kwargs)

Transforms data given a fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

class cca_zoo.models.KGCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None, eps=0.001, kernel: Optional[Iterable[Union[float, callable]]] = None, gamma: Optional[Iterable[float]] = None, degree: Optional[Iterable[float]] = None, coef0: Optional[Iterable[float]] = None, kernel_params: Optional[Iterable[dict]] = None)

Bases: GCCA

A class used to fit a KGCCA model. For more than 2 views, KGCCA optimizes the sum of correlations with a shared auxiliary vector.

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ \sum_i\alpha_i^TK_i^TT \}\\\end{split}\\\text{subject to:}\\T^TT=1\end{aligned}\end{align} \]

Citation

Tenenhaus, Arthur, Cathy Philippe, and Vincent Frouin. “Kernel generalized canonical correlation analysis.” Computational Statistics & Data Analysis 90 (2015): 114-131.

Example

>>> from cca_zoo.models import KGCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = KGCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([0.97019284])

Constructor for KGCCA

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; otherwise, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
c – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)
kernel – Iterable of kernel mappings used internally. This parameter is directly passed to pairwise_kernel. If element of kernel is a string, it must be one of the metrics in pairwise.PAIRWISE_KERNEL_FUNCTIONS. Alternatively, if element of kernel is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two rows from views as input and return the corresponding kernel value as a single number. This means that callables from sklearn.metrics.pairwise are not allowed, as they operate on matrices, not single samples. Use the string identifying the kernel instead.
gamma – Iterable of gamma parameters for the RBF, laplacian, polynomial, exponential chi2 and sigmoid kernels. Interpretation of the default value is left to the kernel; see the documentation for sklearn.metrics.pairwise. Ignored by other kernels.
degree – Iterable of degree parameters of the polynomial kernel. Ignored by other kernels.
coef0 – Iterable of zero coefficients for polynomial and sigmoid kernels. Ignored by other kernels.
kernel_params – Iterable of additional parameters (keyword arguments) for kernel function passed as callable object.
eps – epsilon value to ensure stability of smallest eigenvalues
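To make the callable-kernel contract concrete, here is a minimal numpy sketch of a kernel that takes two rows and returns a single number, together with a helper that builds the full kernel matrix from it. The names `rbf_row_kernel` and `kernel_matrix` are hypothetical, and the extra keyword arguments stand in for what `kernel_params` would carry.

```python
import numpy as np


def rbf_row_kernel(x, y, gamma=1.0):
    """Kernel on a single pair of rows, returning one number."""
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))


def kernel_matrix(X, Y, row_kernel, **kernel_params):
    """Call row_kernel on every pair of rows and record the values."""
    return np.array(
        [[row_kernel(x, y, **kernel_params) for y in Y] for x in X]
    )


rng = np.random.RandomState(0)
X = rng.random((10, 5))
K = kernel_matrix(X, X, rbf_row_kernel, gamma=0.5)
```

For built-in kernels the parameter description above recommends passing the string name instead, since matrix-level functions do not fit this row-pair signature.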

transform(views: ndarray, y=None, **kwargs)

Transforms data given a fit KGCCA model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits a regularised CCA (canonical ridge) model

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

class cca_zoo.models.PLS_ALS(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, max_iter: int = 100, initialization: Union[str, callable] = 'random', tol: float = 1e-09, verbose=0)

Bases: _BaseIterative

A class used to fit a PLS model.

Fits a partial least squares model with CCA deflation using the NIPALS algorithm.

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2\}\\\end{split}\\\text{subject to:}\\w_i^Tw_i=1\end{aligned}\end{align} \]

Example

>>> from cca_zoo.models import PLS_ALS
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = PLS_ALS(random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.81796854])
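The alternating scheme behind a NIPALS-style fit can be sketched for two views in plain numpy. This sketch uses the familiar covariance-maximising form of PLS with unit-norm weights, which is an assumption about what the inner loop does; the function name `pls_first_component` is hypothetical and the parameters only mirror `max_iter`, `tol` and `random_state` by name.

```python
import numpy as np


def pls_first_component(X1, X2, max_iter=100, tol=1e-9, random_state=0):
    """Alternate unit-norm weight updates for two centred views."""
    rng = np.random.RandomState(random_state)
    C = X1.T @ X2  # cross-covariance (up to a constant factor)
    w2 = rng.standard_normal(X2.shape[1])
    w2 /= np.linalg.norm(w2)
    for _ in range(max_iter):
        # Update each weight vector with the other held fixed.
        w1 = C @ w2
        w1 /= np.linalg.norm(w1)
        w2_new = C.T @ w1
        w2_new /= np.linalg.norm(w2_new)
        converged = np.linalg.norm(w2_new - w2) < tol
        w2 = w2_new
        if converged:
            break
    return w1, w2


rng = np.random.RandomState(0)
X1 = rng.random((10, 5))
X2 = rng.random((10, 5))
X1c = X1 - X1.mean(axis=0)
X2c = X2 - X2.mean(axis=0)
w1, w2 = pls_first_component(X1c, X2c)
```

Under these assumptions the iteration is a power method on the cross-covariance matrix, so the covariance it reaches is bounded by that matrix's largest singular value.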

Constructor for PLS_ALS

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; otherwise, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
max_iter – the maximum number of iterations to perform in the inner optimization loop
initialization – either string from “pls”, “cca”, “random”, “uniform” or callable to initialize the score variables for iterative methods
tol – tolerance value used for early stopping

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
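The two deflation styles mentioned above can be sketched as follows. Both forms are assumptions made for illustration (the section does not give the formulas): the "CCA" style removes the fitted score direction from the residual view, while the "PLS" style removes the fitted weight direction. The function names are hypothetical.

```python
import numpy as np


def cca_style_deflation(residual, score):
    """Project the score direction out of the sample space of the view."""
    return residual - np.outer(score, score) @ residual / float(score @ score)


def pls_style_deflation(residual, weight):
    """Remove the (unit-norm) weight direction from the feature space."""
    score = residual @ weight
    return residual - np.outer(score, weight)


rng = np.random.RandomState(0)
X = rng.random((10, 5))
X = X - X.mean(axis=0)
w = rng.standard_normal(5)
w /= np.linalg.norm(w)

deflated_cca = cca_style_deflation(X, X @ w)
deflated_pls = pls_style_deflation(X, w)
```

After either step, the next latent dimension is fitted on the deflated views, so later components cannot simply rediscover the first one.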

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Transforms data given a fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

class cca_zoo.models.SCCA_PMD(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, deflation='cca', c: Optional[Union[Iterable[float], float]] = None, max_iter: int = 100, initialization: Union[str, callable] = 'pls', tol: float = 1e-09, positive: Optional[Union[Iterable[bool], bool]] = None, verbose=0)

Bases: _BaseIterative

Fits a Sparse CCA (Penalized Matrix Decomposition) model.

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \}\\\end{split}\\\text{subject to:}\\w_i^Tw_i=1\\\|w_i\|_1\leq c_i\end{aligned}\end{align} \]

Citation

Witten, Daniela M., Robert Tibshirani, and Trevor Hastie. “A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.” Biostatistics 10.3 (2009): 515-534.

Example

>>> from cca_zoo.models import SCCA_PMD
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SCCA_PMD(c=[1,1],random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.81796873])

Constructor for SCCA_PMD

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; otherwise, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
c – l1 regularisation parameter between 1 and sqrt(number of features) for each view
max_iter – the maximum number of iterations to perform in the inner optimization loop
initialization – either string from “pls”, “cca”, “random”, “uniform” or callable to initialize the score variables for iterative methods
tol – tolerance value used for early stopping
positive – constrain model weights to be positive
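The role of `c` as an l1 bound between 1 and sqrt(number of features) can be illustrated with a soft-thresholding sketch in the spirit of penalized matrix decomposition: a bisection search finds a threshold for which the unit-norm vector also satisfies the l1 bound. The function names and the bisection details are assumptions for illustration, not the library's code.

```python
import numpy as np


def soft_threshold(a, delta):
    """Shrink each entry towards zero by delta."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def l1_constrained_unit_vector(a, c, n_steps=50):
    """Unit l2-norm vector along soft-thresholded a with l1 norm <= c (c >= 1)."""
    w = a / np.linalg.norm(a)
    if np.abs(w).sum() <= c:
        return w  # the constraint is already inactive
    # Fallback that is always feasible for c >= 1: a signed one-hot vector.
    feasible = np.zeros_like(a, dtype=float)
    top = np.argmax(np.abs(a))
    feasible[top] = np.sign(a[top])
    lo, hi = 0.0, np.abs(a).max()
    for _ in range(n_steps):
        mid = 0.5 * (lo + hi)
        s = soft_threshold(a, mid)
        w = s / np.linalg.norm(s)
        if np.abs(w).sum() > c:
            lo = mid  # still too dense: threshold harder
        else:
            hi = mid  # feasible: remember it and try a smaller threshold
            feasible = w
    return feasible


a = np.array([3.0, -1.0, 0.5, 0.1])
w_sparse = l1_constrained_unit_vector(a, c=1.2)
w_dense = l1_constrained_unit_vector(a, c=2.0)
```

With `c` at the upper end of its range (here sqrt(4) = 2) the bound is inactive and the vector is only rescaled; smaller values push more entries to exactly zero.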

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Transforms data given a fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

class cca_zoo.models.ElasticCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, deflation='cca', max_iter: int = 100, initialization: Union[str, callable] = 'pls', tol: float = 1e-09, c: Optional[Union[Iterable[float], float]] = None, l1_ratio: Optional[Union[Iterable[float], float]] = None, maxvar: bool = True, stochastic=False, positive: Optional[Union[Iterable[bool], bool]] = None, verbose=0)

Bases: _BaseIterative

Fits an elastic CCA by iterating elastic net regressions.

By default, ElasticCCA uses CCA with an auxiliary variable target i.e. MAXVAR configuration

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}, t_{opt}=\underset{w,t}{\mathrm{argmax}}\{\sum_i \|X_iw_i-t\|^2 + c\|w_i\|^2_2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\t^Tt=n\end{aligned}\end{align} \]

But we can force it to attempt to use the SUMCOR form which will approximate a solution to the problem:

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + c\|w_i\|^2_2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=n\end{aligned}\end{align} \]

Example

>>> from cca_zoo.models import ElasticCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = ElasticCCA(c=[1e-1,1e-1],l1_ratio=[0.5,0.5], random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.9316638])
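The idea of iterating regressions against an auxiliary target can be sketched in numpy for the simplest case. This sketch reads the least-squares objective as a minimisation and keeps only the l2 penalty (equivalent to ignoring the l1 term), so each regression has a closed form; both choices are assumptions, as is the function name `maxvar_ridge_sketch`.

```python
import numpy as np


def maxvar_ridge_sketch(views, c=0.1, max_iter=100, tol=1e-9, random_state=0):
    """Alternate ridge regressions onto a shared target t with t^T t = n."""
    n = views[0].shape[0]
    rng = np.random.RandomState(random_state)
    t = rng.standard_normal(n)
    t *= np.sqrt(n) / np.linalg.norm(t)
    for _ in range(max_iter):
        # Regress the current target on each view (closed-form ridge).
        weights = [
            np.linalg.solve(X.T @ X + c * np.eye(X.shape[1]), X.T @ t)
            for X in views
        ]
        # Refresh the target from the summed projections, rescaled to norm sqrt(n).
        t_new = sum(X @ w for X, w in zip(views, weights))
        t_new *= np.sqrt(n) / np.linalg.norm(t_new)
        converged = np.linalg.norm(t_new - t) < tol
        t = t_new
        if converged:
            break
    return weights, t


rng = np.random.RandomState(0)
views = [rng.random((10, 5)) for _ in range(2)]
views = [X - X.mean(axis=0) for X in views]
weights, t = maxvar_ridge_sketch(views)
```

An elastic net version would replace the closed-form solve with an l1-aware regression solver for each view; that step is omitted here to keep the sketch dependency-free.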

Constructor for ElasticCCA

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; otherwise, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
deflation – the type of deflation.
max_iter – the maximum number of iterations to perform in the inner optimization loop
initialization – either string from “pls”, “cca”, “random”, “uniform” or callable to initialize the score variables for iterative methods
tol – tolerance value used for early stopping
c – lasso alpha
l1_ratio – l1 ratio in lasso subproblems
maxvar – use auxiliary variable “maxvar” formulation
stochastic – use stochastic regression optimisers for subproblems
positive – constrain model weights to be positive

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Transforms data given a fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

class cca_zoo.models.SCCA_Parkhomenko(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, deflation='cca', c: Optional[Union[Iterable[float], float]] = None, max_iter: int = 100, initialization: Union[str, callable] = 'pls', tol: float = 1e-09, verbose=0)

Bases: _BaseIterative

Fits a sparse CCA (penalized CCA) model

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \} + c_i\|w_i\|\\\end{split}\\\text{subject to:}\\w_i^Tw_i=1\end{aligned}\end{align} \]

Citation

Parkhomenko, Elena, David Tritchler, and Joseph Beyene. “Sparse canonical correlation analysis with application to genomic data integration.” Statistical applications in genetics and molecular biology 8.1 (2009).

Example

>>> from cca_zoo.models import SCCA_Parkhomenko
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SCCA_Parkhomenko(c=[0.001,0.001],random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.81803527])
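A single weight update in this style of penalized CCA can be sketched as: multiply by the cross-covariance, normalise, soft-threshold, and normalise again. Thresholding by `c / 2` is an assumption based on the cited paper's description rather than on this section, and the function name `penalized_update` is hypothetical.

```python
import numpy as np


def penalized_update(C, w_other, c):
    """One sparse update of a weight vector given the other view's weights."""
    w = C @ w_other
    w /= np.linalg.norm(w)
    # Soft-threshold; the c / 2 amount is an assumption.
    w = np.sign(w) * np.maximum(np.abs(w) - c / 2.0, 0.0)
    norm = np.linalg.norm(w)
    # Leave an all-zero vector untouched rather than dividing by zero.
    return w / norm if norm > 0 else w


rng = np.random.RandomState(0)
X1 = rng.random((10, 5))
X2 = rng.random((10, 5))
C = (X1 - X1.mean(axis=0)).T @ (X2 - X2.mean(axis=0))
w2 = np.ones(5) / np.sqrt(5)
w1 = penalized_update(C, w2, c=0.001)
```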

Constructor for SCCA_Parkhomenko

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; otherwise, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
c – l1 regularisation parameter
max_iter – the maximum number of iterations to perform in the inner optimization loop
initialization – either string from “pls”, “cca”, “random”, “uniform” or callable to initialize the score variables for iterative methods
tol – tolerance value used for early stopping

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Transforms data given a fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

class cca_zoo.models.SCCA_IPLS(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, deflation='cca', c: Optional[Union[Iterable[float], float]] = None, max_iter: int = 100, maxvar: bool = False, initialization: Union[str, callable] = 'pls', tol: float = 1e-09, stochastic=False, positive: Optional[Union[Iterable[bool], bool]] = None, verbose=0)

Bases: ElasticCCA

Fits a sparse CCA model by iterative rescaled lasso regression. Implemented by ElasticCCA with l1_ratio=1.

For default maxvar=False, the optimisation is given by:

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=n\end{aligned}\end{align} \]

Citation

Mai, Qing, and Xin Zhang. “An iterative penalized least squares approach to sparse canonical correlation analysis.” Biometrics 75.3 (2019): 734-744.

For maxvar=True, the optimisation is given by the ElasticCCA problem with no l2 regularisation:

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}, t_{opt}=\underset{w,t}{\mathrm{argmax}}\{\sum_i \|X_iw_i-t\|^2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\t^Tt=n\end{aligned}\end{align} \]

Citation

Fu, Xiao, et al. “Scalable and flexible multiview MAX-VAR canonical correlation analysis.” IEEE Transactions on Signal Processing 65.16 (2017): 4150-4165.

Example

>>> from cca_zoo.models import SCCA_IPLS
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SCCA_IPLS(c=[0.001,0.001], random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.99998761])

Constructor for SCCA_IPLS

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; otherwise, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
max_iter – the maximum number of iterations to perform in the inner optimization loop
maxvar – use auxiliary variable “maxvar” form
initialization – either string from “pls”, “cca”, “random”, “uniform” or callable to initialize the score variables for iterative methods
tol – tolerance value used for early stopping
c – lasso alpha
stochastic – use stochastic regression optimisers for subproblems
positive – constrain model weights to be positive

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance
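The `<component>__<parameter>` convention can be illustrated with a toy stand-in. `ToyEstimator` below is a hypothetical class written for this sketch; it only mimics the naming rule and is not scikit-learn's or cca_zoo's implementation.

```python
class ToyEstimator:
    """Minimal stand-in that mimics the <component>__<parameter> convention."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            if sep:
                # Nested name: forward the remainder to the sub-estimator.
                self.params[name].set_params(**{sub_key: value})
            else:
                self.params[name] = value
        return self  # returning self allows call chaining

inner = ToyEstimator(c=0.1, latent_dims=1)
outer = ToyEstimator(model=inner, verbose=0)
outer.set_params(model__c=0.5, verbose=1)
```

After the call, `model__c` has been routed to the nested object while `verbose` was set on the outer one.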

transform(views: Iterable[ndarray], **kwargs)

Transforms data given a fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

class cca_zoo.models.SCCA_ADMM(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, deflation='cca', c: Optional[Union[Iterable[float], float]] = None, mu: Optional[Union[Iterable[float], float]] = None, lam: Optional[Union[Iterable[float], float]] = None, eta: Optional[Union[Iterable[float], float]] = None, max_iter: int = 100, initialization: Union[str, callable] = 'pls', tol: float = 1e-09, verbose=0)[source]

Bases: _BaseIterative

Fits a sparse CCA model by alternating ADMM

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + \|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=1\end{aligned}\end{align} \]

Citation

Suo, Xiaotong, et al. “Sparse canonical correlation analysis.” arXiv preprint arXiv:1705.10865 (2017).

Example

>>> from cca_zoo.models import SCCA_ADMM
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SCCA_ADMM(random_state=0,c=[1e-1,1e-1])
>>> model.fit((X1,X2)).score((X1,X2))
array([0.84348183])

Constructor for SCCA_ADMM

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; else, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
c – l1 regularisation parameter
max_iter – the maximum number of iterations to perform in the inner optimization loop
initialization – either string from “pls”, “cca”, “random”, “uniform” or callable to initialize the score variables for _iterative methods
tol – tolerance value used for early stopping
mu –
lam –
eta –
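For intuition about the l1 regularisation parameter c: the proximal operator of an l1 penalty is soft-thresholding, which shrinks small weights to exactly zero. The sketch below shows only that operator; whether SCCA_ADMM applies it in this exact form internally is an assumption, and `soft_threshold` is a hypothetical helper.

```python
def soft_threshold(w, c):
    """Shrink each entry of w towards zero by c; entries within [-c, c] become 0."""
    shrunk = []
    for x in w:
        if abs(x) > c:
            sign = 1.0 if x > 0 else -1.0
            shrunk.append(sign * (abs(x) - c))
        else:
            shrunk.append(0.0)
    return shrunk

weights = [0.5, -0.05, 0.2, -0.3]
sparse = soft_threshold(weights, 0.1)
```

Larger values of c zero out more entries, which is the sense in which the penalty induces sparsity.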

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
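Deflation removes the component that has just been found so that the next latent dimension captures something new. The sketch below shows one common projection-style deflation in plain Python; whether it matches the library's 'cca' or 'pls' option exactly is an assumption, and `deflate` is a hypothetical helper.

```python
def deflate(X, score):
    """Project the score direction out of every column of X."""
    n_rows, n_cols = len(X), len(X[0])
    norm = sum(s * s for s in score)
    # Least-squares coefficient of each column on the score vector.
    coefs = [
        sum(score[i] * X[i][j] for i in range(n_rows)) / norm
        for j in range(n_cols)
    ]
    return [
        [X[i][j] - score[i] * coefs[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]

X = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
w = [1.0, 0.0]  # hypothetical weights for the first latent dimension
score = [sum(x * wi for x, wi in zip(row, w)) for row in X]
X_deflated = deflate(X, score)
# Every deflated column should now be orthogonal to the score vector.
check = [sum(score[i] * X_deflated[i][j] for i in range(3)) for j in range(2)]
```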

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Transforms data given a fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

class cca_zoo.models.SCCA_Span(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, max_iter: int = 100, initialization: str = 'uniform', tol: float = 1e-09, regularisation='l0', c: Optional[Union[Iterable[Union[float, int]], float, int]] = None, rank=1, positive: Optional[Union[Iterable[bool], bool]] = None, random_state=None, deflation='cca', verbose=0)[source]

Bases: _BaseIterative

Fits a Sparse CCA model using SpanCCA.

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + \text{l1\_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=1\end{aligned}\end{align} \]

Citation

Asteris, Megasthenis, et al. “A simple and provable algorithm for sparse diagonal CCA.” International Conference on Machine Learning. PMLR, 2016.

Example

>>> from cca_zoo.models import SCCA_Span
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SCCA_Span(regularisation="l0", c=[2, 2])
>>> model.fit((X1,X2)).score((X1,X2))
array([0.84556666])

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; else, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
max_iter – the maximum number of iterations to perform in the inner optimization loop
initialization – either string from “pls”, “cca”, “random”, “uniform” or callable to initialize the score variables for _iterative methods
tol – tolerance value used for early stopping
regularisation – the type of regularisation on the weights either ‘l0’ or ‘l1’
c – regularisation parameter
rank – rank of the approximation
positive – constrain weights to be positive

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Transforms data given a fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

class cca_zoo.models.SWCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, max_iter: int = 500, initialization: str = 'random', tol: float = 1e-09, regularisation='l0', c: Optional[Union[Iterable[Union[float, int]], float, int]] = None, sample_support=None, positive=False, verbose=0)[source]

Bases: _BaseIterative

A class used to fit an SWCCA model

Citation

Example

>>> from cca_zoo.models import SWCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SWCCA(regularisation='l0',c=[2, 2], sample_support=5, random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.61620969])

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; else, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
max_iter – the maximum number of iterations to perform in the inner optimization loop
initialization – either string from “pls”, “cca”, “random”, “uniform” or callable to initialize the score variables for _iterative methods
tol – tolerance value used for early stopping
regularisation – the type of regularisation on the weights either ‘l0’ or ‘l1’
c – regularisation parameter
sample_support – the l0 norm of the sample weights
positive – constrain weights to be positive
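An l0 constraint such as sample_support limits how many entries may be non-zero. A common way to enforce it is hard thresholding, sketched below; that SWCCA enforces the constraint this way is an assumption, and `keep_top_k` is a hypothetical helper.

```python
def keep_top_k(values, k):
    """Zero all but the k entries with the largest absolute value."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]

sample_weights = [0.9, -0.1, 0.4, 0.05, -0.7, 0.2]
supported = keep_top_k(sample_weights, 3)
# The l0 "norm" is simply the count of non-zero entries.
l0_norm = sum(1 for v in supported if v != 0.0)
```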

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Transforms data given a fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

class cca_zoo.models.MCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None, eps=1e-09)[source]

Bases: rCCA

A class used to fit an MCCA model. For more than 2 views, MCCA optimizes the sum of pairwise correlations.

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} w_i^TX_i^TX_jw_j \}\\\end{split}\\\text{subject to:}\\(1-c_i)w_i^TX_i^TX_iw_i+c_iw_i^Tw_i=1\end{aligned}\end{align} \]

Citation

Kettenring, Jon R. “Canonical analysis of several sets of variables.” Biometrika 58.3 (1971): 433-451.

Example

>>> from cca_zoo.models import MCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = MCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([0.97200847])

Constructor for MCCA

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; else, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
c – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)
eps – epsilon for stability
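The centre and scale options amount to standardising each column, with the statistics learned at fit time reused for out-of-sample data. A minimal plain-Python sketch follows; `fit_scaler` and `apply_scaler` are hypothetical helpers, and the use of the population standard deviation is an assumption.

```python
import statistics

def fit_scaler(train):
    """Column means and standard deviations learned from the training view."""
    columns = list(zip(*train))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]  # population std: an assumption
    return means, stds

def apply_scaler(view, means, stds):
    """Demean and rescale each column with the training statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in view]

train = [[1.0, 10.0], [3.0, 30.0]]
held_out = [[2.0, 40.0]]
means, stds = fit_scaler(train)
train_scaled = apply_scaler(train, means, stds)
held_out_scaled = apply_scaler(held_out, means, stds)
```

The held-out row is shifted and scaled by the training means and deviations, not by its own.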

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits a regularised CCA (canonical ridge) model

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Transforms data given a fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

class cca_zoo.models.KCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None, eps=0.001, kernel: Optional[Iterable[Union[float, callable]]] = None, gamma: Optional[Iterable[float]] = None, degree: Optional[Iterable[float]] = None, coef0: Optional[Iterable[float]] = None, kernel_params: Optional[Iterable[dict]] = None)[source]

Bases: MCCA

A class used to fit a KCCA model.

Maths

\[ \begin{align}\begin{aligned}\begin{split}\alpha_{opt}=\underset{\alpha}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \alpha_i^TK_i^TK_j\alpha_j \}\\\end{split}\\\text{subject to:}\\c_i\alpha_i^TK_i\alpha_i + (1-c_i)\alpha_i^TK_i^TK_i\alpha_i=1\end{aligned}\end{align} \]

Example

>>> from cca_zoo.models import KCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = KCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([0.96893666])

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; else, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
c – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)
eps – epsilon value to ensure stability of smallest eigenvalues
kernel – Iterable of kernel mappings used internally. This parameter is directly passed to pairwise_kernels. If element of kernel is a string, it must be one of the metrics in pairwise.PAIRWISE_KERNEL_FUNCTIONS. Alternatively, if element of kernel is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two rows from views as input and return the corresponding kernel value as a single number. This means that callables from sklearn.metrics.pairwise are not allowed, as they operate on matrices, not single samples. Use the string identifying the kernel instead.
gamma – Iterable of gamma parameters for the RBF, laplacian, polynomial, exponential chi2 and sigmoid kernels. Interpretation of the default value is left to the kernel; see the documentation for sklearn.metrics.pairwise. Ignored by other kernels.
degree – Iterable of degree parameters of the polynomial kernel. Ignored by other kernels.
coef0 – Iterable of zero coefficients for polynomial and sigmoid kernels. Ignored by other kernels.
kernel_params – Iterable of additional parameters (keyword arguments) for kernel function passed as callable object.
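To make the callable-kernel requirement concrete, the sketch below defines a kernel that takes two single rows and returns one number, then evaluates it on every pair of rows. `poly_kernel` and `gram_matrix` are hypothetical names for this illustration, which does not call the library.

```python
def poly_kernel(row_a, row_b, degree=2, coef0=1.0):
    """Polynomial kernel on two single samples; returns one number."""
    dot = sum(a * b for a, b in zip(row_a, row_b))
    return (dot + coef0) ** degree

def gram_matrix(view, kernel):
    """Call the kernel on every pair of rows to build the kernel matrix."""
    return [[kernel(a, b) for b in view] for a in view]

view = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = gram_matrix(view, poly_kernel)
```

A function that expects whole matrices would not fit this calling pattern, which is why the matrix-based callables are excluded.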

transform(views: ndarray, **kwargs)[source]

Transforms data given a fit KCCA model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits a regularised CCA (canonical ridge) model

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

class cca_zoo.models.NCCA(latent_dims: int = 1, scale=True, centre=True, copy_data=True, accept_sparse=False, random_state: Optional[Union[int, RandomState]] = None, nearest_neighbors=None, gamma: Optional[Iterable[float]] = None)[source]

Bases: _BaseCCA

A class used to fit a nonparametric CCA (NCCA) model.

Citation

Michaeli, Tomer, Weiran Wang, and Karen Livescu. “Nonparametric canonical correlation analysis.” International conference on machine learning. PMLR, 2016.

Example

>>> from cca_zoo.models import NCCA
>>> import numpy as np
>>> X1 = np.random.rand(10,5)
>>> X2 = np.random.rand(10,5)
>>> model = NCCA()
>>> model.fit((X1,X2)).score((X1,X2))
array([1.])

Constructor for NCCA

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; else, they may be overwritten
accept_sparse – Whether model can take sparse data as input
random_state – Pass for reproducible output across multiple function calls
nearest_neighbors – Number of nearest neighbors (l2 distance) to consider when constructing affinity
gamma – Bandwidth parameter for rbf kernel
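The roles of nearest_neighbors and gamma can be sketched as an RBF affinity that is kept only for each sample's closest neighbours. This is a plain-Python illustration under stated assumptions (self-affinity excluded, squared l2 distance inside the exponential); `knn_rbf_affinity` is a hypothetical helper and not the library's code.

```python
import math

def knn_rbf_affinity(view, n_neighbors, gamma):
    """RBF affinities, kept only for each sample's nearest neighbours (l2 distance)."""
    n = len(view)
    affinity = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(view):
        sq_dists = [sum((x - y) ** 2 for x, y in zip(a, b)) for b in view]
        # Rank the other samples by distance and keep the closest ones.
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: sq_dists[j])[:n_neighbors]
        for j in nearest:
            affinity[i][j] = math.exp(-gamma * sq_dists[j])
    return affinity

view = [[0.0], [1.0], [3.0]]
W = knn_rbf_affinity(view, n_neighbors=1, gamma=1.0)
```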

fit(views: Iterable[ndarray], y=None, **kwargs)[source]

Fits a given model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

transform(views: Iterable[ndarray], **kwargs)[source]

Transforms data given a fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

class cca_zoo.models.PartialCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None, eps=0.001)[source]

Bases: MCCA

A class used to fit a partial CCA model. The key difference between this and a vanilla CCA or MCCA is that the canonical score vectors must be orthogonal to the supplied confounding variables.
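One way to obtain quantities orthogonal to a confound is to regress it out by least squares and keep the residual. The sketch below does this for a single column and a single confound; that PartialCCA works this way internally is an assumption, and `residualise` is a hypothetical helper.

```python
def residualise(column, confound):
    """Remove the least-squares fit of a single centred confound from a column."""
    n = len(column)
    mean_col = sum(column) / n
    mean_conf = sum(confound) / n
    z = [v - mean_conf for v in confound]
    y = [v - mean_col for v in column]
    beta = sum(a * b for a, b in zip(y, z)) / sum(b * b for b in z)
    return [a - beta * b for a, b in zip(y, z)], z

column = [2.0, 4.0, 5.0, 9.0]
confound = [1.0, 2.0, 3.0, 4.0]
residual, centred_confound = residualise(column, confound)
# Orthogonality check: the inner product should vanish up to rounding.
inner_product = sum(r * z for r, z in zip(residual, centred_confound))
```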

Citation

Rao, B. Raja. “Partial canonical correlations.” Trabajos de estadistica y de investigación operativa 20.2-3 (1969): 211-219.

Example

>>> from cca_zoo.models import PartialCCA
>>> import numpy as np
>>> X1 = np.random.rand(10,5)
>>> X2 = np.random.rand(10,5)
>>> partials = np.random.rand(10,3)
>>> model = PartialCCA()
>>> model.fit((X1,X2),partials=partials).score((X1,X2),partials=partials)
array([0.99993046])

Constructor for Partial CCA

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; else, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
c – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)
eps – epsilon for stability

transform(views: Iterable[ndarray], partials=None, **kwargs)[source]

Transforms data given a fit model

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits a regularised CCA (canonical ridge) model

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

class cca_zoo.models.rCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None, eps=0.001, accept_sparse=None)[source]

Bases: _BaseCCA

A class used to fit a regularised CCA (canonical ridge) model. Uses PCA to perform the optimization efficiently for high dimensional data.

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \}\\\end{split}\\\text{subject to:}\\(1-c_1)w_1^TX_1^TX_1w_1+c_1w_1^Tw_1=n\\(1-c_2)w_2^TX_2^TX_2w_2+c_2w_2^Tw_2=n\end{aligned}\end{align} \]
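The left-hand side of the constraint above blends the norm of the scores (weight 1 - c) with the norm of the weights (weight c). The sketch below evaluates that expression for one view at c = 0, c = 1 and a value in between; `constraint_value` and the toy data are hypothetical.

```python
def constraint_value(X, w, c):
    """Evaluate (1 - c) * w^T X^T X w + c * w^T w for one view."""
    scores = [sum(x * wi for x, wi in zip(row, w)) for row in X]  # X w
    score_norm = sum(s * s for s in scores)   # w^T X^T X w
    weight_norm = sum(wi * wi for wi in w)    # w^T w
    return (1 - c) * score_norm + c * weight_norm

X = [[1.0, 0.0], [0.0, 2.0], [-1.0, -2.0]]
w = [1.0, 1.0]
cca_like = constraint_value(X, w, 0.0)  # only the score norm counts
pls_like = constraint_value(X, w, 1.0)  # only the weight norm counts
mixed = constraint_value(X, w, 0.5)
```

At c = 0 the constraint fixes the scale of the scores, as in CCA, and at c = 1 it fixes the scale of the weights, as in PLS.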

Citation

Vinod, Hrishikesh D. “Canonical ridge and econometrics of joint production.” Journal of econometrics 4.2 (1976): 147-166.

Example

>>> from cca_zoo.models import rCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = rCCA(c=[0.1,0.1])
>>> model.fit((X1,X2)).score((X1,X2))
array([0.95222128])

Constructor for rCCA

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; else, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
c – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)
eps – epsilon for stability
accept_sparse – which forms are accepted for sparse data

fit(views: Iterable[ndarray], y=None, **kwargs)[source]

Fits a regularised CCA (canonical ridge) model

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores
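To make the effect of `normalize` concrete, here is a small NumPy sketch of the two kinds of loadings for a single view. It assumes that unnormalised loadings are feature-by-score cross-products and that normalised loadings are Pearson correlations between features and scores; the weight matrix `w` is a hypothetical stand-in for fitted weights, and the library's exact computation may differ.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.random((10, 5))        # one view: 10 samples, 5 features
w = rng.random((5, 2))         # hypothetical weights for 2 latent dimensions
Xc = X - X.mean(axis=0)
scores = Xc @ w                # projections of the view onto the weights

# Unnormalised loadings: cross-products between features and scores
raw_loadings = Xc.T @ scores   # shape (5, 2)

# Normalised loadings: correlation of each feature with each score
p = X.shape[1]
norm_loadings = np.corrcoef(Xc, scores, rowvar=False)[:p, p:]
print(raw_loadings.shape, norm_loadings.shape)
```

Because the normalised values are correlations, they all lie in [-1, 1] and can be compared across features with different scales.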

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
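The shape of the returned array can be illustrated with a short NumPy sketch. It assumes the transformed views ("scores") are already available and that each entry is a Pearson correlation; the function below is a hypothetical standalone helper, not the library method.

```python
import numpy as np

def pairwise_correlations(scores):
    """scores: list of k arrays, each of shape (n_samples, latent_dims)."""
    k = len(scores)
    latent_dims = scores[0].shape[1]
    all_corrs = np.empty((k, k, latent_dims))
    for i in range(k):
        for j in range(k):
            for d in range(latent_dims):
                # Correlation between view i and view j in dimension d
                all_corrs[i, j, d] = np.corrcoef(scores[i][:, d], scores[j][:, d])[0, 1]
    return all_corrs

rng = np.random.RandomState(0)
scores = [rng.random((10, 2)) for _ in range(3)]  # stand-ins for transformed views
all_corrs = pairwise_correlations(scores)
print(all_corrs.shape)  # (3, 3, 2)
```

Entry `[i, j, d]` is the correlation between views `i` and `j` in latent dimension `d`, so each `[:, :, d]` slice is symmetric with ones on its diagonal.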

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn
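One plausible reading of "averages over all pairs" is the mean of the off-diagonal entries of the pairwise correlation array, taken separately for each latent dimension. The sketch below follows that reading; `average_pairwise_score` is a hypothetical helper and the library's exact formula is an assumption here.

```python
import numpy as np

def average_pairwise_score(all_corrs):
    """all_corrs: (k, k, latent_dims) array with ones on the view diagonal."""
    k = all_corrs.shape[0]
    # Sum over both view axes, drop the k self-correlations, average the rest
    return (all_corrs.sum(axis=(0, 1)) - k) / (k * k - k)

# Two views, one latent dimension, correlation 0.8 between the views
two_view = np.array([[[1.0], [0.8]],
                     [[0.8], [1.0]]])
score = average_pairwise_score(two_view)
print(score)  # approximately [0.8]
```

For two views this reduces to the single between-view correlation in each dimension.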

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance
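The `<component>__<parameter>` convention is inherited from scikit-learn. As a short illustration using only scikit-learn estimators (chosen here for the example, not taken from cca_zoo), the following sets parameters on a simple estimator and on a step nested inside a Pipeline.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("pca", PCA(n_components=2))])

# Simple estimator: parameter names are used directly
pca = PCA().set_params(n_components=3)

# Nested object: <component>__<parameter> reaches into a named step
pipe.set_params(pca__n_components=1, scale__with_std=False)

print(pca.get_params()["n_components"])        # 3
print(pipe.get_params()["pca__n_components"])  # 1
```

Since `set_params` returns the estimator itself, calls can be chained, as in the `PCA().set_params(...)` line.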

transform(views: Iterable[ndarray], **kwargs)

Transforms data given a fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

class cca_zoo.models.CCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None)[source]

Bases: rCCA

A class used to fit a simple CCA model

Implements CCA by inheriting regularised CCA with 0 regularisation

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \}\\\end{split}\\\text{subject to:}\\w_1^TX_1^TX_1w_1=n\\w_2^TX_2^TX_2w_2=n\end{aligned}\end{align} \]

Citation

Hotelling, Harold. “Relations between two sets of variates.” Breakthroughs in statistics. Springer, New York, NY, 1992. 162-190.

Example

>>> from cca_zoo.models import CCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = CCA()
>>> model.fit((X1,X2)).score((X1,X2))
array([1.])
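The score of exactly 1 in this example is a small-sample effect rather than evidence of shared structure. With 10 samples, the centred views each span a 5-dimensional subspace of a 9-dimensional space, so the two subspaces must intersect and unregularised CCA finds a perfectly correlated pair. The NumPy sketch below checks this with the standard QR-based formulation of canonical correlations, which is used here for illustration and is not claimed to be cca_zoo's internal method.

```python
import numpy as np

rng = np.random.RandomState(0)
X1 = rng.random((10, 5))
X2 = rng.random((10, 5))

# Centre each view, then take orthonormal bases of the column spaces
Q1, _ = np.linalg.qr(X1 - X1.mean(axis=0))
Q2, _ = np.linalg.qr(X2 - X2.mean(axis=0))

# Singular values of Q1^T Q2 are the sample canonical correlations
canonical_corrs = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
print(canonical_corrs[0])  # numerically indistinguishable from 1
```

With more samples than the total number of features, or with regularisation as in rCCA, the leading correlation generally drops below 1.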

Constructor for CCA

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; else, they may be overwritten
random_state – Pass for reproducible output across multiple function calls

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits a regularised CCA (canonical ridge) model

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Transforms data given a fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

class cca_zoo.models.PLS(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None)[source]

Bases: rCCA

A class used to fit a simple PLS model

Implements PLS by inheriting regularised CCA with maximal regularisation

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \}\\\end{split}\\\text{subject to:}\\w_1^Tw_1=1\\w_2^Tw_2=1\end{aligned}\end{align} \]
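For two views, the problem above is solved by the leading singular vectors of the cross-product matrix. A minimal NumPy sketch, assuming centred but unscaled data (it does not reproduce cca_zoo's preprocessing):

```python
import numpy as np

rng = np.random.RandomState(0)
X1 = rng.random((10, 5))
X2 = rng.random((10, 5))
X1c = X1 - X1.mean(axis=0)
X2c = X2 - X2.mean(axis=0)

# SVD of the cross-product matrix X1^T X2
U, s, Vt = np.linalg.svd(X1c.T @ X2c)
w1, w2 = U[:, 0], Vt[0]  # unit-norm weight vectors

# The objective at the optimum equals the largest singular value
objective = w1 @ X1c.T @ X2c @ w2
print(np.isclose(objective, s[0]))  # True
```

The unit-norm constraints hold automatically because the columns of `U` and rows of `Vt` are orthonormal.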

Example

>>> from cca_zoo.models import PLS
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = PLS()
>>> model.fit((X1,X2)).score((X1,X2))
array([0.81796873])

Constructor for PLS

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; else, they may be overwritten
random_state – Pass for reproducible output across multiple function calls

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits a regularised CCA (canonical ridge) model

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Transforms data given a fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

class cca_zoo.models.TCCA(latent_dims: int = 1, scale=True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None)[source]

Bases: _BaseCCA

Fits a Tensor CCA model. Tensor CCA maximises higher order correlations
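To give a sense of what a "higher order correlation" measures, the sketch below builds the order-3 covariance tensor of three centred views and contracts it with one weight vector per view. This assumes a three-view covariance-tensor formulation; the weights are hypothetical, the normalisation constraints are omitted, and the code is not cca_zoo's implementation.

```python
import numpy as np

rng = np.random.RandomState(0)
views = [rng.random((10, 5)) for _ in range(3)]
views = [X - X.mean(axis=0) for X in views]
n = views[0].shape[0]

# Order-3 covariance tensor: average outer product across samples
C = np.einsum("ni,nj,nk->ijk", *views) / n  # shape (5, 5, 5)

# Hypothetical unit-norm weights, one vector per view
ws = [w / np.linalg.norm(w) for w in rng.random((3, 5))]

# Contracting the tensor with the weights ...
via_tensor = np.einsum("ijk,i,j,k->", C, *ws)
# ... equals the mean of the elementwise product of the projections
z = [X @ w for X, w in zip(views, ws)]
via_scores = np.mean(z[0] * z[1] * z[2])
print(np.isclose(via_tensor, via_scores))  # True
```

Unlike a pairwise correlation, this quantity couples all views at once, which is why a tensor decomposition rather than a matrix decomposition is needed to maximise it.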

Maths

\[ \begin{align}\begin{aligned}\begin{split}\alpha_{opt}=\underset{\alpha}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \alpha_i^TK_i^TK_j\alpha_j \}\\\end{split}\\\text{subject to:}\\\alpha_i^TK_i^TK_i\alpha_i=1\end{aligned}\end{align} \]

Citation

Kim, Tae-Kyun, Shu-Fai Wong, and Roberto Cipolla. “Tensor canonical correlation analysis for action classification.” 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2007. https://github.com/rciszek/mdr_tcca

Example

>>> from cca_zoo.models import TCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = TCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([1.14595755])

Constructor for TCCA

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; else, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
c – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)

fit(views: Iterable[ndarray], y=None, **kwargs)[source]

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

correlations(views: Iterable[ndarray], **kwargs)[source]

Predicts the correlation for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

score(views: Iterable[ndarray], **kwargs)[source]

Returns the higher order correlations in each dimension

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Transforms data given a fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

class cca_zoo.models.KTCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, eps=0.001, c: Optional[Union[Iterable[float], float]] = None, kernel: Optional[Iterable[Union[float, callable]]] = None, gamma: Optional[Iterable[float]] = None, degree: Optional[Iterable[float]] = None, coef0: Optional[Iterable[float]] = None, kernel_params: Optional[Iterable[dict]] = None)[source]

Bases: TCCA

Fits a Kernel Tensor CCA model. Tensor CCA maximises higher order correlations

Maths

\[ \begin{align}\begin{aligned}\begin{split}\alpha_{opt}=\underset{\alpha}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \alpha_i^TK_i^TK_j\alpha_j \}\\\end{split}\\\text{subject to:}\\\alpha_i^TK_i^TK_i\alpha_i=1\end{aligned}\end{align} \]

Citation

Kim, Tae-Kyun, Shu-Fai Wong, and Roberto Cipolla. “Tensor canonical correlation analysis for action classification.” 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2007.

Example

>>> from cca_zoo.models import KTCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = KTCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([1.69896269])

Constructor for KTCCA

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; else, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
c – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)
kernel – Iterable of kernel mappings used internally. This parameter is directly passed to pairwise_kernel. If element of kernel is a string, it must be one of the metrics in pairwise.PAIRWISE_KERNEL_FUNCTIONS. Alternatively, if element of kernel is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two rows from views as input and return the corresponding kernel value as a single number. This means that callables from sklearn.metrics.pairwise are not allowed, as they operate on matrices, not single samples. Use the string identifying the kernel instead.
gamma – Iterable of gamma parameters for the RBF, laplacian, polynomial, exponential chi2 and sigmoid kernels. Interpretation of the default value is left to the kernel; see the documentation for sklearn.metrics.pairwise. Ignored by other kernels.
degree – Iterable of degree parameters of the polynomial kernel. Ignored by other kernels.
coef0 – Iterable of zero coefficients for polynomial and sigmoid kernels. Ignored by other kernels.
kernel_params – Iterable of additional parameters (keyword arguments) for kernel function passed as callable object.
eps – epsilon value to ensure stability
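The difference between a string kernel and a callable kernel can be seen directly with scikit-learn's `pairwise_kernels`, the function these parameters are described as being passed to. In this sketch, `shifted_linear` is a hypothetical custom kernel invented for illustration; how KTCCA forwards the per-view iterables internally is not shown.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels

def shifted_linear(x, y):
    """Hypothetical custom kernel: takes two rows, returns a single number."""
    return float(np.dot(x, y)) + 1.0

rng = np.random.RandomState(0)
X = rng.random((10, 5))

# Callable kernel: evaluated on each pair of rows
K_custom = pairwise_kernels(X, metric=shifted_linear)
# String kernel: looked up by name, with gamma passed as a kernel parameter
K_rbf = pairwise_kernels(X, metric="rbf", gamma=0.5)

print(K_custom.shape, K_rbf.shape)  # (10, 10) (10, 10)
```

A callable is invoked once per pair of rows, so it is much slower than the equivalent string kernel on large views; prefer the string name whenever a built-in kernel fits.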

transform(views: ndarray, **kwargs)[source]

Transforms data given a fit kernel TCCA model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlation for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

fit(views: Iterable[ndarray], y=None, **kwargs)

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], **kwargs)

Returns the higher order correlations in each dimension

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

class cca_zoo.models.StochasticPowerPLS(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, accept_sparse=None, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None, epochs=1, lr=0.01)[source]

Bases: _BaseStochastic

A class used to fit Stochastic PLS

Citation

Arora, Raman, et al. “Stochastic optimization for PCA and PLS.” 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2012.
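As a rough illustration of the idea, the sketch below runs a stochastic power iteration for the first pair of PLS weights: each mini-batch contributes a step along its cross-product, followed by renormalisation onto the unit sphere. This is a simplified single-component reading of the stochastic power method, not cca_zoo's code; the function name is hypothetical, and `lr`, `epochs` and `batch_size` are borrowed from the constructor signature for familiarity only.

```python
import numpy as np

def stochastic_power_pls(X1, X2, lr=0.01, epochs=1, batch_size=1, seed=0):
    """Illustrative stochastic power iteration for one PLS weight pair."""
    rng = np.random.RandomState(seed)
    w1 = rng.randn(X1.shape[1])
    w2 = rng.randn(X2.shape[1])
    w1 /= np.linalg.norm(w1)
    w2 /= np.linalg.norm(w2)
    for _ in range(epochs):
        for start in range(0, X1.shape[0], batch_size):
            b1 = X1[start:start + batch_size]
            b2 = X2[start:start + batch_size]
            # Step along the mini-batch cross-product b1^T b2
            new_w1 = w1 + lr * b1.T @ (b2 @ w2)
            new_w2 = w2 + lr * b2.T @ (b1 @ w1)
            # Renormalise so that w^T w = 1
            w1 = new_w1 / np.linalg.norm(new_w1)
            w2 = new_w2 / np.linalg.norm(new_w2)
    return w1, w2

rng = np.random.RandomState(0)
X1 = rng.random((10, 5))
X2 = rng.random((10, 5))
X1c, X2c = X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)
w1, w2 = stochastic_power_pls(X1c, X2c, lr=0.1, epochs=50)
objective = w1 @ X1c.T @ X2c @ w2
top_singular_value = np.linalg.svd(X1c.T @ X2c, compute_uv=False)[0]
print(objective, top_singular_value)
```

For unit-norm weights the objective can never exceed the largest singular value of the cross-product matrix, which gives a simple reference point for judging how far the iteration has progressed.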

Constructor for StochasticPowerPLS

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; else, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
accept_sparse – which forms are accepted for sparse data

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits a regularised CCA (canonical ridge) model

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Transforms data given a fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

class cca_zoo.models.IncrementalPLS(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, accept_sparse=None, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None, epochs=1, simple=False, val_interval=10)[source]

Bases: _BaseStochastic

A class used to fit Incremental PLS

Citation

Arora, Raman, et al. “Stochastic optimization for PCA and PLS.” 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2012.

Constructor for IncrementalPLS

Parameters

latent_dims – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, views will be copied; else, they may be overwritten
random_state – Pass for reproducible output across multiple function calls
accept_sparse – which forms are accepted for sparse data

fit(views: Iterable[ndarray], y=None, **kwargs)

Fits a regularised CCA (canonical ridge) model

Parameters: views – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views: Iterable[ndarray], **kwargs)

Fits and then transforms the training data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views: Iterable[ndarray], normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

pairwise_correlations(views: Iterable[ndarray], **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views: Iterable[ndarray], y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(views: Iterable[ndarray], **kwargs)

Transforms data given a fit model

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model