Models

Regularized Canonical Correlation Analysis and Partial Least Squares

Canonical Correlation Analysis

class cca_zoo.models.rcca.CCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None)

A class used to fit a simple CCA model

Implements CCA by inheriting regularised CCA with 0 regularisation

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \}\\\end{split}\\\text{subject to:}\\w_1^TX_1^TX_1w_1=n\\w_2^TX_2^TX_2w_2=n\end{aligned}\end{align} \]
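The optimisation above can be sketched in plain NumPy by whitening each view and taking a singular value decomposition of the whitened cross-covariance. This is a minimal illustration under the assumption of centred, full-rank views; the helper names `inv_sqrt` and `cca_first_pair` are hypothetical and this is not necessarily how the library solves the problem.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def cca_first_pair(X1, X2):
    """Return (w1, w2, rho) for the first canonical pair of two centred views."""
    n = X1.shape[0]
    W1, W2 = inv_sqrt(X1.T @ X1 / n), inv_sqrt(X2.T @ X2 / n)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(W1 @ (X1.T @ X2 / n) @ W2)
    return W1 @ U[:, 0], W2 @ Vt[0], s[0]

rng = np.random.RandomState(0)
X1 = rng.random((100, 5))
X2 = rng.random((100, 5))
X1c, X2c = X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)
w1, w2, rho = cca_first_pair(X1c, X2c)
constraint = w1 @ X1c.T @ X1c @ w1                  # equals n = 100 up to rounding
score_corr = np.corrcoef(X1c @ w1, X2c @ w2)[0, 1]  # equals rho up to rounding
```

The sketch uses 100 samples rather than 10 so that the first canonical correlation is not trivially 1.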

Citation

Hotelling, Harold. “Relations between two sets of variates.” Breakthroughs in statistics. Springer, New York, NY, 1992. 162-190.

Example

>>> from cca_zoo.models import CCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = CCA()
>>> model.fit((X1,X2)).score((X1,X2))
array([1.])

Constructor for CCA

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
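What `centre` and `scale` describe can be sketched as follows: statistics are learned on the training data and reused when transforming out-of-sample data. The function names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption rather than a statement about the library.

```python
import numpy as np

def fit_preprocess(X, scale=True, centre=True):
    """Learn per-column statistics on training data."""
    mean = X.mean(axis=0) if centre else np.zeros(X.shape[1])
    std = X.std(axis=0, ddof=1) if scale else np.ones(X.shape[1])  # ddof=1 is an assumption
    return mean, std

def apply_preprocess(X, mean, std):
    """Apply training statistics to any data, including out-of-sample data."""
    return (X - mean) / std

rng = np.random.RandomState(0)
X_train = rng.random((10, 5))
X_test = rng.random((4, 5))
mean, std = fit_preprocess(X_train)
Z_train = apply_preprocess(X_train, mean, std)
Z_test = apply_preprocess(X_test, mean, std)  # reuses the training mean and std
```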

fit(views, y=None, **kwargs)

Fits a regularised CCA (canonical ridge) model

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores
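The effect of `normalize` can be illustrated with a small NumPy sketch for one view and one score vector. The function below is hypothetical and only assumes that unnormalised loadings are feature-score cross-products while normalised loadings are Pearson correlations.

```python
import numpy as np

def loadings(X, z, normalize=False):
    """Loadings of one view X (n x p) against one score vector z (n,)."""
    Xc, zc = X - X.mean(axis=0), z - z.mean()
    if not normalize:
        return Xc.T @ zc  # unnormalised: feature-score cross-products
    # normalised: Pearson correlation between each feature and the score
    return (Xc.T @ zc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(zc))

rng = np.random.RandomState(0)
X = rng.random((10, 5))
w = rng.random(5)       # stand-in for fitted weights
z = X @ w               # scores for this view
raw = loadings(X, z)
corr = loadings(X, z, normalize=True)
```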

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn
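The relationship between the `(k, k, latent_dims)` array of pairwise correlations and the averaged score can be sketched as below. The helper names are hypothetical, and excluding the diagonal (each view with itself) from the average is an assumption.

```python
import numpy as np

def pairwise_correlations(scores):
    """scores: list of k arrays, each (n, latent_dims). Returns (k, k, latent_dims)."""
    k, d = len(scores), scores[0].shape[1]
    all_corrs = np.empty((k, k, d))
    for i in range(k):
        for j in range(k):
            for m in range(d):
                all_corrs[i, j, m] = np.corrcoef(scores[i][:, m], scores[j][:, m])[0, 1]
    return all_corrs

def average_score(all_corrs):
    """Average over the k*(k-1) off-diagonal pairs in each latent dimension."""
    k = all_corrs.shape[0]
    # The diagonal contributes exactly k ones per dimension, so subtract them.
    return (all_corrs.sum(axis=(0, 1)) - k) / (k * k - k)

rng = np.random.RandomState(0)
scores = [rng.random((10, 2)) for _ in range(3)]  # stand-ins for transformed views
all_corrs = pairwise_correlations(scores)
avg = average_score(all_corrs)
```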

transform(views, **kwargs)

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
kwargs – any additional keyword arguments required by the given model

Partial Least Squares

class cca_zoo.models.rcca.PLS(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None)

A class used to fit a simple PLS model

Implements PLS by inheriting regularised CCA with maximal regularisation

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \}\\\end{split}\\\text{subject to:}\\w_1^Tw_1=1\\w_2^Tw_2=1\end{aligned}\end{align} \]
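For unit-norm weights, the objective above is maximised by the leading singular vectors of the cross-product matrix. A minimal NumPy sketch, assuming centred views (this illustrates the maths, not the library's code path):

```python
import numpy as np

rng = np.random.RandomState(0)
X1 = rng.random((50, 5))
X2 = rng.random((50, 4))
X1 = X1 - X1.mean(axis=0)
X2 = X2 - X2.mean(axis=0)

# The leading singular vectors of X1^T X2 maximise w1^T X1^T X2 w2
# over unit-norm w1 and w2.
U, s, Vt = np.linalg.svd(X1.T @ X2)
w1, w2 = U[:, 0], Vt[0]
objective = w1 @ X1.T @ X2 @ w2  # equals the largest singular value s[0]
```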

Example

>>> from cca_zoo.models import PLS
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = PLS()
>>> model.fit((X1,X2)).score((X1,X2))
array([0.81796873])

Constructor for PLS

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls

fit(views, y=None, **kwargs)

Fits a regularised CCA (canonical ridge) model

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

transform(views, **kwargs)

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
kwargs – any additional keyword arguments required by the given model

Ridge Regularized Canonical Correlation Analysis

class cca_zoo.models.rcca.rCCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, c=None, eps=0.001, accept_sparse=None)

A class used to fit a regularised CCA (canonical ridge) model. Uses PCA to perform the optimization efficiently for high-dimensional data.

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \}\\\end{split}\\\text{subject to:}\\(1-c_1)w_1^TX_1^TX_1w_1+c_1w_1^Tw_1=n\\(1-c_2)w_2^TX_2^TX_2w_2+c_2w_2^Tw_2=n\end{aligned}\end{align} \]
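The ridge constraints interpolate between CCA (c = 0) and PLS (c = 1). The sketch below solves the problem directly by whitening with the regularised matrices. It assumes centred views, uses hypothetical helper names, and deliberately skips the PCA-based shortcut mentioned above.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def rcca_first_pair(X1, X2, c1, c2):
    """First weight pair for centred views under the ridge constraints."""
    n = X1.shape[0]
    R1 = ((1 - c1) * X1.T @ X1 + c1 * np.eye(X1.shape[1])) / n
    R2 = ((1 - c2) * X2.T @ X2 + c2 * np.eye(X2.shape[1])) / n
    W1, W2 = inv_sqrt(R1), inv_sqrt(R2)
    U, s, Vt = np.linalg.svd(W1 @ (X1.T @ X2 / n) @ W2)
    return W1 @ U[:, 0], W2 @ Vt[0]

rng = np.random.RandomState(0)
X1 = rng.random((100, 5))
X2 = rng.random((100, 5))
X1 = X1 - X1.mean(axis=0)
X2 = X2 - X2.mean(axis=0)

w1, w2 = rcca_first_pair(X1, X2, 0.1, 0.1)
lhs = 0.9 * w1 @ X1.T @ X1 @ w1 + 0.1 * w1 @ w1  # equals n = 100 up to rounding

# With c = 1 the direction matches the PLS solution (up to sign and scale).
w1_pls, _ = rcca_first_pair(X1, X2, 1.0, 1.0)
u_pls = np.linalg.svd(X1.T @ X2)[0][:, 0]
alignment = abs(w1_pls @ u_pls) / np.linalg.norm(w1_pls)
```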

Citation

Vinod, Hrishikesh D. “Canonical ridge and econometrics of joint production.” Journal of econometrics 4.2 (1976): 147-166.

Example

>>> from cca_zoo.models import rCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = rCCA(c=[0.1,0.1])
>>> model.fit((X1,X2)).score((X1,X2))
array([0.95222128])

Constructor for rCCA

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
c (Union[Iterable[float], float, None]) – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)
eps – epsilon for stability
accept_sparse – which forms are accepted for sparse data

fit(views, y=None, **kwargs)

Fits a regularised CCA (canonical ridge) model

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

transform(views, **kwargs)

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
kwargs – any additional keyword arguments required by the given model

GCCA and KGCCA

Generalized (MAXVAR) Canonical Correlation Analysis

class cca_zoo.models.gcca.GCCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, c=None, view_weights=None)

A class used to fit a GCCA model. For more than 2 views, GCCA optimizes the sum of correlations with a shared auxiliary vector.

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ \sum_iw_i^TX_i^TT \}\\\end{split}\\\text{subject to:}\\T^TT=1\end{aligned}\end{align} \]
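One common way to solve the MAXVAR problem is to take the shared vector T as the top eigenvector of the sum of per-view projection matrices, then regress T on each view to obtain the weights. The sketch below assumes centred, full-rank views, no regularisation, and equal view weights; `gcca_maxvar` is a hypothetical name.

```python
import numpy as np

def gcca_maxvar(views):
    """Shared vector T and per-view weights for centred views (no regularisation)."""
    n = views[0].shape[0]
    # Sum of orthogonal projectors onto each view's column space.
    P = np.zeros((n, n))
    for X in views:
        P += X @ np.linalg.solve(X.T @ X, X.T)
    evals, evecs = np.linalg.eigh(P)
    T = evecs[:, -1]  # unit-norm eigenvector of the largest eigenvalue
    # Least-squares regression of T on each view gives that view's weights.
    weights = [np.linalg.solve(X.T @ X, X.T @ T) for X in views]
    return T, weights, evals[-1]

rng = np.random.RandomState(0)
views = [rng.random((50, 5)) for _ in range(3)]
views = [X - X.mean(axis=0) for X in views]
T, weights, top = gcca_maxvar(views)
objective = sum(w @ X.T @ T for w, X in zip(weights, views))  # equals `top`
```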

Citation

Tenenhaus, Arthur, and Michel Tenenhaus. “Regularized generalized canonical correlation analysis.” Psychometrika 76.2 (2011): 257.

Example

>>> from cca_zoo.models import GCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = GCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([0.97229856])

Constructor for GCCA

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
c (Union[Iterable[float], float, None]) – regularisation between 0 (CCA) and 1 (PLS)
view_weights (Optional[Iterable[float]]) – list of weights of each view

fit(views, y=None, **kwargs)

Fits a regularised CCA (canonical ridge) model

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

transform(views, **kwargs)

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
kwargs – any additional keyword arguments required by the given model

Kernel Generalized (MAXVAR) Canonical Correlation Analysis

class cca_zoo.models.gcca.KGCCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, c=None, eps=0.001, kernel=None, gamma=None, degree=None, coef0=None, kernel_params=None)

A class used to fit a KGCCA model. For more than 2 views, KGCCA optimizes the sum of correlations with a shared auxiliary vector.

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ \sum_i\alpha_i^TK_i^TT \}\\\end{split}\\\text{subject to:}\\T^TT=1\end{aligned}\end{align} \]

Citation

Tenenhaus, Arthur, Cathy Philippe, and Vincent Frouin. “Kernel generalized canonical correlation analysis.” Computational Statistics & Data Analysis 90 (2015): 114-131.

Example

>>> from cca_zoo.models import KGCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = KGCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([0.97019284])

Constructor for KGCCA

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
c (Union[Iterable[float], float, None]) – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)
kernel (Optional[Iterable[Union[float, callable]]]) – Iterable of kernel mappings used internally. This parameter is directly passed to pairwise_kernel. If element of kernel is a string, it must be one of the metrics in pairwise.PAIRWISE_KERNEL_FUNCTIONS. Alternatively, if element of kernel is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two rows from X as input and return the corresponding kernel value as a single number. This means that callables from sklearn.metrics.pairwise are not allowed, as they operate on matrices, not single samples. Use the string identifying the kernel instead.
gamma (Optional[Iterable[float]]) – Iterable of gamma parameters for the RBF, laplacian, polynomial, exponential chi2 and sigmoid kernels. Interpretation of the default value is left to the kernel; see the documentation for sklearn.metrics.pairwise. Ignored by other kernels.
degree (Optional[Iterable[float]]) – Iterable of degree parameters of the polynomial kernel. Ignored by other kernels.
coef0 (Optional[Iterable[float]]) – Iterable of zero coefficients for polynomial and sigmoid kernels. Ignored by other kernels.
kernel_params (Optional[Iterable[dict]]) – Iterable of additional parameters (keyword arguments) for kernel function passed as callable object.
eps – epsilon value to ensure stability of smallest eigenvalues
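The callable contract described for `kernel` (two rows in, one number out) can be sketched in pure Python. The functions `rbf_rows` and `gram_matrix` are hypothetical illustrations of that contract and do not reproduce scikit-learn's implementation.

```python
import math

def rbf_rows(x, y, gamma=1.0):
    """Kernel callable on two single rows; returns one number."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def gram_matrix(X, kernel, **kernel_params):
    """Call `kernel` on every pair of rows of X and record the result."""
    return [[kernel(xi, xj, **kernel_params) for xj in X] for xi in X]

X = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
K = gram_matrix(X, rbf_rows, gamma=0.5)  # extra keyword arguments play the role of kernel_params
```

A callable that operates on whole matrices would not fit this contract, which is why the parameter description recommends passing the kernel's string name in that case.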

transform(views, y=None, **kwargs)

Transforms data given a fit KGCCA model

Parameters

views (ndarray) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

fit(views, y=None, **kwargs)

Fits a regularised CCA (canonical ridge) model

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

MCCA and KCCA

Multiset (SUMCOR) Canonical Correlation Analysis

class cca_zoo.models.mcca.MCCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, c=None, eps=0.001)

A class used to fit an MCCA model. For more than 2 views, MCCA optimizes the sum of pairwise correlations.

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} w_i^TX_i^TX_jw_j \}\\\end{split}\\\text{subject to:}\\(1-c_i)w_i^TX_i^TX_iw_i+c_iw_i^Tw_i=1\end{aligned}\end{align} \]
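A standard way to approach this objective is a generalised eigenproblem with the between-view blocks on one side and the regularised within-view blocks on the other. The sketch below assumes centred views and uses the hypothetical name `mcca_top`. Note that it normalises the weights jointly (the constraint terms sum to 1 across views) rather than enforcing the constraint for each view separately.

```python
import numpy as np

def mcca_top(views, c):
    """Top generalised eigenvalue and per-view weights for centred views."""
    dims = [X.shape[1] for X in views]
    offsets = np.concatenate(([0], np.cumsum(dims)))
    Z = np.hstack(views)
    C = Z.T @ Z                      # all blocks X_i^T X_j
    A = C.copy()                     # between-view blocks only
    D = np.zeros_like(C)             # regularised within-view blocks only
    for i in range(len(views)):
        s = slice(offsets[i], offsets[i + 1])
        A[s, s] = 0.0
        D[s, s] = (1 - c[i]) * C[s, s] + c[i] * np.eye(dims[i])
    vals, vecs = np.linalg.eigh(D)
    D_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    evals, evecs = np.linalg.eigh(D_inv_sqrt @ A @ D_inv_sqrt)
    w = D_inv_sqrt @ evecs[:, -1]    # eigenvector of the largest eigenvalue
    weights = [w[offsets[i]:offsets[i + 1]] for i in range(len(views))]
    return evals[-1], weights

rng = np.random.RandomState(0)
views = [rng.random((60, 4)) for _ in range(3)]
centred = [X - X.mean(axis=0) for X in views]
lam, weights = mcca_top(centred, [0.1, 0.1, 0.1])
```

With two views and c = 0, the top eigenvalue of this problem coincides with the first canonical correlation.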

Citation

Kettenring, Jon R. “Canonical analysis of several sets of variables.” Biometrika 58.3 (1971): 433-451.

Example

>>> from cca_zoo.models import MCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = MCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([0.97200847])

Constructor for MCCA

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
c (Union[Iterable[float], float, None]) – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)
eps – epsilon for stability

fit(views, y=None, **kwargs)

Fits a regularised CCA (canonical ridge) model

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

transform(views, **kwargs)

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
kwargs – any additional keyword arguments required by the given model

Kernel Multiset (SUMCOR) Canonical Correlation Analysis

class cca_zoo.models.mcca.KCCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, c=None, eps=0.001, kernel=None, gamma=None, degree=None, coef0=None, kernel_params=None)

A class used to fit a KCCA model.

Maths

\[ \begin{align}\begin{aligned}\begin{split}\alpha_{opt}=\underset{\alpha}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \alpha_i^TK_i^TK_j\alpha_j \}\\\end{split}\\\text{subject to:}\\c_i\alpha_i^TK_i\alpha_i + (1-c_i)\alpha_i^TK_i^TK_i\alpha_i=1\end{aligned}\end{align} \]

Example

>>> from cca_zoo.models import KCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = KCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([0.96893666])

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
c (Union[Iterable[float], float, None]) – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)
kernel (Optional[Iterable[Union[float, callable]]]) – Iterable of kernel mappings used internally. This parameter is directly passed to pairwise_kernel. If element of kernel is a string, it must be one of the metrics in pairwise.PAIRWISE_KERNEL_FUNCTIONS. Alternatively, if element of kernel is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two rows from X as input and return the corresponding kernel value as a single number. This means that callables from sklearn.metrics.pairwise are not allowed, as they operate on matrices, not single samples. Use the string identifying the kernel instead.
gamma (Optional[Iterable[float]]) – Iterable of gamma parameters for the RBF, laplacian, polynomial, exponential chi2 and sigmoid kernels. Interpretation of the default value is left to the kernel; see the documentation for sklearn.metrics.pairwise. Ignored by other kernels.
degree (Optional[Iterable[float]]) – Iterable of degree parameters of the polynomial kernel. Ignored by other kernels.
coef0 (Optional[Iterable[float]]) – Iterable of zero coefficients for polynomial and sigmoid kernels. Ignored by other kernels.
kernel_params (Optional[Iterable[dict]]) – Iterable of additional parameters (keyword arguments) for kernel function passed as callable object.
eps – epsilon value to ensure stability of smallest eigenvalues

transform(views, **kwargs)

Transforms data given a fit KCCA model

Parameters

views (ndarray) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

fit(views, y=None, **kwargs)

Fits a regularised CCA (canonical ridge) model

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

Tensor Canonical Correlation Analysis

class cca_zoo.models.tcca.TCCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, c=None)

Fits a Tensor CCA model. Tensor CCA maximises higher order correlations

Maths

\[ \begin{align}\begin{aligned}\begin{split}\alpha_{opt}=\underset{\alpha}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \alpha_i^TK_i^TK_j\alpha_j \}\\\end{split}\\\text{subject to:}\\\alpha_i^TK_i^TK_i\alpha_i=1\end{aligned}\end{align} \]

Citation

Kim, Tae-Kyun, Shu-Fai Wong, and Roberto Cipolla. “Tensor canonical correlation analysis for action classification.” 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2007. https://github.com/rciszek/mdr_tcca

Example

>>> from cca_zoo.models import TCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = TCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([1.14595755])

Constructor for TCCA

Parameters

latent_dims (int) – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
c (Union[Iterable[float], float, None]) – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)

fit(views, y=None, **kwargs)

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

correlations(views, **kwargs)

Predicts the correlation for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

score(views, **kwargs)

Returns the higher order correlations in each dimension

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
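One plausible reading of a "higher order correlation" is the sum over samples of the product of the centred scores across all views, divided by the product of the per-view norms. The pure-Python sketch below implements that definition for a single latent dimension. It is an assumption for illustration only: the function name is hypothetical and the library's `score` may normalise differently.

```python
import math

def higher_order_correlation(score_columns):
    """score_columns: one list of n scores per view, for a single latent dimension."""
    centred = []
    for z in score_columns:
        mean = sum(z) / len(z)
        centred.append([v - mean for v in z])
    # Sum over samples of the product of the scores across all views ...
    numerator = sum(math.prod(vals) for vals in zip(*centred))
    # ... divided by the product of the per-view Euclidean norms.
    denominator = math.prod(math.sqrt(sum(v * v for v in z)) for z in centred)
    return numerator / denominator

# With two views this reduces to the ordinary Pearson correlation.
two_view = higher_order_correlation([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
three_view = higher_order_correlation(
    [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 2.0]]
)
```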

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

transform(views, **kwargs)

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
kwargs – any additional keyword arguments required by the given model

Kernel Tensor Canonical Correlation Analysis

class cca_zoo.models.tcca.KTCCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, eps=0.001, c=None, kernel=None, gamma=None, degree=None, coef0=None, kernel_params=None)

Fits a Kernel Tensor CCA model. Tensor CCA maximises higher order correlations

Maths

\[ \begin{align}\begin{aligned}\begin{split}\alpha_{opt}=\underset{\alpha}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \alpha_i^TK_i^TK_j\alpha_j \}\\\end{split}\\\text{subject to:}\\\alpha_i^TK_i^TK_i\alpha_i=1\end{aligned}\end{align} \]

Citation

Kim, Tae-Kyun, Shu-Fai Wong, and Roberto Cipolla. “Tensor canonical correlation analysis for action classification.” 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2007

Example

>>> from cca_zoo.models import KTCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = KTCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([1.69896269])

Constructor for KTCCA

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out of sample)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
c (Union[Iterable[float], float, None]) – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)
kernel (Optional[Iterable[Union[float, callable]]]) – Iterable of kernel mappings used internally. This parameter is directly passed to pairwise_kernel. If element of kernel is a string, it must be one of the metrics in pairwise.PAIRWISE_KERNEL_FUNCTIONS. Alternatively, if element of kernel is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two rows from X as input and return the corresponding kernel value as a single number. This means that callables from sklearn.metrics.pairwise are not allowed, as they operate on matrices, not single samples. Use the string identifying the kernel instead.
gamma (Optional[Iterable[float]]) – Iterable of gamma parameters for the RBF, laplacian, polynomial, exponential chi2 and sigmoid kernels. Interpretation of the default value is left to the kernel; see the documentation for sklearn.metrics.pairwise. Ignored by other kernels.
degree (Optional[Iterable[float]]) – Iterable of degree parameters of the polynomial kernel. Ignored by other kernels.
coef0 (Optional[Iterable[float]]) – Iterable of zero coefficients for polynomial and sigmoid kernels. Ignored by other kernels.
kernel_params (Optional[Iterable[dict]]) – Iterable of additional parameters (keyword arguments) for kernel function passed as callable object.
eps – epsilon value to ensure stability
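The callable form of kernel described above takes two rows and returns one number. The following is a minimal NumPy sketch of what that contract implies, not the library's internals: gram_from_callable and rbf_pair are hypothetical names introduced only for illustration.

```python
import numpy as np

def gram_from_callable(X, Y, kernel_fn):
    """Build a kernel matrix by calling kernel_fn on every pair of rows."""
    K = np.empty((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            K[i, j] = kernel_fn(x, y)  # kernel_fn must return a single number
    return K

def rbf_pair(x, y, gamma=0.5):
    """Hypothetical row-wise RBF kernel: two 1-D rows in, one scalar out."""
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

rng = np.random.RandomState(0)
X = rng.randn(6, 3)
K = gram_from_callable(X, X, rbf_pair)  # (6, 6) symmetric Gram matrix
```

A function that expects whole matrices would not fit this row-by-row calling pattern, which is why the string name of the kernel is the suggested route for built-in kernels.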

transform(views, **kwargs)[source]

Transforms data given a fit KTCCA model

Parameters

views (ndarray) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

correlations(views, **kwargs)

Predicts the correlation for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

fit(views, y=None, **kwargs)

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores
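One way to read "correlations between features and scores" is the Pearson correlation of each input column with each score column. The sketch below computes that quantity directly with NumPy; correlation_loadings is a hypothetical helper and the weights are random placeholders, so this is an illustration of the idea rather than the library's code path.

```python
import numpy as np

def correlation_loadings(X, scores):
    """Pearson correlation between each column of X and each score column."""
    Xc = X - X.mean(axis=0)
    Sc = scores - scores.mean(axis=0)
    Xn = Xc / np.linalg.norm(Xc, axis=0)  # unit-norm centred features
    Sn = Sc / np.linalg.norm(Sc, axis=0)  # unit-norm centred scores
    return Xn.T @ Sn  # shape (n_features, latent_dims)

rng = np.random.RandomState(0)
X = rng.randn(50, 4)
w = rng.randn(4, 2)   # placeholder weights, not a fitted model
scores = X @ w
L = correlation_loadings(X, scores)
```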

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
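To make the (k, k, latent_dims) layout concrete, here is a small sketch that builds such an array from per-view scores. pairwise_corr is a hypothetical helper and the scores are random stand-ins for transformed views.

```python
import numpy as np

def pairwise_corr(scores):
    """scores: list of k arrays of shape (n, d). Returns an array (k, k, d)."""
    k = len(scores)
    d = scores[0].shape[1]
    out = np.empty((k, k, d))
    for i in range(k):
        for j in range(k):
            for dim in range(d):
                out[i, j, dim] = np.corrcoef(scores[i][:, dim],
                                             scores[j][:, dim])[0, 1]
    return out

rng = np.random.RandomState(0)
Z = [rng.randn(30, 2) for _ in range(3)]  # stand-in scores for 3 views
all_corrs = pairwise_corr(Z)
```

Entry all_corrs[i, j, d] is then the correlation between views i and j in latent dimension d; the diagonal i == j is 1 by construction.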

score(views, **kwargs)

Returns the higher order correlations in each dimension

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

More Complex Regularisation using Iterative Models

Normal CCA and PLS by alternating least squares

Quicker and more memory efficient for very large data

CCA by Alternating Least Squares

class cca_zoo.models.CCA_ALS(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, deflation='cca', max_iter=100, initialization='random', tol=1e-09, stochastic=True, positive=None)[source]

Fits a CCA model with CCA deflation by NIPALS algorithm. Implemented by ElasticCCA with no regularisation

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmin}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=n\end{aligned}\end{align} \]

Citation

Golub, Gene H., and Hongyuan Zha. “The canonical correlations of matrix pairs and their numerical computation.” Linear algebra for signal processing. Springer, New York, NY, 1995. 27-49.

Example

>>> from cca_zoo.models import CCA_ALS
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,3))
>>> X2 = rng.random((10,3))
>>> model = CCA_ALS(random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.858906])

Constructor for CCA_ALS

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out-of-sample data)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
max_iter (int) – the maximum number of iterations to perform in the inner optimization loop
initialization (str) – either string from “pls”, “cca”, “random”, “uniform” or callable to initialize the score variables for iterative methods
tol (float) – tolerance value used for early stopping
stochastic – use stochastic regression optimisers for subproblems
positive (Union[Iterable[bool], bool, None]) – constrain model weights to be positive
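The alternating least squares idea for two views can be sketched in a few lines: regress the other view's score onto the current view, rescale so the score has squared norm n, and repeat. This is a minimal NumPy illustration under those assumptions, not the library's implementation; cca_als is a hypothetical name, and the result is checked against the closed-form first canonical correlation.

```python
import numpy as np

def cca_als(X1, X2, max_iter=500, tol=1e-12, seed=0):
    """First canonical pair for two views by alternating least squares."""
    n = X1.shape[0]
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    rng = np.random.RandomState(seed)
    w2 = rng.randn(X2.shape[1])                 # random initialisation
    w2 *= np.sqrt(n) / np.linalg.norm(X2 @ w2)
    w1 = np.zeros(X1.shape[1])
    for _ in range(max_iter):
        w1_old = w1
        # Least squares fit of view 1 to the current view 2 score
        w1 = np.linalg.lstsq(X1, X2 @ w2, rcond=None)[0]
        w1 *= np.sqrt(n) / np.linalg.norm(X1 @ w1)  # enforce score norm^2 = n
        # Least squares fit of view 2 to the updated view 1 score
        w2 = np.linalg.lstsq(X2, X1 @ w1, rcond=None)[0]
        w2 *= np.sqrt(n) / np.linalg.norm(X2 @ w2)
        if np.linalg.norm(w1 - w1_old) < tol:   # early stopping
            break
    return w1, w2

rng = np.random.RandomState(0)
z = rng.randn(200)                               # shared latent signal
X1 = np.outer(z, [1.0, 0.5, -0.5]) + rng.randn(200, 3)
X2 = np.outer(z, [0.8, -0.3, 0.2]) + rng.randn(200, 3)
w1, w2 = cca_als(X1, X2)
corr_als = np.corrcoef(X1 @ w1, X2 @ w2)[0, 1]

# Closed-form reference: largest singular value of Q1^T Q2
Q1, _ = np.linalg.qr(X1 - X1.mean(axis=0))
Q2, _ = np.linalg.qr(X2 - X2.mean(axis=0))
rho_closed = np.linalg.svd(Q1.T @ Q2, compute_uv=False)[0]
```

Here max_iter and tol play the same roles as the parameters of the same names above: a cap on inner iterations and a threshold for early stopping.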

fit(views, y=None, **kwargs)

Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
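Deflation removes the component just found so that the next latent dimension captures something new. The sketch below shows two common textbook forms, projection deflation on the score and Hotelling-style deflation on a unit-norm weight. Whether these correspond exactly to the 'cca' and 'pls' options here is an assumption; the function names are hypothetical.

```python
import numpy as np

def projection_deflation(X, w):
    """Remove from X everything that lies along the score t = X w."""
    t = X @ w
    return X - np.outer(t, t @ X) / (t @ t)

def hotelling_deflation(X, w):
    """Remove from X the direction of a unit-norm weight vector w."""
    w = w / np.linalg.norm(w)
    return X - np.outer(X @ w, w)

rng = np.random.RandomState(0)
X = rng.randn(20, 4)
w = rng.randn(4)
w /= np.linalg.norm(w)
t = X @ w
X_proj = projection_deflation(X, w)  # columns are now orthogonal to t
X_hot = hotelling_deflation(X, w)    # rows now have no component along w
```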

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn
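Given an array of pairwise correlations of shape (k, k, latent_dims), averaging over all pairs of distinct views can be sketched as below. average_pairwise is a hypothetical helper, and the input is a hand-built array rather than the output of a fitted model.

```python
import numpy as np

def average_pairwise(all_corrs):
    """Mean over off-diagonal view pairs, one value per latent dimension."""
    k = all_corrs.shape[0]
    off_diag = ~np.eye(k, dtype=bool)          # mask out the i == j entries
    return all_corrs[off_diag].mean(axis=0)    # shape (latent_dims,)

all_corrs = np.ones((3, 3, 2))                 # 3 views, 2 latent dimensions
all_corrs[~np.eye(3, dtype=bool)] = [0.5, 0.2]
avg = average_pairwise(all_corrs)              # array([0.5, 0.2])
```

For two views there is a single pair, so the average is just the correlation between the two scores in each dimension.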

transform(views, **kwargs)

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
kwargs – any additional keyword arguments required by the given model

PLS by Alternating Least Squares

class cca_zoo.models.PLS_ALS(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, max_iter=100, initialization='random', tol=1e-09)[source]

A class used to fit a PLS model

Fits a partial least squares model with CCA deflation by NIPALS algorithm

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} w_i^TX_i^TX_jw_j\}\\\end{split}\\\text{subject to:}\\w_i^Tw_i=1\end{aligned}\end{align} \]

Example

>>> from cca_zoo.models import PLS_ALS
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = PLS_ALS(random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.81796854])

Constructor for PLS_ALS

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out-of-sample data)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
max_iter (int) – the maximum number of iterations to perform in the inner optimization loop
initialization (Union[str, callable]) – either string from “pls”, “cca”, “random”, “uniform” or callable to initialize the score variables for iterative methods
tol (float) – tolerance value used for early stopping
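For two views, PLS with unit-norm weights maximises the covariance between the scores, and the alternating update reduces to a power iteration on the cross-covariance matrix. The following sketch shows that iteration and compares it with the leading singular vectors; pls_als is a hypothetical name and this is not claimed to be the library's exact routine.

```python
import numpy as np

def pls_als(X1, X2, max_iter=500, tol=1e-12, seed=0):
    """First PLS weight pair for two views by alternating updates."""
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    rng = np.random.RandomState(seed)
    w2 = rng.randn(X2.shape[1])
    w2 /= np.linalg.norm(w2)
    w1 = np.zeros(X1.shape[1])
    for _ in range(max_iter):
        w1_old = w1
        w1 = X1.T @ (X2 @ w2)        # best w1 for a fixed view 2 score
        w1 /= np.linalg.norm(w1)     # enforce w1^T w1 = 1
        w2 = X2.T @ (X1 @ w1)        # best w2 for a fixed view 1 score
        w2 /= np.linalg.norm(w2)     # enforce w2^T w2 = 1
        if np.linalg.norm(w1 - w1_old) < tol:
            break
    return w1, w2

rng = np.random.RandomState(0)
z = rng.randn(200)                   # shared latent signal
X1 = np.outer(z, [1.0, 0.5, -0.5]) + rng.randn(200, 3)
X2 = np.outer(z, [0.8, -0.3, 0.2]) + rng.randn(200, 3)
w1, w2 = pls_als(X1, X2)

# Reference: leading singular vectors of the cross-covariance matrix
C = (X1 - X1.mean(axis=0)).T @ (X2 - X2.mean(axis=0))
U, s, Vt = np.linalg.svd(C)
```

Up to sign, w1 and w2 should match U[:, 0] and Vt[0] once the iteration has converged.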

fit(views, y=None, **kwargs)

Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

transform(views, **kwargs)

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
kwargs – any additional keyword arguments required by the given model

Sparsity Inducing Models

Penalized Matrix Decomposition (Sparse PLS)

class cca_zoo.models.PMD(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, deflation='cca', c=None, max_iter=100, initialization='pls', tol=1e-09, positive=None)[source]

Fits a Sparse CCA (Penalized Matrix Decomposition) model.

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \}\\\end{split}\\\text{subject to:}\\w_i^Tw_i=1\\\|w_i\|_1\leq c_i\end{aligned}\end{align} \]

Citation

Witten, Daniela M., Robert Tibshirani, and Trevor Hastie. “A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.” Biostatistics 10.3 (2009): 515-534.

Example

>>> from cca_zoo.models import PMD
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = PMD(c=[1,1],random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.81796873])

Constructor for PMD

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out-of-sample data)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
c (Union[Iterable[float], float, None]) – l1 regularisation parameter between 1 and sqrt(number of features) for each view
max_iter (int) – the maximum number of iterations to perform in the inner optimization loop
initialization (Union[str, callable]) – either string from “pls”, “cca”, “random”, “uniform” or callable to initialize the score variables for iterative methods
tol (float) – tolerance value used for early stopping
positive (Union[Iterable[bool], bool, None]) – constrain model weights to be positive
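The range given for c (between 1 and the square root of the number of features) follows from combining a unit l2 norm with an l1 bound. In Witten et al. the update for one weight vector soft-thresholds a target vector and renormalises, with the threshold found by binary search so that the l1 norm equals c. The sketch below illustrates that single step, assuming c >= 1; the function names are hypothetical and the library's own routine may differ in detail.

```python
import numpy as np

def soft_threshold(a, delta):
    """Shrink every entry of a towards zero by delta."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def l1_constrained_unit_vector(a, c, n_steps=60):
    """Unit l2-norm vector along soft-thresholded a with l1 norm <= c."""
    w = a / np.linalg.norm(a)
    if np.sum(np.abs(w)) <= c:       # constraint inactive: no shrinkage needed
        return w
    lo, hi = 0.0, np.max(np.abs(a))
    for _ in range(n_steps):         # binary search on the threshold
        mid = 0.5 * (lo + hi)
        s = soft_threshold(a, mid)
        norm = np.linalg.norm(s)
        if norm == 0 or np.sum(np.abs(s)) / norm <= c:
            hi = mid                 # feasible: try a smaller threshold
        else:
            lo = mid                 # infeasible: shrink more
    s = soft_threshold(a, hi)
    return s / np.linalg.norm(s)

a = np.array([1.8, 0.4, 1.0, 2.2, -1.9])
w = l1_constrained_unit_vector(a, c=1.5)
```

Smaller c forces more entries to exactly zero; at c equal to the square root of the number of features the l1 bound can never be active for a unit vector.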

fit(views, y=None, **kwargs)

Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

transform(views, **kwargs)

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
kwargs – any additional keyword arguments required by the given model

Sparse CCA by iterative lasso regression

class cca_zoo.models.SCCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, deflation='cca', c=None, max_iter=100, maxvar=False, initialization='pls', tol=1e-09, stochastic=False, positive=None)[source]

Fits a sparse CCA model by iterative rescaled lasso regression. Implemented by ElasticCCA with l1 ratio=1

For default maxvar=False, the optimisation is given by:

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmin}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=n\end{aligned}\end{align} \]

Citation

Mai, Qing, and Xin Zhang. “An iterative penalized least squares approach to sparse canonical correlation analysis.” Biometrics 75.3 (2019): 734-744.

For maxvar=True, the optimisation is given by the ElasticCCA problem with no l2 regularisation:

Maths

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}, t_{opt}=\underset{w,t}{\mathrm{argmin}}\{\sum_i \|X_iw_i-t\|^2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\t^Tt=n\end{aligned}\end{align} \]

Citation

Fu, Xiao, et al. “Scalable and flexible multiview MAX-VAR canonical correlation analysis.” IEEE Transactions on Signal Processing 65.16 (2017): 4150-4165.

Example

>>> from cca_zoo.models import SCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SCCA(c=[0.001,0.001], random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.99998761])

Constructor for SCCA

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out-of-sample data)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
max_iter (int) – the maximum number of iterations to perform in the inner optimization loop
maxvar (bool) – use auxiliary variable “maxvar” form
initialization (Union[str, callable]) – either string from “pls”, “cca”, “random”, “uniform” or callable to initialize the score variables for iterative methods
tol (float) – tolerance value used for early stopping
c (Union[Iterable[float], float, None]) – lasso alpha
stochastic – use stochastic regression optimisers for subproblems
positive (Union[Iterable[bool], bool, None]) – constrain model weights to be positive

fit(views, y=None, **kwargs)

Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

transform(views, **kwargs)

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
kwargs – any additional keyword arguments required by the given model

Elastic CCA by MAXVAR

class cca_zoo.models.ElasticCCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, deflation='cca', max_iter=100, initialization='pls', tol=1e-09, c=None, l1_ratio=None, maxvar=True, stochastic=False, positive=None)[source]

Fits an elastic CCA by iterating elastic net regressions.

By default, ElasticCCA uses CCA with an auxiliary variable target i.e. MAXVAR configuration

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}, t_{opt}=\underset{w,t}{\mathrm{argmin}}\{\sum_i \|X_iw_i-t\|^2 + c\|w_i\|^2_2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\t^Tt=n\end{aligned}\end{align} \]

Citation

Fu, Xiao, et al. “Scalable and flexible multiview MAX-VAR canonical correlation analysis.” IEEE Transactions on Signal Processing 65.16 (2017): 4150-4165.

But we can force it to attempt to use the SUMCOR form which will approximate a solution to the problem:

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmin}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + c\|w_i\|^2_2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=n\end{aligned}\end{align} \]

Example

>>> from cca_zoo.models import ElasticCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = ElasticCCA(c=[1e-1,1e-1],l1_ratio=[0.5,0.5], random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.9316638])

Constructor for ElasticCCA

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out-of-sample data)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
deflation – the type of deflation.
max_iter (int) – the maximum number of iterations to perform in the inner optimization loop
initialization (Union[str, callable]) – either string from “pls”, “cca”, “random”, “uniform” or callable to initialize the score variables for iterative methods
tol (float) – tolerance value used for early stopping
c (Union[Iterable[float], float, None]) – elastic net alpha
l1_ratio (Union[Iterable[float], float, None]) – l1 ratio in elastic net subproblems
maxvar (bool) – use auxiliary variable “maxvar” formulation
stochastic – use stochastic regression optimisers for subproblems
positive (Union[Iterable[bool], bool, None]) – constrain model weights to be positive
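The auxiliary variable ("maxvar") form alternates between regressing a shared target t onto each view and refreshing t from the resulting scores. The sketch below keeps only the l2 penalty so that each regression has a closed form; the l1 part of the elastic net is deliberately omitted, and ridge_maxvar is a hypothetical name rather than anything in the library.

```python
import numpy as np

def ridge_maxvar(views, c=0.1, max_iter=200, seed=0):
    """MAXVAR-style alternation with ridge regressions (no l1 term)."""
    n = views[0].shape[0]
    views = [X - X.mean(axis=0) for X in views]
    rng = np.random.RandomState(seed)
    t = rng.randn(n)
    t -= t.mean()
    t *= np.sqrt(n) / np.linalg.norm(t)      # enforce t^T t = n
    for _ in range(max_iter):
        # Ridge regression of the shared target onto each view
        weights = [np.linalg.solve(X.T @ X + c * np.eye(X.shape[1]), X.T @ t)
                   for X in views]
        # Best shared target for fixed weights is the rescaled sum of scores
        t = sum(X @ w for X, w in zip(views, weights))
        t *= np.sqrt(n) / np.linalg.norm(t)
    return weights, t

rng = np.random.RandomState(0)
z = rng.randn(200)                           # shared latent signal
loadings = ([2.0, 1.0, -1.0], [1.0, -2.0, 1.0], [1.0, 1.0, 2.0])
views = [np.outer(z, a) + rng.randn(200, 3) for a in loadings]
weights, t = ridge_maxvar(views)
scores = [(X - X.mean(axis=0)) @ w for X, w in zip(views, weights)]
```

Because every view is fitted to the same target, this form extends to more than two views without enumerating pairs.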

fit(views, y=None, **kwargs)

Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

transform(views, **kwargs)

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
kwargs – any additional keyword arguments required by the given model

Span CCA

class cca_zoo.models.SpanCCA(latent_dims=1, scale=True, centre=True, copy_data=True, max_iter=100, initialization='uniform', tol=1e-09, regularisation='l0', c=None, rank=1, positive=None, random_state=None, deflation='cca')[source]

Fits a Sparse CCA model using SpanCCA.

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmin}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=1\end{aligned}\end{align} \]

Citation

Asteris, Megasthenis, et al. “A simple and provable algorithm for sparse diagonal CCA.” International Conference on Machine Learning. PMLR, 2016.

Example

>>> from cca_zoo.models import SpanCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SpanCCA(regularisation="l0", c=[2, 2])
>>> model.fit((X1,X2)).score((X1,X2))
array([0.84556666])

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out-of-sample data)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
max_iter (int) – the maximum number of iterations to perform in the inner optimization loop
initialization (str) – either string from “pls”, “cca”, “random”, “uniform” or callable to initialize the score variables for iterative methods
tol (float) – tolerance value used for early stopping
regularisation –
c (Union[Iterable[Union[float, int]], float, int, None]) – regularisation parameter
rank – rank of the approximation
positive (Union[Iterable[bool], bool, None]) – constrain weights to be positive
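With regularisation="l0" and an integer c, as in the example above, one plausible reading is that c bounds the number of non-zero weights per view. Under that assumption, the corresponding projection keeps the c largest-magnitude entries and renormalises. keep_top_k is a hypothetical helper; this is an illustration of an l0 constraint, not the library's routine.

```python
import numpy as np

def keep_top_k(a, k):
    """Zero all but the k largest-magnitude entries, then rescale to unit norm."""
    w = np.zeros_like(a, dtype=float)
    idx = np.argsort(np.abs(a))[-k:]   # indices of the k largest |a_i|
    w[idx] = a[idx]
    return w / np.linalg.norm(w)

a = np.array([0.3, -2.0, 0.1, 1.5, -0.2])
w = keep_top_k(a, k=2)                 # array([0., -0.8, 0., 0.6, 0.])
```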

fit(views, y=None, **kwargs)

Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

transform(views, **kwargs)

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
kwargs – any additional keyword arguments required by the given model

Parkhomenko (penalized) CCA

class cca_zoo.models.ParkhomenkoCCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, deflation='cca', c=None, max_iter=100, initialization='pls', tol=1e-09)[source]

Fits a sparse CCA (penalized CCA) model

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 - \sum_i c_i\|w_i\|_1 \}\\\end{split}\\\text{subject to:}\\w_i^Tw_i=1\end{aligned}\end{align} \]

Citation

Parkhomenko, Elena, David Tritchler, and Joseph Beyene. “Sparse canonical correlation analysis with application to genomic data integration.” Statistical applications in genetics and molecular biology 8.1 (2009).

Example

>>> from cca_zoo.models import ParkhomenkoCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = ParkhomenkoCCA(c=[0.001,0.001],random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.81803527])

Constructor for ParkhomenkoCCA

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out-of-sample data)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
c (Union[Iterable[float], float, None]) – l1 regularisation parameter
max_iter (int) – the maximum number of iterations to perform in the inner optimization loop
initialization (Union[str, callable]) – either string from “pls”, “cca”, “random”, “uniform” or callable to initialize the score variables for iterative methods
tol (float) – tolerance value used for early stopping

fit(views, y=None, **kwargs)

Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

transform(views, **kwargs)

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
kwargs – any additional keyword arguments required by the given model

Sparse CCA by ADMM

class cca_zoo.models.SCCA_ADMM(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, deflation='cca', c=None, mu=None, lam=None, eta=None, max_iter=100, initialization='pls', tol=1e-09)[source]

Fits a sparse CCA model by alternating ADMM

\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmin}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=1\end{aligned}\end{align} \]

Citation

Suo, Xiaotong, et al. “Sparse canonical correlation analysis.” arXiv preprint arXiv:1705.10865 (2017).

Example

>>> from cca_zoo.models import SCCA_ADMM
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SCCA_ADMM(random_state=0,c=[1e-1,1e-1])
>>> model.fit((X1,X2)).score((X1,X2))
array([0.84348183])

Constructor for SCCA_ADMM

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out-of-sample data)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
c (Union[Iterable[float], float, None]) – l1 regularisation parameter
max_iter (int) – the maximum number of iterations to perform in the inner optimization loop
initialization (Union[str, callable]) – either string from “pls”, “cca”, “random”, “uniform” or callable to initialize the score variables for iterative methods
tol (float) – tolerance value used for early stopping
mu (Union[Iterable[float], float, None]) –
lam (Union[Iterable[float], float, None]) –
eta –

fit(views, y=None, **kwargs)

Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores
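
To illustrate what the normalize flag describes, the sketch below contrasts raw feature-score cross-products with feature-score correlations for a single view. The weights are random placeholders and the computation is an assumption about the general idea, not a copy of the library's implementation.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.random((10, 5))
X = X - X.mean(axis=0)      # centre the view
w = rng.random((5, 2))      # hypothetical weights for 2 latent dimensions
Z = X @ w                   # scores, shape (10, 2); centred because X is

raw_loadings = X.T @ Z      # unnormalised cross-products, shape (5, 2)

# Normalised: Pearson correlation between each feature and each score.
Xs = X / np.linalg.norm(X, axis=0)
Zs = Z / np.linalg.norm(Z, axis=0)
corr_loadings = Xs.T @ Zs   # shape (5, 2), entries in [-1, 1]
```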

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
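
The (k,k,self.latent_dims) layout can be reproduced with plain NumPy. This is a minimal sketch that assumes the views have already been transformed into score matrices; pairwise_corrs is an illustrative name, not a library function.

```python
import numpy as np

def pairwise_corrs(scores):
    """scores: list of k arrays, each of shape (n_samples, latent_dims)."""
    k = len(scores)
    d = scores[0].shape[1]
    out = np.zeros((k, k, d))
    for i in range(k):
        for j in range(k):
            for dim in range(d):
                # Correlation between view i and view j in this dimension.
                out[i, j, dim] = np.corrcoef(scores[i][:, dim], scores[j][:, dim])[0, 1]
    return out

rng = np.random.RandomState(0)
Z = [rng.random((10, 2)) for _ in range(3)]   # 3 views, 2 latent dimensions
all_corrs = pairwise_corrs(Z)
```

Entry [i, j, d] holds the correlation between views i and j in dimension d, so the array is symmetric in its first two axes with ones where i equals j.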

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn
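
Averaging over all pairs can be sketched from a (k,k,latent_dims) correlation array as follows. The helper average_pairwise is hypothetical, and excluding the self-correlations on the diagonal is an assumption about how the average is formed.

```python
import numpy as np

def average_pairwise(all_corrs):
    k = all_corrs.shape[0]
    # Sum over both view axes, drop the k self-correlations (each equal to 1),
    # and divide by the number of ordered off-diagonal pairs.
    return (all_corrs.sum(axis=(0, 1)) - k) / (k * k - k)

# Two views, one latent dimension, correlation 0.8 between the views.
all_corrs = np.array([[[1.0], [0.8]],
                      [[0.8], [1.0]]])
dim_corrs = average_pairwise(all_corrs)
```

With two views the result is simply the correlation between them in each dimension.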

transform(views, **kwargs)

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Miscellaneous

Nonparametric CCA

class cca_zoo.models.NCCA(latent_dims=1, scale=True, centre=True, copy_data=True, accept_sparse=False, random_state=None, nearest_neighbors=None, gamma=None)[source]

A class used to fit a nonparametric CCA (NCCA) model.

Citation

Michaeli, Tomer, Weiran Wang, and Karen Livescu. “Nonparametric canonical correlation analysis.” International conference on machine learning. PMLR, 2016.

Example

>>> from cca_zoo.models import NCCA
>>> import numpy as np
>>> X1 = np.random.rand(10,5)
>>> X2 = np.random.rand(10,5)
>>> model = NCCA()
>>> model.fit((X1,X2)).score((X1,X2))
array([1.])

Constructor for NCCA

Parameters

latent_dims (int) – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out-of-sample data)
copy_data – If True, X will be copied; else, it may be overwritten
accept_sparse – Whether model can take sparse data as input
random_state (Union[int, RandomState, None]) – Pass for reproducible output across multiple function calls
nearest_neighbors – Number of nearest neighbors (l2 distance) to consider when constructing affinity
gamma (Optional[Iterable[float]]) – Bandwidth parameter for rbf kernel
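
As a rough illustration of how nearest_neighbors and gamma could interact, the sketch below builds an RBF affinity matrix and keeps only each sample's closest neighbours. knn_rbf_affinity is a hypothetical helper and the construction is an assumption; the library's affinity may be built differently.

```python
import numpy as np

def knn_rbf_affinity(X, n_neighbors, gamma):
    # Squared Euclidean (l2) distances between all pairs of rows.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-gamma * sq)
    # Zero every entry except each row's n_neighbors closest points
    # (the point itself, at distance 0, counts as one of them).
    far = np.argsort(sq, axis=1)[:, n_neighbors:]
    np.put_along_axis(W, far, 0.0, axis=1)
    return W

rng = np.random.RandomState(0)
X = rng.random((10, 5))
W = knn_rbf_affinity(X, n_neighbors=3, gamma=1.0)
```

A larger gamma makes the affinity fall off faster with distance, while nearest_neighbors controls how sparse each row is.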

fit(views, y=None, **kwargs)[source]

Fits a given model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

transform(views, **kwargs)[source]

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

Partial CCA

class cca_zoo.models.PartialCCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, c=None, eps=0.001)[source]

A class used to fit a partial CCA model. The key difference between this and a vanilla CCA or MCCA is that the canonical score vectors must be orthogonal to the supplied confounding variables.
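
One common way to obtain scores that are orthogonal to confounds is to regress the confounds out of each view first. The snippet is a minimal sketch of that idea under this assumption; residualise is an illustrative helper and is not claimed to be how PartialCCA is implemented.

```python
import numpy as np

def residualise(X, confounds):
    # Least-squares fit of X on the confounds, then subtract the fitted part.
    beta, *_ = np.linalg.lstsq(confounds, X, rcond=None)
    return X - confounds @ beta

rng = np.random.RandomState(0)
X1 = rng.random((10, 5))
partials = rng.random((10, 3))
R1 = residualise(X1, partials)
```

Because every column of R1 is orthogonal to the confounds, any score of the form R1 @ w is orthogonal to them as well.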

Citation

Rao, B. Raja. “Partial canonical correlations.” Trabajos de estadistica y de investigación operativa 20.2-3 (1969): 211-219.

Example

>>> from cca_zoo.models import PartialCCA
>>> import numpy as np
>>> X1 = np.random.rand(10,5)
>>> X2 = np.random.rand(10,5)
>>> partials = np.random.rand(10,3)
>>> model = PartialCCA()
>>> model.fit((X1,X2),partials=partials).score((X1,X2),partials=partials)
array([0.99993046])

Constructor for Partial CCA

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out-of-sample data)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
c (Union[Iterable[float], float, None]) – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)
eps – epsilon for stability
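
The range of c (0 for CCA, 1 for PLS) can be read as an interpolation between a view's covariance and the identity. The following sketch assumes that form of ridge regularisation; regularised_cov is a hypothetical helper and the library's exact scaling may differ.

```python
import numpy as np

def regularised_cov(X, c):
    n, p = X.shape
    cov = X.T @ X / n
    # c = 0 keeps the sample covariance (CCA constraint);
    # c = 1 replaces it with the identity (PLS constraint).
    return (1 - c) * cov + c * np.eye(p)

rng = np.random.RandomState(0)
X = rng.random((10, 5))
X = X - X.mean(axis=0)
half_way = regularised_cov(X, 0.5)
```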

transform(views, partials=None, **kwargs)[source]

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
partials – the confounding variables, with the same number of rows (samples) as the views

fit(views, y=None, **kwargs)

Fits a regularised CCA (canonical ridge) model

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

Sparse Weighted CCA

class cca_zoo.models.SWCCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, max_iter=500, initialization='uniform', tol=1e-09, regularisation='l0', c=None, sample_support=None, positive=False)[source]

A class used to fit SWCCA model

Citation

Example

>>> from cca_zoo.models import SWCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SWCCA(regularisation='l0',c=[2, 2], sample_support=5, random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.61620969])

Parameters

latent_dims (int) – number of latent dimensions to fit
scale (bool) – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out-of-sample data)
copy_data – If True, X will be copied; else, it may be overwritten
random_state – Pass for reproducible output across multiple function calls
max_iter (int) – the maximum number of iterations to perform in the inner optimization loop
initialization (str) – either string from “pls”, “cca”, “random”, “uniform” or callable to initialize the score variables for iterative methods
tol (float) – tolerance value used for early stopping
regularisation – the type of regularisation on the weights either ‘l0’ or ‘l1’
c (Union[Iterable[Union[float, int]], float, int, None]) – regularisation parameter
sample_support – the l0 norm of the sample weights
positive – constrain weights to be positive
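
An l0 constraint such as sample_support limits how many entries of a vector may be non-zero. A minimal sketch of enforcing that by hard-thresholding is given below; hard_threshold is an illustrative helper that assumes support is at least 1, and it is not taken from the library.

```python
import numpy as np

def hard_threshold(v, support):
    """Keep the `support` largest-magnitude entries of v and zero the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-support:]
    out[keep] = v[keep]
    return out

v = np.array([0.1, -2.0, 0.5, 3.0, -0.2])
sparse_v = hard_threshold(v, 2)
```

Here only the two entries with the largest absolute values survive, so the l0 norm of sparse_v is 2.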

fit(views, y=None, **kwargs)

Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation

Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)

fit_transform(views, **kwargs)

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

transform(views, **kwargs)

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Base Class

class cca_zoo.models._cca_base._CCA_Base(latent_dims=1, scale=True, centre=True, copy_data=True, accept_sparse=False, random_state=None)[source]

A class used as the base for methods in the package. Allows methods to inherit fit_transform, get_loadings, pairwise_correlations, and score when only fit (and transform, where it differs from the default) is provided.
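
The inheritance pattern can be sketched with a toy stand-in. ToyBase and RandomProjection below are hypothetical classes written for illustration only; they mimic the idea of supplying fit and inheriting the rest, and they do not reproduce _CCA_Base itself.

```python
from abc import ABC, abstractmethod
import numpy as np

class ToyBase(ABC):
    """Hypothetical stand-in for a base class; not the library's code."""

    def __init__(self, latent_dims=1):
        self.latent_dims = latent_dims
        self.weights = None   # one weight matrix per view, set by fit

    @abstractmethod
    def fit(self, views, y=None, **kwargs):
        ...

    def transform(self, views, **kwargs):
        # Default transform: project each view with its own weights.
        return [X @ w for X, w in zip(views, self.weights)]

    def fit_transform(self, views, **kwargs):
        return self.fit(views, **kwargs).transform(views, **kwargs)

class RandomProjection(ToyBase):
    """Subclass that only provides fit; everything else is inherited."""

    def fit(self, views, y=None, **kwargs):
        rng = np.random.RandomState(0)
        self.weights = [rng.random((X.shape[1], self.latent_dims)) for X in views]
        return self

rng = np.random.RandomState(1)
X1, X2 = rng.random((10, 5)), rng.random((10, 4))
scores = RandomProjection(latent_dims=2).fit_transform((X1, X2))
```

The subclass defines a single method, yet fit_transform works because the base class composes fit and transform.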

weights

Type: list of weights for each view

Constructor for _CCA_Base

Parameters

latent_dims (int) – number of latent dimensions to fit
scale – normalize variance in each column before fitting
centre – demean data by column before fitting (and before transforming out-of-sample data)
copy_data – If True, X will be copied; else, it may be overwritten
accept_sparse – Whether model can take sparse data as input
random_state (Union[int, RandomState, None]) – Pass for reproducible output across multiple function calls
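
The scale and centre options imply that statistics learned during fitting are reused on out-of-sample data. Below is a minimal sketch of that behaviour with plain NumPy; the use of ddof=1 for the standard deviation is an assumption and may not match the library.

```python
import numpy as np

rng = np.random.RandomState(0)
X_train = rng.random((10, 5))
X_test = rng.random((4, 5))

# Statistics are computed on the training data only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0, ddof=1)

X_train_p = (X_train - mean) / std   # centred and scaled training view
X_test_p = (X_test - mean) / std     # same shift and scale applied out of sample
```

Reusing the training mean and standard deviation keeps the transformed test data on the same footing as the data the weights were fitted on.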

abstract fit(views, y=None, **kwargs)[source]

Fits a given model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn

transform(views, **kwargs)[source]

Transforms data given a fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

fit_transform(views, **kwargs)[source]

Fits and then transforms the training data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

get_loadings(views, normalize=False, **kwargs)[source]

Returns the model loadings for each view for the given data

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
normalize – scales loadings to ensure that they represent correlations between features and scores

pairwise_correlations(views, **kwargs)[source]

Predicts the correlations between each view for each dimension for the given data using the fit model

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model

Returns

all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views

score(views, y=None, **kwargs)[source]

Returns average correlation in each dimension (averages over all pairs for multiview)

Parameters

views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
y – unused but needed to integrate with scikit-learn