Models
- class cca_zoo.models.GCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None, view_weights: Optional[Iterable[float]] = None)[source]
Bases:
rCCA
A class used to fit a GCCA model. For more than 2 views, GCCA optimizes the sum of correlations with a shared auxiliary vector
\[
\begin{aligned}
w_{opt} &= \underset{w}{\mathrm{argmax}}\Big\{ \sum_i w_i^T X_i^T T \Big\} \\
\text{subject to: } & T^T T = 1
\end{aligned}
\]
- Parameters
latent_dims (int, optional) – Number of latent dimensions to use, by default 1
scale (bool, optional) – Whether to scale the data, by default True
centre (bool, optional) – Whether to centre the data, by default True
copy_data (bool, optional) – Whether to copy the data, by default True
random_state (int, optional) – Random state, by default None
c (Union[Iterable[float], float], optional) – Regularization parameter, by default None
view_weights (Iterable[float], optional) – Weights for each view, by default None
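As a rough illustration of what the `centre` and `scale` options describe, the following NumPy sketch standardizes each view column-wise. The helper name `standardize_views` and the use of the sample standard deviation (`ddof=1`) are assumptions made for this example, not the library's documented behaviour.

```python
import numpy as np

def standardize_views(views, centre=True, scale=True):
    """Hypothetical helper: column-wise centring and scaling of each view."""
    out = []
    for X in views:
        X = np.asarray(X, dtype=float).copy()  # work on a copy, like copy_data=True
        if centre:
            X -= X.mean(axis=0)                # zero mean per feature
        if scale:
            std = X.std(axis=0, ddof=1)        # assumption: sample standard deviation
            std[std == 0] = 1.0                # leave constant columns unchanged
            X /= std
        out.append(X)
    return out

rng = np.random.RandomState(0)
views = [rng.random((10, 5)) for _ in range(3)]
Z = standardize_views(views)
```

Working on a copy keeps the caller's arrays untouched, which is the behaviour one would expect from `copy_data=True`.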
References
Tenenhaus, Arthur, and Michel Tenenhaus. “Regularized generalized canonical correlation analysis.” Psychometrika 76.2 (2011): 257.
Examples
>>> from cca_zoo.models import GCCA
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> X3 = rng.random((10, 5))
>>> model = GCCA()
>>> model.fit((X1, X2, X3)).score((X1, X2, X3))
array([0.97229856])
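To make the shared auxiliary vector T in the GCCA objective more concrete, here is a minimal NumPy sketch of an unweighted MAXVAR-style solution: T is taken from the leading eigenvectors of the sum of (ridge-regularized) projection matrices of the views, and each view's weights are obtained by regressing T on that view. The function name `gcca_maxvar_sketch` and the small `ridge` term are assumptions for illustration; this is not the library's implementation and it does not reproduce how the `c` or `view_weights` parameters are applied.

```python
import numpy as np

def gcca_maxvar_sketch(views, latent_dims=1, ridge=1e-3):
    """Hypothetical MAXVAR-style GCCA: shared T from an eigendecomposition."""
    views = [X - X.mean(axis=0) for X in views]
    n = views[0].shape[0]
    M = np.zeros((n, n))
    for X in views:
        gram = X.T @ X + ridge * np.eye(X.shape[1])
        M += X @ np.linalg.solve(gram, X.T)      # regularized projection onto span(X)
    _, eigvecs = np.linalg.eigh(M)               # eigenvalues come back in ascending order
    T = eigvecs[:, ::-1][:, :latent_dims]        # leading eigenvectors, orthonormal columns
    weights = [
        np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ T)
        for X in views
    ]
    return T, weights

rng = np.random.RandomState(0)
X1, X2, X3 = (rng.random((50, 5)) for _ in range(3))
T, weights = gcca_maxvar_sketch((X1, X2, X3))
```

Because `np.linalg.eigh` returns orthonormal eigenvectors, the constraint T^T T = 1 holds by construction for a single latent dimension.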
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
- fit(views: Iterable[ndarray], y=None, **kwargs)
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- Return type
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
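The shape `(n_views, n_views, latent_dims)` can be illustrated with a short NumPy sketch that correlates the transformed views dimension by dimension and then averages the off-diagonal entries. Both function names are hypothetical, and excluding the diagonal when averaging is an assumption about what "average pairwise correlation" means here.

```python
import numpy as np

def pairwise_correlations_sketch(transformed_views):
    """transformed_views: list of (n_samples, latent_dims) arrays."""
    n_views = len(transformed_views)
    latent_dims = transformed_views[0].shape[1]
    corrs = np.empty((n_views, n_views, latent_dims))
    for i, Zi in enumerate(transformed_views):
        for j, Zj in enumerate(transformed_views):
            for k in range(latent_dims):
                corrs[i, j, k] = np.corrcoef(Zi[:, k], Zj[:, k])[0, 1]
    return corrs

def average_pairwise_correlation(corrs):
    """Mean over view pairs i != j, one value per latent dimension."""
    n_views = corrs.shape[0]
    off_diagonal_sum = corrs.sum(axis=(0, 1)) - n_views  # each diagonal entry is 1
    return off_diagonal_sum / (n_views * (n_views - 1))

rng = np.random.RandomState(0)
Z1 = rng.standard_normal((20, 2))
Z2 = Z1 + 0.1 * rng.standard_normal((20, 2))
Z3 = Z1 + 0.5 * rng.standard_normal((20, 2))
corrs = pairwise_correlations_sketch([Z1, Z2, Z3])
score = average_pairwise_correlation(corrs)
```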
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- Return type
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter>, so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
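The `<component>__<parameter>` naming convention can be sketched in plain Python: keys are split on the first double underscore, and anything after it is routed to the named component. The helper `split_nested_params` and the component name `model` are hypothetical and only illustrate the naming scheme.

```python
def split_nested_params(params):
    """Separate an estimator's own parameters from '<component>__<parameter>' ones."""
    own, nested = {}, {}
    for key, value in params.items():
        component, sep, sub_key = key.partition("__")
        if sep:   # a double underscore was found: the value belongs to a component
            nested.setdefault(component, {})[sub_key] = value
        else:     # plain key: a parameter of the estimator itself
            own[key] = value
    return own, nested

own, nested = split_nested_params(
    {"latent_dims": 2, "model__tau": 0.5, "model__max_iter": 50}
)
```

Splitting on the first `__` only means deeper nesting such as `a__b__c` is passed down to component `a` as `b__c`, where it can be split again.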
- class cca_zoo.models.KGCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None, eps=0.001, kernel: Optional[Iterable[Union[float, callable]]] = None, gamma: Optional[Iterable[float]] = None, degree: Optional[Iterable[float]] = None, coef0: Optional[Iterable[float]] = None, kernel_params: Optional[Iterable[dict]] = None)[source]
Bases:
GCCA
A class used to fit a KGCCA model. For more than 2 views, KGCCA optimizes the sum of correlations with a shared auxiliary vector
\[
\begin{aligned}
w_{opt} &= \underset{w}{\mathrm{argmax}}\Big\{ \sum_i \alpha_i^T K_i^T T \Big\} \\
\text{subject to: } & T^T T = 1
\end{aligned}
\]
References
Tenenhaus, Arthur, Cathy Philippe, and Vincent Frouin. “Kernel generalized canonical correlation analysis.” Computational Statistics & Data Analysis 90 (2015): 114-131.
Examples
>>> from cca_zoo.models import KGCCA
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> X3 = rng.random((10, 5))
>>> model = KGCCA()
>>> model.fit((X1, X2, X3)).score((X1, X2, X3))
array([0.97019284])
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
- transform(views: ndarray, y=None, **kwargs)[source]
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- fit(views: Iterable[ndarray], y=None, **kwargs)
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- Return type
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- Return type
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter>, so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.PLS_ALS(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, max_iter: int = 100, initialization: Union[str, callable] = 'random', tol: float = 0.001, deflation='pls', verbose=0)[source]
Bases:
_BaseIterative
A class used to fit a PLS model to two or more views of data.
Fits a partial least squares model with CCA deflation by the NIPALS algorithm
\[
\begin{aligned}
w_{opt} &= \underset{w}{\mathrm{argmax}}\Big\{ \sum_i \sum_{j \neq i} \|X_i w_i - X_j w_j\|^2 \Big\} \\
\text{subject to: } & w_i^T w_i = 1
\end{aligned}
\]
Can also be used with more than two views.
- Parameters
latent_dims (int, optional) – Number of latent dimensions to use, by default 1
scale (bool, optional) – Whether to scale the data, by default True
centre (bool, optional) – Whether to centre the data, by default True
copy_data (bool, optional) – Whether to copy the data, by default True
random_state (int, optional) – Random state for reproducibility, by default None
max_iter (int, optional) – Maximum number of iterations, by default 100
initialization (Union[str, callable], optional) – Initialization method, by default “random”
tol (float, optional) – Tolerance for convergence, by default 0.001
verbose (int, optional) – Verbosity level, by default 0
Examples
>>> from cca_zoo.models import PLS_ALS
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> model = PLS_ALS(random_state=0)
>>> model.fit((X1, X2)).score((X1, X2))
array([0.81796854])
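For intuition about the alternating (NIPALS-style) updates, the sketch below runs the classic two-view power iteration on the cross-covariance matrix, using the covariance form of the PLS problem with unit-norm weights. The function name, the synthetic data, and the stopping rule are assumptions for illustration rather than the library's code; deflation for further latent dimensions is not shown.

```python
import numpy as np

def pls_power_iteration(X1, X2, max_iter=100, tol=1e-9, random_state=0):
    """Hypothetical two-view sketch: alternate w1 ~ C w2 and w2 ~ C^T w1."""
    rng = np.random.RandomState(random_state)
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    C = X1.T @ X2                                # cross-covariance (up to scaling)
    w2 = rng.standard_normal(X2.shape[1])
    w2 /= np.linalg.norm(w2)
    for _ in range(max_iter):
        w1 = C @ w2
        w1 /= np.linalg.norm(w1)                 # enforce w1^T w1 = 1
        w2_new = C.T @ w1
        w2_new /= np.linalg.norm(w2_new)         # enforce w2^T w2 = 1
        converged = np.linalg.norm(w2_new - w2) < tol
        w2 = w2_new
        if converged:
            break
    return w1, w2

# Synthetic views that share one latent signal z (illustrative data only).
rng = np.random.RandomState(0)
z = rng.standard_normal(100)
X1 = np.outer(z, rng.standard_normal(5)) + 0.1 * rng.standard_normal((100, 5))
X2 = np.outer(z, rng.standard_normal(4)) + 0.1 * rng.standard_normal((100, 4))
w1, w2 = pls_power_iteration(X1, X2)

# The iteration should recover the leading singular vectors of the cross-covariance.
U, s, Vt = np.linalg.svd((X1 - X1.mean(axis=0)).T @ (X2 - X2.mean(axis=0)))
alignment = abs(w1 @ U[:, 0])
```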
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
- fit(views: Iterable[ndarray], y=None, **kwargs)
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- Return type
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- Return type
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter>, so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.SCCA_PMD(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, deflation='cca', tau: Optional[Union[Iterable[float], float]] = None, max_iter: int = 100, initialization: Union[str, callable] = 'pls', tol: float = 0.001, positive: Optional[Union[Iterable[bool], bool]] = None, verbose=0)[source]
Bases:
PLS_ALS
Fits a Sparse CCA (Penalized Matrix Decomposition) model for 2 or more views.
\[
\begin{aligned}
w_{opt} &= \underset{w}{\mathrm{argmax}}\Big\{ w_1^T X_1^T X_2 w_2 \Big\} \\
\text{subject to: } & w_i^T w_i = 1 \\
& \|w_i\| \leq c_i
\end{aligned}
\]
- Parameters
latent_dims (int, default=1) – Number of latent dimensions to use in the model.
scale (bool, default=True) – Whether to scale the data to unit variance.
centre (bool, default=True) – Whether to centre the data to have zero mean.
copy_data (bool, default=True) – Whether to copy the data or overwrite it.
random_state (int, default=None) – Random seed for initialisation.
deflation (str, default="cca") – Deflation method to use. Options are “cca” and “pmd”.
tau (float or list of floats, default=None) – Regularisation parameter. If a single float is given, the same value is used for all views. If a list of floats is given, the values are used for each view respectively. If None, the value is set to 1.
max_iter (int, default=100) – Maximum number of iterations to run.
initialization (str or callable, default="pls") – Method to use for initialisation. Options are “pls” and “random”.
tol (float, default=0.001) – Tolerance for convergence.
positive (bool or list of bools, default=None) – Whether to constrain the weights to be positive.
verbose (int, default=0) – Verbosity level. 0 is silent, 1 prints progress.
References
Witten, Daniela M., Robert Tibshirani, and Trevor Hastie. “A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.” Biostatistics 10.3 (2009): 515-534.
Examples
>>> from cca_zoo.models import SCCA_PMD
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> model = SCCA_PMD(tau=[1, 1], random_state=0)
>>> model.fit((X1, X2)).score((X1, X2))
array([0.81796873])
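The constrained update at the heart of penalized matrix decomposition can be sketched with soft-thresholding plus a bisection search for the threshold, in the spirit of Witten et al. In this sketch the bounds `c1` and `c2` are the L1 limits from the objective, applied directly; how the library maps `tau` onto those limits is not shown and should not be inferred from it. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(a, delta):
    """Shrink every entry of a towards zero by delta."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def l1_constrained_unit_vector(a, c, n_search=60):
    """Unit L2-norm vector S(a, delta)/||S(a, delta)||_2 with L1 norm <= c."""
    if c < 1:
        raise ValueError("a unit L2-norm vector always has L1 norm >= 1")
    w = a / np.linalg.norm(a)
    if np.linalg.norm(w, 1) <= c:
        return w                                  # no thresholding needed
    k = np.argmax(np.abs(a))
    best = np.zeros_like(a, dtype=float)
    best[k] = np.sign(a[k])                       # one-hot fallback, always feasible
    lo, hi = 0.0, np.abs(a[k])
    for _ in range(n_search):                     # bisection on the threshold delta
        delta = 0.5 * (lo + hi)
        s = soft_threshold(a, delta)
        norm = np.linalg.norm(s)
        if norm == 0:
            hi = delta
            continue
        w = s / norm
        if np.linalg.norm(w, 1) > c:
            lo = delta                            # still too dense: threshold harder
        else:
            best, hi = w, delta                   # feasible: try a smaller threshold
    return best

def pmd_rank_one(X1, X2, c1, c2, n_iter=50):
    """Alternate the constrained updates for the two weight vectors."""
    C = X1.T @ X2
    w2 = np.ones(X2.shape[1]) / np.sqrt(X2.shape[1])
    for _ in range(n_iter):
        w1 = l1_constrained_unit_vector(C @ w2, c1)
        w2 = l1_constrained_unit_vector(C.T @ w1, c2)
    return w1, w2

rng = np.random.RandomState(0)
X1 = rng.standard_normal((30, 6))
X2 = rng.standard_normal((30, 6))
X1 -= X1.mean(axis=0)
X2 -= X2.mean(axis=0)
w1, w2 = pmd_rank_one(X1, X2, c1=1.5, c2=1.5)
```

Smaller bounds force more entries of the weight vectors to exactly zero, which is where the sparsity comes from.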
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
- fit(views: Iterable[ndarray], y=None, **kwargs)
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- Return type
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- Return type
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter>, so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.ElasticCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, deflation='cca', max_iter: int = 100, initialization: Union[str, callable] = 'pls', tol: float = 0.001, alpha: Optional[Union[Iterable[float], float]] = None, l1_ratio: Optional[Union[Iterable[float], float]] = None, stochastic=False, positive: Optional[Union[Iterable[bool], bool]] = None, verbose=0)[source]
Bases:
_BaseIterative
Fits an elastic CCA by iterating elastic net regressions to two or more views of data.
By default, ElasticCCA uses CCA with an auxiliary variable target i.e. MAXVAR configuration
\[
\begin{aligned}
w_{opt}, t_{opt} &= \underset{w,t}{\mathrm{argmax}}\Big\{ \sum_i \|X_i w_i - t\|^2 + c\|w_i\|^2_2 + \text{l1_ratio}\|w_i\|_1 \Big\} \\
\text{subject to: } & t^T t = n
\end{aligned}
\]
It can instead be forced to use the SUMCOR form, which approximates a solution to the problem:
\[
\begin{aligned}
w_{opt} &= \underset{w}{\mathrm{argmax}}\Big\{ \sum_i \sum_{j \neq i} \|X_i w_i - X_j w_j\|^2 + c\|w_i\|^2_2 + \text{l1_ratio}\|w_i\|_1 \Big\} \\
\text{subject to: } & w_i^T X_i^T X_i w_i = n
\end{aligned}
\]
- Parameters
latent_dims (int, default=1) – Number of latent dimensions to use
scale (bool, default=True) – Whether to scale the data to unit variance
centre (bool, default=True) – Whether to centre the data to zero mean
copy_data (bool, default=True) – Whether to copy the data or overwrite it
random_state (int, default=None) – Random seed for initialization
deflation (str, default="cca") – Whether to use CCA or PLS deflation
max_iter (int, default=100) – Maximum number of iterations to run
initialization (str or callable, default="pls") – How to initialize the weights. Can be “pls” or “random” or a callable
tol (float, default=1e-3) – Tolerance for convergence
alpha (float or list of floats, default=None) – Regularisation parameter for the L2 penalty. If None, defaults to 1.0
l1_ratio (float or list of floats, default=None) – Regularisation parameter for the L1 penalty. If None, defaults to 0.0
stochastic (bool, default=False) – Whether to use stochastic gradient descent
positive (bool or list of bools, default=None) – Whether to use non-negative constraints
verbose (int, default=0) – Verbosity level
Examples
>>> from cca_zoo.models import ElasticCCA
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> model = ElasticCCA(alpha=[1e-1, 1e-1], l1_ratio=[0.5, 0.5], random_state=0)
>>> model.fit((X1, X2)).score((X1, X2))
array([0.9316638])
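The idea of iterating regularized regressions towards an auxiliary target t can be sketched for the special case with no L1 penalty, where each regression has a closed-form ridge solution. This is a simplified illustration under that assumption (a hypothetical function `alternating_ridge_maxvar` with a single shared `alpha`), not the library's solver: each view is regressed on t, then t is rebuilt from the fitted views and rescaled so that t^T t = n.

```python
import numpy as np

def alternating_ridge_maxvar(views, alpha=0.1, max_iter=100, tol=1e-3, random_state=0):
    """Hypothetical sketch: alternate ridge regressions on t with a rescaling of t."""
    rng = np.random.RandomState(random_state)
    views = [X - X.mean(axis=0) for X in views]
    n = views[0].shape[0]
    t = rng.standard_normal(n)
    t *= np.sqrt(n) / np.linalg.norm(t)           # enforce t^T t = n
    for _ in range(max_iter):
        # Ridge regression of the current target t on every view.
        weights = [
            np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ t)
            for X in views
        ]
        # New target: sum of the fitted views, rescaled to satisfy the constraint.
        t_new = sum(X @ w for X, w in zip(views, weights))
        t_new *= np.sqrt(n) / np.linalg.norm(t_new)
        converged = np.linalg.norm(t_new - t) < tol * np.sqrt(n)
        t = t_new
        if converged:
            break
    return weights, t

rng = np.random.RandomState(0)
X1 = rng.random((10, 5))
X2 = rng.random((10, 5))
weights, t = alternating_ridge_maxvar((X1, X2))
```

With a non-zero L1 term the inner regressions no longer have a closed form and would need an iterative elastic net solver in place of `np.linalg.solve`.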
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
- fit(views: Iterable[ndarray], y=None, **kwargs)
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- Return type
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- Return type
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter>, so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.SCCA_Parkhomenko(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, deflation='cca', tau: Optional[Union[Iterable[float], float]] = None, max_iter: int = 100, initialization: Union[str, callable] = 'pls', tol: float = 0.001, verbose=0)[source]
Bases:
_BaseIterative
Fits a sparse CCA (penalized CCA) model for 2 or more views.
\[
\begin{aligned}
w_{opt} &= \underset{w}{\mathrm{argmax}}\Big\{ w_1^T X_1^T X_2 w_2 \Big\} + c_i\|w_i\| \\
\text{subject to: } & w_i^T w_i = 1
\end{aligned}
\]
References
Parkhomenko, Elena, David Tritchler, and Joseph Beyene. “Sparse canonical correlation analysis with application to genomic data integration.” Statistical applications in genetics and molecular biology 8.1 (2009).
Examples
>>> from cca_zoo.models import SCCA_Parkhomenko
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> model = SCCA_Parkhomenko(tau=[0.001, 0.001], random_state=0)
>>> model.fit((X1, X2)).score((X1, X2))
array([0.81803527])
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
- fit(views: Iterable[ndarray], y=None, **kwargs)
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- Return type
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- Return type
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter>, so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.SCCA_IPLS(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, deflation='cca', tau: Optional[Union[Iterable[float], float]] = None, max_iter: int = 100, initialization: Union[str, callable] = 'pls', tol: float = 0.001, stochastic=False, positive: Optional[Union[Iterable[bool], bool]] = None, verbose=0)[source]
Bases:
ElasticCCA
Fits a sparse CCA model by iterative rescaled lasso regression. Implemented by ElasticCCA with l1_ratio=1
The optimisation is given by:
- Maths
\[
\begin{aligned}
w_{opt} &= \underset{w}{\mathrm{argmax}}\Big\{ \sum_i \sum_{j \neq i} \|X_i w_i - X_j w_j\|^2 + \text{l1_ratio}\|w_i\|_1 \Big\} \\
\text{subject to: } & w_i^T X_i^T X_i w_i = n
\end{aligned}
\]
- Citation
Mai, Qing, and Xin Zhang. “An iterative penalized least squares approach to sparse canonical correlation analysis.” Biometrics 75.3 (2019): 734-744.
- Example
>>> from cca_zoo.models import SCCA_IPLS
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> model = SCCA_IPLS(tau=[0.001, 0.001], random_state=0)
>>> model.fit((X1, X2)).score((X1, X2))
array([0.99998761])
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
- fit(views: Iterable[ndarray], y=None, **kwargs)
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- Return type
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- Return type
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter>, so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.SCCA_ADMM(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, deflation='cca', tau: Optional[Union[Iterable[float], float]] = None, mu: Optional[Union[Iterable[float], float]] = None, lam: Optional[Union[Iterable[float], float]] = None, eta: Optional[Union[Iterable[float], float]] = None, max_iter: int = 100, initialization: Union[str, callable] = 'pls', tol: float = 0.001, verbose=0)[source]
Bases:
_BaseIterative
Fits a sparse CCA model by alternating ADMM for two or more views.
\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + \|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=1\end{aligned}\end{align} \]
- Parameters
latent_dims (int, default=1) – Number of latent dimensions to use in the model.
scale (bool, default=True) – Whether to scale the data to unit variance.
centre (bool, default=True) – Whether to centre the data to have zero mean.
copy_data (bool, default=True) – Whether to copy the data or overwrite it.
random_state (int, default=None) – Random seed for initialisation.
deflation (str, default="cca") – Deflation method to use. Options are “cca” and “pls”.
tau (float or list of floats, default=None) – Regularisation parameter. If a single float is given, the same value is used for all views. If a list of floats is given, the values are used for each view.
mu (float or list of floats, default=None) – Regularisation parameter. If a single float is given, the same value is used for all views. If a list of floats is given, the values are used for each view.
lam (float or list of floats, default=None) – Regularisation parameter. If a single float is given, the same value is used for all views. If a list of floats is given, the values are used for each view.
eta (float or list of floats, default=None) – Regularisation parameter. If a single float is given, the same value is used for all views. If a list of floats is given, the values are used for each view.
max_iter (int, default=100) – Maximum number of iterations to run.
initialization (str or callable, default="pls") – Method to use for initialisation. Options are “pls” and “random”.
tol (float, default=1e-3) – Tolerance for convergence.
verbose (int, default=0) – Verbosity level. If 0, no output is printed. If 1, output is printed every 10 iterations.
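The "single float or list of floats" convention used by tau, mu, lam and eta can be sketched as follows. The helper `expand_per_view` is hypothetical and not part of cca_zoo. How a value of None is resolved is not stated above, so the `default` argument here is only a placeholder assumption.

```python
from collections.abc import Iterable


def expand_per_view(value, n_views, default=0.0):
    """Hypothetical helper: return one regularisation value per view."""
    if value is None:
        # Assumption: fall back to a placeholder default for every view.
        return [default] * n_views
    if isinstance(value, Iterable):
        values = list(value)
        if len(values) != n_views:
            raise ValueError(f"expected {n_views} values, got {len(values)}")
        return values
    # A single number is shared by all views.
    return [value] * n_views


tau_shared = expand_per_view(0.1, n_views=2)
tau_per_view = expand_per_view([0.1, 0.5], n_views=2)
```

Here `tau_shared` repeats the one value for both views, while `tau_per_view` keeps the two values as given.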
References
Suo, Xiaotong, et al. “Sparse canonical correlation analysis.” arXiv preprint arXiv:1705.10865 (2017).
Examples
>>> from cca_zoo.models import SCCA_ADMM
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> model = SCCA_ADMM(random_state=0, tau=[1e-1, 1e-1])
>>> model.fit((X1, X2)).score((X1, X2))
array([0.84348183])
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
- fit(views: Iterable[ndarray], y=None, **kwargs)
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- Return type
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- Return type
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.SCCA_Span(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, max_iter: int = 100, initialization: str = 'uniform', tol: float = 0.001, regularisation='l0', tau: Optional[Union[Iterable[Union[float, int]], float, int]] = None, rank=1, positive: Optional[Union[Iterable[bool], bool]] = None, random_state=None, deflation='cca', verbose=0)[source]
Bases:
_BaseIterative
Fits a Sparse CCA model using SpanCCA.
\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + \text{l1\_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=1\end{aligned}\end{align} \]
References
Asteris, Megasthenis, et al. “A simple and provable algorithm for sparse diagonal CCA.” International Conference on Machine Learning. PMLR, 2016.
Examples
>>> from cca_zoo.models import SCCA_Span
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> model = SCCA_Span(regularisation="l0", tau=[2, 2])
>>> model.fit((X1, X2)).score((X1, X2))
array([0.84556666])
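The example above passes regularisation="l0" with integer values of tau. One common reading of an l0 constraint is that at most tau weights per view may be non-zero, which can be enforced by keeping the largest-magnitude entries and zeroing the rest. The sketch below shows only that thresholding step; it is an assumption about what the constraint means, not a description of how SCCA_Span is implemented, and `keep_largest` is a hypothetical name.

```python
def keep_largest(weights, k):
    """Zero all but the k largest-magnitude entries of a weight vector."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


w = [0.1, -0.9, 0.3, 0.05, -0.4]
w_sparse = keep_largest(w, k=2)  # keeps -0.9 and -0.4, zeros the others
```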
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
- fit(views: Iterable[ndarray], y=None, **kwargs)
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- Return type
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- Return type
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.SWCCA(scale: bool = True, centre=True, copy_data=True, random_state=None, max_iter: int = 500, initialization: str = 'random', tol: float = 0.001, regularisation='l0', tau: Optional[Union[Iterable[Union[float, int]], float, int]] = None, sample_support=None, positive=False, verbose=0)[source]
Bases:
_BaseIterative
A class used to fit an SWCCA model.
Examples
>>> from cca_zoo.models import SWCCA
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> model = SWCCA(regularisation='l0', tau=[2, 2], sample_support=5, random_state=0)
>>> model.fit((X1, X2)).score((X1, X2))
array([0.61620969])
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
- fit(views: Iterable[ndarray], y=None, **kwargs)
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- Return type
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- Return type
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.MCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None, eps=1e-09)[source]
Bases:
rCCA
A class used to fit an MCCA model. For more than 2 views, MCCA optimizes the sum of pairwise correlations.
\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} w_i^TX_i^TX_jw_j \}\\\end{split}\\\text{subject to:}\\(1-c_i)w_i^TX_i^TX_iw_i+c_iw_i^Tw_i=1\end{aligned}\end{align} \]
- Parameters
latent_dims (int, optional) – Number of latent dimensions to use, by default 1
scale (bool, optional) – Whether to scale the data, by default True
centre (bool, optional) – Whether to centre the data, by default True
copy_data (bool, optional) – Whether to copy the data, by default True
random_state (int, optional) – Random state, by default None
c (Union[Iterable[float], float], optional) – Regularisation parameter, by default None
eps (float, optional) – Small value to add to the diagonal of the regularisation matrix, by default 1e-9
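The MCCA objective above sums w_i^T X_i^T X_j w_j over all ordered pairs of distinct views. A minimal plain-Python sketch of evaluating that sum for fixed weights is given below. It only evaluates the objective and does not optimise it or enforce the constraint; the helpers `project` and `mcca_objective` are hypothetical names, and the data are made up.

```python
def project(X, w):
    """Return z = X w for a row-major matrix X (list of rows) and weight list w."""
    return [sum(x * wk for x, wk in zip(row, w)) for row in X]


def mcca_objective(views, weights):
    """Sum over ordered pairs i != j of (X_i w_i)^T (X_j w_j)."""
    scores = [project(X, w) for X, w in zip(views, weights)]
    total = 0.0
    for i, zi in enumerate(scores):
        for j, zj in enumerate(scores):
            if i != j:
                total += sum(a * b for a, b in zip(zi, zj))
    return total


# Three views with three samples and two features each.
X1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
X2 = [[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]]
X3 = [[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
w = [[1.0, 0.0], [0.5, 0.0], [0.0, 1.0]]
value = mcca_objective([X1, X2, X3], w)
```

With these weights every view projects to the same score vector [1, 0, -1], so each of the six ordered pairs contributes 2 and the objective equals 12.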
References
Kettenring, Jon R. “Canonical analysis of several sets of variables.” Biometrika 58.3 (1971): 433-451.
Examples
>>> from cca_zoo.models import MCCA
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> X3 = rng.random((10, 5))
>>> model = MCCA()
>>> model.fit((X1, X2, X3)).score((X1, X2, X3))
array([0.97200847])
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
- fit(views: Iterable[ndarray], y=None, **kwargs)
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- Return type
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- Return type
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.KCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None, eps=0.001, kernel: Optional[Iterable[Union[float, callable]]] = None, gamma: Optional[Iterable[float]] = None, degree: Optional[Iterable[float]] = None, coef0: Optional[Iterable[float]] = None, kernel_params: Optional[Iterable[dict]] = None)[source]
Bases:
MCCA
A class used to fit a KCCA model.
\[ \begin{align}\begin{aligned}\begin{split}\alpha_{opt}=\underset{\alpha}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \alpha_i^TK_i^TK_j\alpha_j \}\\\end{split}\\\text{subject to:}\\c_i\alpha_i^TK_i\alpha_i + (1-c_i)\alpha_i^TK_i^TK_i\alpha_i=1\end{aligned}\end{align} \]
Examples
>>> from cca_zoo.models import KCCA
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> X3 = rng.random((10, 5))
>>> model = KCCA()
>>> model.fit((X1, X2, X3)).score((X1, X2, X3))
array([0.96893666])
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
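The KCCA objective is written in terms of kernel matrices K_i and coefficient vectors α_i with one entry per sample. As a rough illustration, the sketch below builds a Gram matrix for a single view and forms the scores K α. It assumes a linear kernel purely for simplicity; which kernel KCCA uses when none is passed is not stated here. The helper `linear_gram` is a hypothetical name and the numbers are made up.

```python
def linear_gram(X):
    """Gram matrix K with K[i][j] = <x_i, x_j> for the rows x_i of X."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


X = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]  # three samples, two features
K = linear_gram(X)                          # 3 x 3, one row and column per sample
alpha = [1.0, 0.0, -1.0]                    # one coefficient per sample
scores = [sum(k * a for k, a in zip(row, alpha)) for row in K]
```

The Gram matrix is square in the number of samples rather than the number of features, which is why the coefficients α_i are indexed by sample.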
- transform(views: ndarray, **kwargs)[source]
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- fit(views: Iterable[ndarray], y=None, **kwargs)
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- Return type
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- Return type
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.NCCA(latent_dims: int = 1, scale=True, centre=True, copy_data=True, accept_sparse=False, random_state: Optional[Union[int, RandomState]] = None, nearest_neighbors=None, gamma: Optional[Iterable[float]] = None)[source]
Bases:
_BaseCCA
A class used to fit a nonparametric CCA (NCCA) model.
References
Michaeli, Tomer, Weiran Wang, and Karen Livescu. “Nonparametric canonical correlation analysis.” International conference on machine learning. PMLR, 2016.
Example
>>> from cca_zoo.models import NCCA
>>> import numpy as np
>>> X1 = np.random.rand(10, 5)
>>> X2 = np.random.rand(10, 5)
>>> model = NCCA()
>>> model.fit((X1, X2)).score((X1, X2))
array([1.])
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
- fit(views: Iterable[ndarray], y=None, **kwargs)[source]
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- Return type
- transform(views: Iterable[ndarray], **kwargs)[source]
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- Return type
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.PartialCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None)[source]
Bases:
MCCA
A class used to fit a partial CCA model. The key difference between this and a vanilla CCA or MCCA is that the canonical score vectors must be orthogonal to the supplied confounding variables.
- Parameters
latent_dims (int, optional) – The number of latent dimensions to use, by default 1
scale (bool, optional) – Whether to scale the data, by default True
centre (bool, optional) – Whether to centre the data, by default True
copy_data (bool, optional) – Whether to copy the data, by default True
random_state (int, optional) – The random state to use, by default None
c (Union[Iterable[float], float], optional) – The regularization parameter, by default None
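One standard way to make a variable orthogonal to a confound is to subtract its least-squares projection onto that confound. The sketch below does this for a single feature column and a single centred confound in plain Python. Whether PartialCCA proceeds in exactly this way is not stated above, so treat it as an illustration of the orthogonality requirement only; `residualise` is a hypothetical helper.

```python
def residualise(column, confound):
    """Remove from `column` its least-squares projection onto `confound`."""
    scale = sum(c * z for c, z in zip(column, confound)) / sum(z * z for z in confound)
    return [c - scale * z for c, z in zip(column, confound)]


confound = [-1.0, 0.0, 1.0]  # centred confounding variable
feature = [1.0, 2.0, 6.0]
residual = residualise(feature, confound)
# Inner product with the confound; zero means the residual is orthogonal to it.
inner = sum(r * z for r, z in zip(residual, confound))
```

Here the projection coefficient is 2.5, the residual is [3.5, 2.0, 3.5], and its inner product with the confound is zero.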
References
Rao, B. Raja. “Partial canonical correlations.” Trabajos de estadistica y de investigación operativa 20.2-3 (1969): 211-219.
Example
>>> from cca_zoo.models import PartialCCA
>>> import numpy as np
>>> X1 = np.random.rand(10, 5)
>>> X2 = np.random.rand(10, 5)
>>> partials = np.random.rand(10, 3)
>>> model = PartialCCA()
>>> model.fit((X1, X2), partials=partials).score((X1, X2))
array([0.99993046])
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
- fit(views: Iterable[ndarray], y=None, partials=None, **kwargs)[source]
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
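y (None) –
partials (numpy array of confounding variables with the same number of rows (samples) as the views, optional) –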
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- Return type
- transform(views: Iterable[ndarray], partials=None, **kwargs)[source]
- Parameters
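views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
partials (numpy array of confounding variables with the same number of rows (samples) as the views, optional) –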
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- Return type
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.rCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None, accept_sparse=None)[source]
Bases:
_BaseCCA
A class used to fit a regularised CCA (canonical ridge) model. Uses PCA to perform the optimization efficiently for high dimensional data.
\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \}\\\end{split}\\\text{subject to:}\\(1-c_1)w_1^TX_1^TX_1w_1+c_1w_1^Tw_1=n\\(1-c_2)w_2^TX_2^TX_2w_2+c_2w_2^Tw_2=n\end{aligned}\end{align} \]
- Parameters
latent_dims (int, optional) – Number of latent dimensions to use, by default 1
scale (bool, optional) – Whether to scale the data, by default True
centre (bool, optional) – Whether to centre the data, by default True
copy_data (bool, optional) – Whether to copy the data, by default True
random_state (int, optional) – Random state, by default None
c (Union[Iterable[float], float], optional) – Regularisation parameter, by default None
accept_sparse (Union[bool, str], optional) – Whether to accept sparse data, by default None
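To see what the regularisation parameter c does, it can help to substitute a few values into the constraint of the rCCA objective above. The worked cases below are read directly off that formula and are shown for view 1 only; view 2 is analogous. The value 0.1 is just an illustrative choice.

```latex
\begin{aligned}
c_1 = 0:&\quad w_1^T X_1^T X_1 w_1 = n
  && \text{(variance constraint only, as in unregularised CCA)}\\
c_1 = 0.1:&\quad 0.9\, w_1^T X_1^T X_1 w_1 + 0.1\, w_1^T w_1 = n
  && \text{(a blend of the two terms)}\\
c_1 = 1:&\quad w_1^T w_1 = n
  && \text{(norm constraint on the weights only, as in PLS-style formulations)}
\end{aligned}
```

So c moves the constraint from the variance of the projected data towards the norm of the weights.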
References
Vinod, Hrishikesh D. “Canonical ridge and econometrics of joint production.” Journal of econometrics 4.2 (1976): 147-166.
Example
>>> from cca_zoo.models import rCCA
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> model = rCCA(c=[0.1, 0.1])
>>> model.fit((X1, X2)).score((X1, X2))
array([0.95222128])
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
- fit(views: Iterable[ndarray], y=None, **kwargs)[source]
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- Return type
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
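As a rough illustration of the array shape described here, and of the average that `score` is described as returning, the following NumPy-only sketch correlates already-transformed views pair by pair. The helper names are hypothetical, and averaging over the off-diagonal pairs only is an assumption, not a statement about the library's implementation.

```python
import numpy as np


def pairwise_correlations(transformed_views):
    """Correlate every pair of transformed views, one latent dimension at a time."""
    n_views = len(transformed_views)
    latent_dims = transformed_views[0].shape[1]
    corrs = np.empty((n_views, n_views, latent_dims))
    for i, zi in enumerate(transformed_views):
        for j, zj in enumerate(transformed_views):
            for k in range(latent_dims):
                corrs[i, j, k] = np.corrcoef(zi[:, k], zj[:, k])[0, 1]
    return corrs


def average_pairwise_correlation(corrs):
    """Average the off-diagonal entries (i != j) for each latent dimension."""
    n_views = corrs.shape[0]
    off_diagonal = ~np.eye(n_views, dtype=bool)
    return corrs[off_diagonal].mean(axis=0)


# Toy "transformed views": a shared 2-dimensional signal plus small noise.
rng = np.random.RandomState(0)
shared = rng.standard_normal((50, 2))
views = [shared + 0.1 * rng.standard_normal((50, 2)) for _ in range(3)]

corrs = pairwise_correlations(views)          # shape (n_views, n_views, latent_dims)
score = average_pairwise_correlation(corrs)   # shape (latent_dims,)
```

The diagonal entries `corrs[i, i, :]` are always 1, which is why the sketch leaves them out of the average.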
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
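The `<component>__<parameter>` naming can be pictured with a small toy example. This is a plain-Python sketch with hypothetical `Component` and `Nested` classes; it only mimics the routing idea and is not the scikit-learn or cca_zoo implementation.

```python
class Component:
    """Hypothetical leaf estimator with a single parameter."""

    def __init__(self, c=0.0):
        self.c = c

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class Nested:
    """Hypothetical container that owns named components."""

    def __init__(self, **components):
        self.components = components

    def set_params(self, **params):
        for key, value in params.items():
            # Split "<component>__<parameter>" at the first double underscore.
            component, sep, parameter = key.partition("__")
            if not sep:
                raise ValueError(f"expected <component>__<parameter>, got {key!r}")
            self.components[component].set_params(**{parameter: value})
        return self


nested = Nested(cca=Component(c=0.0))
returned = nested.set_params(cca__c=0.5)  # routes c=0.5 to the "cca" component
```

Returning `self` from `set_params` mirrors the documented return value and allows calls to be chained.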
- class cca_zoo.models.CCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None)[source]
Bases:
rCCA
A class used to fit a simple CCA model
\[
\begin{aligned}
w_{opt} &= \underset{w}{\mathrm{argmax}} \{ w_1^T X_1^T X_2 w_2 \} \\
\text{subject to:} \quad & w_1^T X_1^T X_1 w_1 = n \\
& w_2^T X_2^T X_2 w_2 = n
\end{aligned}
\]
- Parameters
latent_dims (int, optional) – Number of latent dimensions to use, by default 1
scale (bool, optional) – Whether to scale the data, by default True
centre (bool, optional) – Whether to centre the data, by default True
copy_data (bool, optional) – Whether to copy the data, by default True
random_state (int, optional) – Random state, by default None
accept_sparse (Union[bool, str], optional) – Whether to accept sparse data, by default None
References
Hotelling, Harold. “Relations between two sets of variates.” Breakthroughs in statistics. Springer, New York, NY, 1992. 162-190.
Example
>>> from cca_zoo.models import CCA
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = CCA()
>>> model.fit((X1,X2)).score((X1,X2))
array([1.])
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
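The CCA objective and constraints stated for this class can be checked numerically. The sketch below solves for the first pair of weights by whitening each view and taking an SVD, using NumPy only. It is a minimal illustration under stated assumptions (centred but unscaled data, full-rank covariances, hypothetical helper names), not the solver used by the library.

```python
import numpy as np


def inv_sqrt(matrix):
    """Inverse square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T


def cca_first_pair(X1, X2):
    """Return centred views, the first pair of weights and their correlation."""
    n = X1.shape[0]
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    C11 = X1.T @ X1 / n
    C22 = X2.T @ X2 / n
    C12 = X1.T @ X2 / n
    W1, W2 = inv_sqrt(C11), inv_sqrt(C22)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, S, Vt = np.linalg.svd(W1 @ C12 @ W2)
    w1 = W1 @ U[:, 0]
    w2 = W2 @ Vt[0]
    return X1, X2, w1, w2, S[0]


rng = np.random.RandomState(0)
latent = rng.standard_normal((200, 1))
X1 = latent + rng.standard_normal((200, 5))
X2 = latent + rng.standard_normal((200, 5))
X1c, X2c, w1, w2, rho = cca_first_pair(X1, X2)

# Constraint from the formula: w_1^T X_1^T X_1 w_1 = n
constraint_1 = w1 @ X1c.T @ X1c @ w1
```

With only 10 samples and 5 + 5 features, as in the example above, the two views can be matched perfectly, which is consistent with the reported score of 1.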
- fit(views: Iterable[ndarray], y=None, **kwargs)
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.PLS(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None)[source]
Bases:
rCCA
A class used to fit a simple PLS model
Implements PLS by inheriting regularised CCA with maximal regularisation
\[
\begin{aligned}
w_{opt} &= \underset{w}{\mathrm{argmax}} \{ w_1^T X_1^T X_2 w_2 \} \\
\text{subject to:} \quad & w_1^T w_1 = 1 \\
& w_2^T w_2 = 1
\end{aligned}
\]
- Parameters
latent_dims (int, optional) – Number of latent dimensions to use, by default 1
scale (bool, optional) – Whether to scale the data, by default True
centre (bool, optional) – Whether to centre the data, by default True
copy_data (bool, optional) – Whether to copy the data, by default True
random_state (int, optional) – Random state, by default None
accept_sparse (Union[bool, str], optional) – Whether to accept sparse data, by default None
Example
>>> from cca_zoo.models import PLS
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = PLS()
>>> model.fit((X1,X2)).score((X1,X2))
array([0.81796873])
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
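One way to read "regularised CCA with maximal regularisation" is sketched below: each view covariance is shrunk towards the identity as `(1 - c) * C + c * I`, so that `c = 1` leaves the unit-norm constraints of the PLS formula. The shrinkage form and the function name are assumptions made for illustration, not a description of the library internals.

```python
import numpy as np


def ridge_cca_first_pair(X1, X2, c):
    """Leading weights when each view covariance is shrunk to (1 - c) * C + c * I."""
    n = X1.shape[0]
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    C12 = X1.T @ X2 / n
    whiteners = []
    for X in (X1, X2):
        C = (1 - c) * (X.T @ X / n) + c * np.eye(X.shape[1])
        eigvals, eigvecs = np.linalg.eigh(C)
        whiteners.append(eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T)
    U, S, Vt = np.linalg.svd(whiteners[0] @ C12 @ whiteners[1])
    return whiteners[0] @ U[:, 0], whiteners[1] @ Vt[0]


rng = np.random.RandomState(0)
latent = rng.standard_normal((200, 1))
X1 = latent + rng.standard_normal((200, 5))
X2 = latent + rng.standard_normal((200, 4))

# c = 1: the shrunk covariances are the identity, so the weights have unit norm.
w1, w2 = ridge_cca_first_pair(X1, X2, c=1.0)

# Direct PLS for comparison: leading singular vectors of the cross-covariance.
X1c, X2c = X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)
U, S, Vt = np.linalg.svd(X1c.T @ X2c)
```

Up to sign, `w1` and `w2` coincide with `U[:, 0]` and `Vt[0]`, while `c = 0` in the same function recovers the unregularised CCA constraints.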
- fit(views: Iterable[ndarray], y=None, **kwargs)
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.TCCA(latent_dims: int = 1, scale=True, centre=True, copy_data=True, random_state=None, c: Optional[Union[Iterable[float], float]] = None)[source]
Bases:
_BaseCCA
Fits a Tensor CCA model. Tensor CCA maximises higher order correlations between the views.
\[
\begin{aligned}
\alpha_{opt} &= \underset{\alpha}{\mathrm{argmax}} \Big\{ \sum_i \sum_{j \neq i} \alpha_i^T K_i^T K_j \alpha_j \Big\} \\
\text{subject to:} \quad & \alpha_i^T K_i^T K_i \alpha_i = 1
\end{aligned}
\]
- Parameters
latent_dims (int, default=1) – The number of latent dimensions to use
scale (bool, default=True) – Whether to scale the data to unit variance
centre (bool, default=True) – Whether to centre the data
copy_data (bool, default=True) – Whether to copy the data or not
random_state (int, default=None) – The random state to use
c (float or list of floats, default=None) – The regularisation parameter for each view. If None, defaults to 0 for each view.
References
Kim, Tae-Kyun, Shu-Fai Wong, and Roberto Cipolla. “Tensor canonical correlation analysis for action classification.” 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2007.
https://github.com/rciszek/mdr_tcca
Examples
>>> from cca_zoo.models import TCCA
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = TCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([1.14595755])
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
- fit(views: Iterable[ndarray], y=None, **kwargs)[source]
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- correlations(views: Iterable[ndarray], **kwargs)[source]
Predicts the correlation for the given data using the fitted model
- Parameters
views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
- score(views: Iterable[ndarray], **kwargs)[source]
Returns the higher order correlations in each dimension
- Parameters
views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
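To make "higher order correlations" concrete for three views, the sketch below builds a third-order covariance tensor with NumPy and contracts it with one weight vector per view. The weights are random placeholders rather than fitted values, and any whitening that a real Tensor CCA fit would apply is omitted, so this only illustrates the quantity being scored.

```python
import numpy as np

rng = np.random.RandomState(0)
n = 100
views = [rng.standard_normal((n, 4)) for _ in range(3)]
views = [X - X.mean(axis=0) for X in views]  # centre each view

# Third-order covariance tensor: entry (a, b, c) averages X1[:, a] * X2[:, b] * X3[:, c].
cov_tensor = np.einsum("na,nb,nc->abc", *views) / n

# Hypothetical unit-norm weights, one vector per view (not fitted).
weights = [rng.standard_normal(4) for _ in range(3)]
weights = [w / np.linalg.norm(w) for w in weights]

# Contracting the tensor with the weights gives the higher-order statistic ...
higher_order = np.einsum("abc,a,b,c->", cov_tensor, *weights)

# ... which equals the mean of the elementwise product of the three projections.
projections = [X @ w for X, w in zip(views, weights)]
direct = np.mean(projections[0] * projections[1] * projections[2])
```

Unlike a Pearson correlation between two variables, a three-way product moment of this kind is not confined to the interval [-1, 1].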
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.KTCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, eps=0.001, c: Optional[Union[Iterable[float], float]] = None, kernel: Optional[Iterable[Union[float, callable]]] = None, gamma: Optional[Iterable[float]] = None, degree: Optional[Iterable[float]] = None, coef0: Optional[Iterable[float]] = None, kernel_params: Optional[Iterable[dict]] = None)[source]
Bases:
TCCA
Fits a Kernel Tensor CCA model. Tensor CCA maximises higher order correlations
\[
\begin{aligned}
\alpha_{opt} &= \underset{\alpha}{\mathrm{argmax}} \Big\{ \sum_i \sum_{j \neq i} \alpha_i^T K_i^T K_j \alpha_j \Big\} \\
\text{subject to:} \quad & \alpha_i^T K_i^T K_i \alpha_i = 1
\end{aligned}
\]
References
Kim, Tae-Kyun, Shu-Fai Wong, and Roberto Cipolla. “Tensor canonical correlation analysis for action classification.” 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2007.
Examples
>>> from cca_zoo.models import KTCCA
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> X3 = rng.random((10,5))
>>> model = KTCCA()
>>> model.fit((X1,X2,X3)).score((X1,X2,X3))
array([1.69896269])
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
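The matrices K_i in the formula for this class are kernel (Gram) matrices, one per view. As a minimal sketch of what such a matrix looks like, the NumPy code below builds an RBF kernel with a `gamma` bandwidth. The choice of the RBF kernel and the helper name are assumptions for illustration; this section does not state which kernel the class uses by default.

```python
import numpy as np


def rbf_kernel_matrix(X, gamma):
    """Gram matrix with entries exp(-gamma * ||x_a - x_b||^2) over the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)


rng = np.random.RandomState(0)
views = [rng.random((10, 5)) for _ in range(3)]

# One (n_samples, n_samples) kernel matrix per view.
kernels = [rbf_kernel_matrix(X, gamma=0.5) for X in views]
```

Each kernel matrix is square in the number of samples, so the coefficient vectors in the formula have one entry per sample rather than one per feature.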
- transform(views: ndarray, **kwargs)[source]
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- correlations(views: Iterable[ndarray], **kwargs)
Predicts the correlation for the given data using the fitted model
- Parameters
views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
- fit(views: Iterable[ndarray], y=None, **kwargs)
Fits the model to the given data
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
self
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], **kwargs)
Returns the higher order correlations in each dimension
- Parameters
views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
kwargs – any additional keyword arguments required by the given model
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.PRCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, eps=0.001, c=0)[source]
Bases:
MCCA
Partially Regularized Canonical Correlation Analysis
- Parameters
latent_dims (int, optional) – Number of latent dimensions to use, by default 1
scale (bool, optional) – Whether to scale the data, by default True
centre (bool, optional) – Whether to centre the data, by default True
copy_data (bool, optional) – Whether to copy the data, by default True
random_state (int, optional) – Random state for reproducibility, by default None
eps (float, optional) – Tolerance for convergence, by default 1e-3
c (Union[Iterable[float], float], optional) – Regularisation parameter, by default 0
References
Tuzhilina, Elena, Leonardo Tozzi, and Trevor Hastie. “Canonical correlation analysis in high dimensions with structured regularization.” Statistical Modelling (2021): 1471082X211041033.
- fit(views: Iterable[ndarray], y=None, idxs=None, **kwargs)[source]
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
idxs (list/tuple of integers indicating which features from each view are the partially regularised features) –
kwargs (any additional keyword arguments required by the given model) –
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class cca_zoo.models.GRCCA(latent_dims: int = 1, scale: bool = True, centre=True, copy_data=True, random_state=None, eps=0.001, c: float = 0, mu: float = 0)[source]
Bases:
MCCA
Grouped Regularized Canonical Correlation Analysis
- Parameters
latent_dims (int, default=1) – Number of latent dimensions to use
scale (bool, default=True) – Whether to scale the data to unit variance
centre (bool, default=True) – Whether to centre the data
copy_data (bool, default=True) – Whether to copy the data
random_state (int, default=None) – Random state for initialisation
eps (float, default=1e-3) – Tolerance for convergence
c (float, default=0) – Regularization parameter for the group means
mu (float, default=0) – Regularization parameter for the group sizes
References
Tuzhilina, Elena, Leonardo Tozzi, and Trevor Hastie. “Canonical correlation analysis in high dimensions with structured regularization.” Statistical Modelling (2021): 1471082X211041033.
- Parameters
latent_dims (int, optional) – Number of latent dimensions to fit. Default is 1.
scale (bool, optional) – Whether to scale the data to unit variance. Default is True.
centre (bool, optional) – Whether to centre the data. Default is True.
copy_data (bool, optional) – Whether to copy the data. Default is True.
accept_sparse (bool, optional) – Whether to accept sparse data. Default is False.
random_state (int, RandomState instance or None, optional (default=None)) – Pass an int for reproducible output across multiple function calls.
- fit(views: Iterable[ndarray], y=None, feature_groups=None, **kwargs)[source]
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
feature_groups (list/tuple of integer numpy arrays or array likes, one per view, each with one entry per feature of that view) –
kwargs (any additional keyword arguments required by the given model) –
- fit_transform(views: Iterable[ndarray], **kwargs)
Fits the model to the given data and returns the transformed views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
transformed_views
- Return type
list of numpy arrays
- get_factor_loadings(views: Iterable[ndarray], normalize=True, **kwargs)
Returns the factor loadings for each view
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
normalize (bool, optional) – Whether to normalize the factor loadings. Default is True.
kwargs (any additional keyword arguments required by the given model) –
- Returns
factor_loadings
- Return type
list of numpy arrays
- get_params(deep=True)
Get parameters for this estimator.
- pairwise_correlations(views: Iterable[ndarray], **kwargs)
Returns the pairwise correlations between the views in each dimension
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
pairwise_correlations
- Return type
numpy array of shape (n_views, n_views, latent_dims)
- score(views: Iterable[ndarray], y=None, **kwargs)
Returns the average pairwise correlation between the views
- Parameters
views (list/tuple of numpy arrays or array likes with the same number of rows (samples)) –
y (None) –
kwargs (any additional keyword arguments required by the given model) –
- Returns
score
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance