Models
Regularized Canonical Correlation Analysis and Partial Least Squares
Canonical Correlation Analysis
class cca_zoo.models.rcca.CCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None)
Bases: cca_zoo.models.rcca.rCCA
A class used to fit a simple CCA model.
Implements CCA by inheriting from regularised CCA with zero regularisation.
Maths:
\[
\begin{aligned}
w_{opt} &= \underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \}\\
&\text{subject to:}\\
&w_1^TX_1^TX_1w_1=1\\
&w_2^TX_2^TX_2w_2=1
\end{aligned}
\]
Citation: Hotelling, Harold. “Relations between two sets of variates.” Breakthroughs in statistics. Springer, New York, NY, 1992. 162-190.
Example:
>>> from cca_zoo.models import CCA
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> model = CCA()
>>> model.fit((X1, X2)).score((X1, X2))
array([1.])
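The optimisation problem above can be checked numerically without the library. The following is a minimal numpy sketch, not the cca_zoo implementation: the helper names `inv_sqrt` and `cca_first_pair` are hypothetical, the views are centred by hand, and the solution is obtained by whitening each view and taking an SVD, which is one standard way to solve this problem.

```python
import numpy as np


def inv_sqrt(C, eps=1e-12):
    """Inverse square root of a symmetric positive-definite matrix (hypothetical helper)."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T


def cca_first_pair(X1, X2):
    """Return (w1, w2, rho) for the first canonical pair of two centred views."""
    S1 = inv_sqrt(X1.T @ X1)
    S2 = inv_sqrt(X2.T @ X2)
    # Whitening turns the constrained problem into a plain SVD.
    U, s, Vt = np.linalg.svd(S1 @ X1.T @ X2 @ S2)
    return S1 @ U[:, 0], S2 @ Vt[0], s[0]


rng = np.random.RandomState(0)
X1 = rng.random((100, 5))
X2 = rng.random((100, 3))
X1 = X1 - X1.mean(axis=0)  # centre each view by column
X2 = X2 - X2.mean(axis=0)

w1, w2, rho = cca_first_pair(X1, X2)
print(w1 @ X1.T @ X1 @ w1)  # approximately 1 (first constraint)
print(w2 @ X2.T @ X2 @ w2)  # approximately 1 (second constraint)
print(rho)                  # objective value w1^T X1^T X2 w2
```

Because both projections have unit norm and zero mean, the objective value equals the correlation between them.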
Constructor for CCA
Parameters:
- latent_dims (int) – number of latent dimensions to fit
- scale (bool) – normalize variance in each column before fitting
- centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
correlations(views, y=None, **kwargs)
Predicts the correlation for the given data using the fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
Returns: all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
Return type: np.ndarray
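To make the (k, k, latent_dims) return shape concrete, here is a small numpy sketch. It assumes that entry [i, j, d] holds the correlation between the projections of views i and j on latent dimension d; that reading, and the function name `pairwise_correlations`, are assumptions for illustration rather than the library's code.

```python
import numpy as np


def pairwise_correlations(projections):
    """projections: list of k arrays, each of shape (n_samples, latent_dims)."""
    k = len(projections)
    latent_dims = projections[0].shape[1]
    all_corrs = np.zeros((k, k, latent_dims))
    for i, zi in enumerate(projections):
        for j, zj in enumerate(projections):
            for d in range(latent_dims):
                # Pearson correlation between view i and view j on dimension d.
                all_corrs[i, j, d] = np.corrcoef(zi[:, d], zj[:, d])[0, 1]
    return all_corrs


rng = np.random.RandomState(0)
projections = [rng.random((50, 2)) for _ in range(3)]  # stand-ins for transformed views
all_corrs = pairwise_correlations(projections)
print(all_corrs.shape)  # (3, 3, 2)
```

Under this reading, the diagonal entries [i, i, d] are all 1 and the array is symmetric in its first two axes.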
fit(views, y=None, **kwargs)
Fits a regularised CCA (canonical ridge) model
Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
fit_transform(views, y=None, **kwargs)
Fits and then transforms the training data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
get_loadings(views, y=None, **kwargs)
Returns the model loadings for each view for the given data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
score(views, y=None, **kwargs)
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
Parameters:
- X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns: score – \(R^2\) of self.predict(X) wrt. y.
Return type: float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
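The \(R^2\) definition quoted above can be worked through on a tiny example. This sketch only illustrates the formula \(1 - u/v\); the function name `r2` is hypothetical and the snippet does not call any estimator.

```python
import numpy as np


def r2(y_true, y_pred):
    u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1 - u / v


y_true = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r2(y_true, y_true)                       # best possible score
constant = r2(y_true, np.full(4, y_true.mean()))   # always predicts the mean
worse = r2(y_true, y_true[::-1])                   # worse than the constant model
print(perfect, constant, worse)  # 1.0 0.0 -3.0
```

For the reversed prediction, \(u = 20\) and \(v = 5\), so the score is \(1 - 20/5 = -3\), which shows how the value can become negative.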
transform(views, y=None, **kwargs)
Transforms data given a fit model
Parameters:
- views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
- kwargs – any additional keyword arguments required by the given model
Partial Least Squares
class cca_zoo.models.rcca.PLS(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None)
Bases: cca_zoo.models.rcca.rCCA
A class used to fit a simple PLS model.
Implements PLS by inheriting from regularised CCA with maximal regularisation.
Maths:
\[
\begin{aligned}
w_{opt} &= \underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \}\\
&\text{subject to:}\\
&w_1^Tw_1=1\\
&w_2^Tw_2=1
\end{aligned}
\]
Example:
>>> from cca_zoo.models import PLS
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> model = PLS()
>>> model.fit((X1, X2)).score((X1, X2))
array([0.81796873])
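With unit-norm constraints on the weights, the first pair of PLS weights is given by the leading singular vectors of \(X_1^TX_2\). The snippet below is a numpy-only sketch of that fact on hand-centred data; it is an illustration of the maths above and makes no claim about how cca_zoo computes its solution.

```python
import numpy as np

rng = np.random.RandomState(0)
X1 = rng.random((100, 5))
X2 = rng.random((100, 3))
X1 = X1 - X1.mean(axis=0)  # centre each view by column
X2 = X2 - X2.mean(axis=0)

# The leading singular vectors of X1^T X2 maximise w1^T X1^T X2 w2
# over unit-norm w1 and w2.
U, s, Vt = np.linalg.svd(X1.T @ X2, full_matrices=False)
w1, w2 = U[:, 0], Vt[0]

objective = w1 @ X1.T @ X2 @ w2
print(np.linalg.norm(w1), np.linalg.norm(w2))  # both approximately 1
print(objective, s[0])                          # equal up to rounding
```

Any other pair of unit-norm vectors gives an objective no larger than the leading singular value.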
Constructor for PLS
Parameters:
- latent_dims (int) – number of latent dimensions to fit
- scale (bool) – normalize variance in each column before fitting
- centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
correlations(views, y=None, **kwargs)
Predicts the correlation for the given data using the fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
Returns: all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
Return type: np.ndarray
fit(views, y=None, **kwargs)
Fits a regularised CCA (canonical ridge) model
Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
fit_transform(views, y=None, **kwargs)
Fits and then transforms the training data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
get_loadings(views, y=None, **kwargs)
Returns the model loadings for each view for the given data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
score(views, y=None, **kwargs)
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
Parameters:
- X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns: score – \(R^2\) of self.predict(X) wrt. y.
Return type: float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
transform(views, y=None, **kwargs)
Transforms data given a fit model
Parameters:
- views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
- kwargs – any additional keyword arguments required by the given model
Ridge Regularized Canonical Correlation Analysis
class cca_zoo.models.rcca.rCCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, c=None, eps=0.001, accept_sparse=None)
Bases: cca_zoo.models.cca_base._CCA_Base
A class used to fit a regularised CCA (canonical ridge) model. Uses PCA to perform the optimization efficiently for high-dimensional data.
Maths:
\[
\begin{aligned}
w_{opt} &= \underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \}\\
&\text{subject to:}\\
&(1-c_1)w_1^TX_1^TX_1w_1+c_1w_1^Tw_1=1\\
&(1-c_2)w_2^TX_2^TX_2w_2+c_2w_2^Tw_2=1
\end{aligned}
\]
Citation: Vinod, Hrishikesh D. “Canonical ridge and econometrics of joint production.” Journal of econometrics 4.2 (1976): 147-166.
Example:
>>> from cca_zoo.models import rCCA
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> model = rCCA(c=[0.1, 0.1])
>>> model.fit((X1, X2)).score((X1, X2))
array([0.95222128])
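The constraint above interpolates between the CCA constraint (c = 0) and the PLS constraint (c = 1). The following numpy sketch shows this on hand-centred data. It solves the problem directly by whitening with the regularised matrices, so it does not use the PCA-based route mentioned in the class description, and the names `ridge_cca_first_pair` and `constraint` are hypothetical.

```python
import numpy as np


def ridge_cca_first_pair(X1, X2, c=(0.1, 0.1)):
    """First weight pair under (1 - c_i) w_i^T X_i^T X_i w_i + c_i w_i^T w_i = 1."""
    S = []
    for X, ci in zip((X1, X2), c):
        R = (1 - ci) * X.T @ X + ci * np.eye(X.shape[1])
        vals, vecs = np.linalg.eigh(R)
        S.append(vecs @ np.diag(vals ** -0.5) @ vecs.T)  # R^{-1/2}
    U, s, Vt = np.linalg.svd(S[0] @ X1.T @ X2 @ S[1])
    return S[0] @ U[:, 0], S[1] @ Vt[0]


def constraint(X, w, ci):
    """Left-hand side of the regularised constraint for one view."""
    return (1 - ci) * w @ X.T @ X @ w + ci * w @ w


rng = np.random.RandomState(0)
X1 = rng.random((100, 5))
X2 = rng.random((100, 3))
X1 = X1 - X1.mean(axis=0)
X2 = X2 - X2.mean(axis=0)

checks = []
for c in (0.0, 0.1, 1.0):
    w1, w2 = ridge_cca_first_pair(X1, X2, c=(c, c))
    checks.append((constraint(X1, w1, c), constraint(X2, w2, c)))
    print(c, checks[-1])  # both values approximately 1 for every c
```

At c = 1 the constraint reduces to unit-norm weights, which is the PLS setting; at c = 0 it reduces to unit-variance projections, which is the CCA setting.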
Constructor for rCCA
Parameters:
- latent_dims (int) – number of latent dimensions to fit
- scale (bool) – normalize variance in each column before fitting
- centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
- c (Union[Iterable[float], float, None]) – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)
- eps – epsilon for stability
- accept_sparse – which forms are accepted for sparse data
fit(views, y=None, **kwargs)
Fits a regularised CCA (canonical ridge) model
Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
correlations(views, y=None, **kwargs)
Predicts the correlation for the given data using the fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
Returns: all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
Return type: np.ndarray
fit_transform(views, y=None, **kwargs)
Fits and then transforms the training data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
get_loadings(views, y=None, **kwargs)
Returns the model loadings for each view for the given data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
score(views, y=None, **kwargs)
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
Parameters:
- X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns: score – \(R^2\) of self.predict(X) wrt. y.
Return type: float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
transform(views, y=None, **kwargs)
Transforms data given a fit model
Parameters:
- views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
- kwargs – any additional keyword arguments required by the given model
GCCA and KGCCA
Generalized (MAXVAR) Canonical Correlation Analysis
class cca_zoo.models.gcca.GCCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, c=None, view_weights=None)
Bases: cca_zoo.models.rcca.rCCA
A class used to fit a GCCA model. For more than 2 views, GCCA optimizes the sum of correlations with a shared auxiliary vector.
Maths:
\[
\begin{aligned}
w_{opt} &= \underset{w}{\mathrm{argmax}}\{ \sum_iw_i^TX_i^TT \}\\
&\text{subject to:}\\
&T^TT=1
\end{aligned}
\]
Citation: Tenenhaus, Arthur, and Michel Tenenhaus. “Regularized generalized canonical correlation analysis.” Psychometrika 76.2 (2011): 257.
Example:
>>> from cca_zoo.models import GCCA
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> X3 = rng.random((10, 5))
>>> model = GCCA()
>>> model.fit((X1, X2, X3)).score((X1, X2, X3))
array([0.97229856])
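One common way to obtain the shared auxiliary vector T in the MAXVAR formulation is to take the leading eigenvectors of the sum of the per-view projection matrices, then regress T onto each view to get its weights. The sketch below follows that textbook route in numpy under stated assumptions: the function name `gcca_maxvar` and the small `ridge` term (added only for numerical stability) are hypothetical, views are centred by hand, and no view weights are applied.

```python
import numpy as np


def gcca_maxvar(views, latent_dims=1, ridge=1e-8):
    """Return (T, weights, top eigenvalues) for centred views of equal row count."""
    n = views[0].shape[0]
    M = np.zeros((n, n))
    solvers = []
    for X in views:
        # P = (X^T X + ridge I)^{-1} X^T, so X @ P projects onto the span of X.
        P = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T)
        solvers.append(P)
        M += X @ P
    vals, vecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    T = vecs[:, ::-1][:, :latent_dims]          # leading eigenvectors, orthonormal
    weights = [P @ T for P in solvers]          # least-squares weights per view
    return T, weights, vals[::-1][:latent_dims]


rng = np.random.RandomState(0)
views = [rng.random((50, p)) for p in (5, 4, 3)]
views = [X - X.mean(axis=0) for X in views]

T, weights, top = gcca_maxvar(views, latent_dims=2)
print(T.T @ T)                     # approximately the identity (T^T T = 1)
print([w.shape for w in weights])  # [(5, 2), (4, 2), (3, 2)]
```

Each eigenvalue lies between 0 and the number of views, and values close to the number of views indicate a direction shared by all of them.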
Constructor for GCCA
Parameters:
- latent_dims (int) – number of latent dimensions to fit
- scale (bool) – normalize variance in each column before fitting
- centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
- c (Union[Iterable[float], float, None]) – regularisation between 0 (CCA) and 1 (PLS)
- view_weights (Optional[Iterable[float]]) – list of weights of each view
correlations(views, y=None, **kwargs)
Predicts the correlation for the given data using the fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
Returns: all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
Return type: np.ndarray
fit(views, y=None, **kwargs)
Fits a regularised CCA (canonical ridge) model
Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
fit_transform(views, y=None, **kwargs)
Fits and then transforms the training data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
get_loadings(views, y=None, **kwargs)
Returns the model loadings for each view for the given data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
score(views, y=None, **kwargs)
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
Parameters:
- X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns: score – \(R^2\) of self.predict(X) wrt. y.
Return type: float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
transform(views, y=None, **kwargs)
Transforms data given a fit model
Parameters:
- views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
- kwargs – any additional keyword arguments required by the given model
Kernel Generalized (MAXVAR) Canonical Correlation Analysis
class cca_zoo.models.gcca.KGCCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, c=None, eps=0.001, kernel=None, gamma=None, degree=None, coef0=None, kernel_params=None)
Bases: cca_zoo.models.gcca.GCCA
A class used to fit a KGCCA model. For more than 2 views, KGCCA optimizes the sum of correlations with a shared auxiliary vector.
Maths:
\[
\begin{aligned}
\alpha_{opt} &= \underset{\alpha}{\mathrm{argmax}}\{ \sum_i\alpha_i^TK_i^TT \}\\
&\text{subject to:}\\
&T^TT=1
\end{aligned}
\]
Citation: Tenenhaus, Arthur, Cathy Philippe, and Vincent Frouin. “Kernel generalized canonical correlation analysis.” Computational Statistics & Data Analysis 90 (2015): 114-131.
Example:
>>> from cca_zoo.models import KGCCA
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> X3 = rng.random((10, 5))
>>> model = KGCCA()
>>> model.fit((X1, X2, X3)).score((X1, X2, X3))
array([0.97019284])
Constructor for KGCCA
Parameters:
- latent_dims (int) – number of latent dimensions to fit
- scale (bool) – normalize variance in each column before fitting
- centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
- c (Union[Iterable[float], float, None]) – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)
- eps – epsilon value to ensure stability of smallest eigenvalues
- kernel (Optional[Iterable[Union[float, callable]]]) – Iterable of kernel mappings used internally. This parameter is directly passed to pairwise_kernels. If an element of kernel is a string, it must be one of the metrics in pairwise.PAIRWISE_KERNEL_FUNCTIONS. Alternatively, if an element of kernel is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two rows from X as input and return the corresponding kernel value as a single number. This means that callables from sklearn.metrics.pairwise are not allowed, as they operate on matrices, not single samples. Use the string identifying the kernel instead.
- gamma (Optional[Iterable[float]]) – Iterable of gamma parameters for the RBF, laplacian, polynomial, exponential chi2 and sigmoid kernels. Interpretation of the default value is left to the kernel; see the documentation for sklearn.metrics.pairwise. Ignored by other kernels.
- degree (Optional[Iterable[float]]) – Iterable of degree parameters of the polynomial kernel. Ignored by other kernels.
- coef0 (Optional[Iterable[float]]) – Iterable of zero coefficients for polynomial and sigmoid kernels. Ignored by other kernels.
- kernel_params (Optional[Iterable[dict]]) – Iterable of additional parameters (keyword arguments) for kernel function passed as callable object.
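The kernel parameters above are given as one entry per view. The sketch below illustrates that per-view layout with a hand-written RBF kernel in numpy. It is only an illustration: `rbf_kernel_matrix` is a hypothetical helper, not the function the library calls, and treating `None` as 1 / n_features is an assumption modelled on the default used by scikit-learn's RBF kernel.

```python
import numpy as np


def rbf_kernel_matrix(X, gamma=None):
    """Kernel matrix K[a, b] = exp(-gamma * ||x_a - x_b||^2) for the rows of X."""
    if gamma is None:
        gamma = 1.0 / X.shape[1]  # assumed default: 1 / n_features
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip rounding noise


rng = np.random.RandomState(0)
views = [rng.random((20, p)) for p in (5, 4, 3)]
gammas = [0.5, 1.0, None]  # one gamma per view, mirroring the Iterable parameters

kernels = [rbf_kernel_matrix(X, gamma=g) for X, g in zip(views, gammas)]
print([K.shape for K in kernels])  # [(20, 20), (20, 20), (20, 20)]
```

Each kernel matrix is n_samples by n_samples regardless of the number of features in its view, which is why kernel methods operate on coefficients \(\alpha_i\) over samples rather than weights over features.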
transform(views, y=None, **kwargs)
Transforms data given a fit KGCCA model
Parameters:
- views (ndarray) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
correlations(views, y=None, **kwargs)
Predicts the correlation for the given data using the fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
Returns: all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
Return type: np.ndarray
fit(views, y=None, **kwargs)
Fits a regularised CCA (canonical ridge) model
Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
fit_transform(views, y=None, **kwargs)
Fits and then transforms the training data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
get_loadings(views, y=None, **kwargs)
Returns the model loadings for each view for the given data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
score(views, y=None, **kwargs)
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
Parameters:
- X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns: score – \(R^2\) of self.predict(X) wrt. y.
Return type: float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
MCCA and KCCA
Multiset (SUMCOR) Canonical Correlation Analysis
class cca_zoo.models.mcca.MCCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, c=None, eps=0.001)
Bases: cca_zoo.models.rcca.rCCA
A class used to fit an MCCA model. For more than 2 views, MCCA optimizes the sum of pairwise correlations.
Maths:
\[
\begin{aligned}
w_{opt} &= \underset{w}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} w_i^TX_i^TX_jw_j \}\\
&\text{subject to:}\\
&(1-c_i)w_i^TX_i^TX_iw_i+c_iw_i^Tw_i=1
\end{aligned}
\]
Citation: Kettenring, Jon R. “Canonical analysis of several sets of variables.” Biometrika 58.3 (1971): 433-451.
Example:
>>> from cca_zoo.models import MCCA
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> X3 = rng.random((10, 5))
>>> model = MCCA()
>>> model.fit((X1, X2, X3)).score((X1, X2, X3))
array([0.97200847])
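A frequently used way to approach the SUMCOR objective is to stack the views and solve a generalized eigenvalue problem, with the cross-view products in one block matrix and the regularised within-view products in a block-diagonal matrix. The numpy sketch below does this under an explicit relaxation: the constraint is imposed on the sum over views rather than on each view separately, so it is an approximation of the problem stated above, not a description of the cca_zoo solver. The function name `mcca_first_weights` is hypothetical.

```python
import numpy as np


def mcca_first_weights(views, c=0.0):
    """Leading weights of the relaxed SUMCOR problem for centred views."""
    dims = [X.shape[1] for X in views]
    stacked = np.hstack(views)
    C = stacked.T @ stacked          # all pairwise cross-products
    D = np.zeros_like(C)
    start = 0
    for X, p in zip(views, dims):
        block = slice(start, start + p)
        D[block, block] = (1 - c) * X.T @ X + c * np.eye(p)
        C[block, block] = 0.0        # keep only the i != j terms
        start += p
    vals, vecs = np.linalg.eigh(D)
    D_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    evals, evecs = np.linalg.eigh(D_inv_sqrt @ C @ D_inv_sqrt)
    w = D_inv_sqrt @ evecs[:, -1]    # eigenvector of the largest eigenvalue
    return np.split(w, np.cumsum(dims)[:-1]), evals[-1]


rng = np.random.RandomState(0)
views = [rng.random((60, p)) for p in (4, 3, 5)]
views = [X - X.mean(axis=0) for X in views]

weights, top = mcca_first_weights(views, c=0.0)
objective = sum(
    wi @ Xi.T @ Xj @ wj
    for i, (wi, Xi) in enumerate(zip(weights, views))
    for j, (wj, Xj) in enumerate(zip(weights, views))
    if i != j
)
print(objective, top)  # equal up to rounding
```

The largest eigenvalue equals the attained value of the double sum in the objective, which gives a quick consistency check on the weights.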
Constructor for MCCA
Parameters:
- latent_dims (int) – number of latent dimensions to fit
- scale (bool) – normalize variance in each column before fitting
- centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
- c (Union[Iterable[float], float, None]) – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)
- eps – epsilon for stability
correlations(views, y=None, **kwargs)
Predicts the correlation for the given data using the fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
Returns: all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
Return type: np.ndarray
fit(views, y=None, **kwargs)
Fits a regularised CCA (canonical ridge) model
Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
fit_transform(views, y=None, **kwargs)
Fits and then transforms the training data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
get_loadings(views, y=None, **kwargs)
Returns the model loadings for each view for the given data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
score(views, y=None, **kwargs)
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
Parameters:
- X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns: score – \(R^2\) of self.predict(X) wrt. y.
Return type: float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
transform(views, y=None, **kwargs)
Transforms data given a fit model
Parameters:
- views (Iterable[ndarray]) – numpy arrays with the same number of rows (samples) separated by commas
- kwargs – any additional keyword arguments required by the given model
Kernel Multiset (SUMCOR) Canonical Correlation Analysis
class cca_zoo.models.mcca.KCCA(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, c=None, eps=0.001, kernel=None, gamma=None, degree=None, coef0=None, kernel_params=None)
Bases: cca_zoo.models.mcca.MCCA
A class used to fit a KCCA model.
Maths:
\[
\begin{aligned}
\alpha_{opt} &= \underset{\alpha}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \alpha_i^TK_i^TK_j\alpha_j \}\\
&\text{subject to:}\\
&c_i\alpha_i^TK_i\alpha_i + (1-c_i)\alpha_i^TK_i^TK_i\alpha_i=1
\end{aligned}
\]
Example:
>>> from cca_zoo.models import KCCA
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X1 = rng.random((10, 5))
>>> X2 = rng.random((10, 5))
>>> X3 = rng.random((10, 5))
>>> model = KCCA()
>>> model.fit((X1, X2, X3)).score((X1, X2, X3))
array([0.96893666])
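For two views, the kernel problem above has the same structure as ridge CCA, with kernel matrices in place of the data matrices and dual coefficients \(\alpha_i\) in place of the weights. The numpy sketch below checks the constraint on a two-view example. It rests on several assumptions that are not taken from the library: only two views are used, the kernel is a hand-written RBF with a fixed gamma, the kernel matrices are not centred, and the names `rbf` and `kcca_first_pair` are hypothetical.

```python
import numpy as np


def rbf(X, gamma):
    """RBF kernel matrix for the rows of X."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kcca_first_pair(K1, K2, c=(0.5, 0.5), eps=1e-12):
    """First dual coefficient pair under c a^T K a + (1 - c) a^T K^T K a = 1."""
    S = []
    for K, ci in zip((K1, K2), c):
        R = ci * K + (1 - ci) * K.T @ K
        vals, vecs = np.linalg.eigh(R)
        # eps floors tiny eigenvalues so the inverse square root stays finite.
        S.append(vecs @ np.diag(np.maximum(vals, eps) ** -0.5) @ vecs.T)
    U, s, Vt = np.linalg.svd(S[0] @ K1.T @ K2 @ S[1])
    return S[0] @ U[:, 0], S[1] @ Vt[0], s[0]


rng = np.random.RandomState(0)
K1 = rbf(rng.random((30, 4)), gamma=10.0)
K2 = rbf(rng.random((30, 4)), gamma=10.0)

a1, a2, value = kcca_first_pair(K1, K2)
lhs1 = 0.5 * a1 @ K1 @ a1 + 0.5 * a1 @ K1.T @ K1 @ a1
lhs2 = 0.5 * a2 @ K2 @ a2 + 0.5 * a2 @ K2.T @ K2 @ a2
print(lhs1, lhs2)                   # both approximately 1
print(value, a1 @ K1.T @ K2 @ a2)   # equal up to rounding
```

The floor on the eigenvalues plays a role similar to the eps parameter described for the class, although the exact way the library applies eps is not shown here.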
Parameters:
- latent_dims (int) – number of latent dimensions to fit
- scale (bool) – normalize variance in each column before fitting
- centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
- c (Union[Iterable[float], float, None]) – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)
- eps – epsilon value to ensure stability of smallest eigenvalues
- kernel (Optional[Iterable[Union[float, callable]]]) – Iterable of kernel mappings used internally. This parameter is directly passed to pairwise_kernels. If an element of kernel is a string, it must be one of the metrics in pairwise.PAIRWISE_KERNEL_FUNCTIONS. Alternatively, if an element of kernel is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two rows from X as input and return the corresponding kernel value as a single number. This means that callables from sklearn.metrics.pairwise are not allowed, as they operate on matrices, not single samples. Use the string identifying the kernel instead.
- gamma (Optional[Iterable[float]]) – Iterable of gamma parameters for the RBF, laplacian, polynomial, exponential chi2 and sigmoid kernels. Interpretation of the default value is left to the kernel; see the documentation for sklearn.metrics.pairwise. Ignored by other kernels.
- degree (Optional[Iterable[float]]) – Iterable of degree parameters of the polynomial kernel. Ignored by other kernels.
- coef0 (Optional[Iterable[float]]) – Iterable of zero coefficients for polynomial and sigmoid kernels. Ignored by other kernels.
- kernel_params (Optional[Iterable[dict]]) – Iterable of additional parameters (keyword arguments) for kernel function passed as callable object.
-
transform
(views, y=None, **kwargs)[source]¶ Transforms data given a fit KCCA model
Parameters: - views (
ndarray
) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
- views (
-
correlations
(views, y=None, **kwargs)¶ Predicts the correlation for the given data using the fit model
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
Returns: all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
Return type: np.ndarray
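The (k, k, latent_dims) layout of all_corrs can be sketched as follows. This assumes each view has already been transformed into an array of shape (n_samples, latent_dims); `pairwise_correlations` is a hypothetical helper for illustration, not the library's implementation.

```python
import numpy as np


def pairwise_correlations(transformed_views):
    """Pearson correlation between matching latent dimensions of every pair of views."""
    k = len(transformed_views)
    latent_dims = transformed_views[0].shape[1]
    all_corrs = np.empty((k, k, latent_dims))
    for i, zi in enumerate(transformed_views):
        for j, zj in enumerate(transformed_views):
            for d in range(latent_dims):
                all_corrs[i, j, d] = np.corrcoef(zi[:, d], zj[:, d])[0, 1]
    return all_corrs


rng = np.random.RandomState(0)
Z = [rng.standard_normal((20, 2)) for _ in range(3)]  # three views, two latent dims
all_corrs = pairwise_correlations(Z)
```

Entry `all_corrs[i, j, d]` then holds the correlation between view i and view j in latent dimension d, so the diagonal over the first two axes is all ones.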
-
fit
(views, y=None, **kwargs)¶ Fits a regularised CCA (canonical ridge) model
Parameters: views ( Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
-
fit_transform
(views, y=None, **kwargs)¶ Fits and then transforms the training data
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
-
get_loadings
(views, y=None, **kwargs)¶ Returns the model loadings for each view for the given data
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
-
score
(views, y=None, **kwargs)¶ Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
Parameters: - X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator. - y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns: score – \(R^2\) of self.predict(X) wrt. y.
Return type: float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
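The \(R^2\) definition quoted above can be checked by hand with a few lines of NumPy. This is only a sketch of the formula; `r2_score_sketch` is a hypothetical name and not the function the library calls.

```python
import numpy as np


def r2_score_sketch(y_true, y_pred):
    """Coefficient of determination 1 - u / v as defined in the text."""
    u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1.0 - u / v


y_true = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r2_score_sketch(y_true, y_true)                                # 1.0
constant = r2_score_sketch(y_true, np.full_like(y_true, y_true.mean()))  # 0.0
worse = r2_score_sketch(y_true, y_true[::-1])                            # -3.0
```

The three cases match the statements in the text: a perfect prediction scores 1.0, the constant mean predictor scores 0.0, and a sufficiently bad prediction goes negative.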
Tensor Canonical Correlation Analysis¶
Tensor Canonical Correlation Analysis¶
-
class
cca_zoo.models.tcca.
TCCA
(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, c=None)[source]¶ Bases:
cca_zoo.models.cca_base._CCA_Base
Fits a Tensor CCA model. Tensor CCA maximises higher order correlations
Maths: \[ \begin{align}\begin{aligned}\begin{split}\alpha_{opt}=\underset{\alpha}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \alpha_i^TK_i^TK_j\alpha_j \}\\\end{split}\\\text{subject to:}\\\alpha_i^TK_i^TK_i\alpha_i=1\end{aligned}\end{align} \] Citation: Kim, Tae-Kyun, Shu-Fai Wong, and Roberto Cipolla. “Tensor canonical correlation analysis for action classification.” 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2007. https://github.com/rciszek/mdr_tcca
Example: >>> from cca_zoo.models import TCCA >>> import numpy as np >>> rng=np.random.RandomState(0) >>> X1 = rng.random((10,5)) >>> X2 = rng.random((10,5)) >>> X3 = rng.random((10,5)) >>> model = TCCA() >>> model.fit((X1,X2,X3)).score((X1,X2,X3)) array([1.14595755])
Constructor for TCCA
Parameters: - latent_dims (
int
) – number of latent dimensions to fit - scale – normalize variance in each column before fitting
- centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
- c (
Union
[Iterable
[float
],float
,None
]) – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS)
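One common way to read a regularisation parameter that runs from 0 (CCA) to 1 (PLS) is as a blend between each view's covariance and the identity. The sketch below shows that blend. The exact form and scaling used inside TCCA are assumptions here, and `regularised_covariance` is a hypothetical helper.

```python
import numpy as np


def regularised_covariance(X, c):
    """Blend the sample covariance with the identity: (1 - c) * cov(X) + c * I."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # centre each column
    cov = Xc.T @ Xc / n              # biased sample covariance (assumed scaling)
    return (1.0 - c) * cov + c * np.eye(p)


rng = np.random.RandomState(0)
X = rng.random_sample((10, 3))
cca_end = regularised_covariance(X, 0.0)  # plain covariance: CCA-style constraint
pls_end = regularised_covariance(X, 1.0)  # identity: PLS-style constraint
```

Values of c strictly between 0 and 1 give a ridge-like compromise between the two constraints.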
-
fit
(views, y=None, **kwargs)[source]¶ Parameters: views ( Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
-
correlations
(views, y=None, **kwargs)[source]¶ Predicts the correlation for the given data using the fit model
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
-
score
(views, y=None, **kwargs)[source]¶ Returns the higher order correlations in each dimension
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
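A higher order correlation can be pictured as the sample mean of the product of standardised scores across all views. The sketch below is an illustration under that assumption; `higher_order_correlation` is a hypothetical helper and may not match what TCCA.score computes internally.

```python
import numpy as np


def higher_order_correlation(scores):
    """Mean over samples of the product of standardised score vectors (one per view)."""
    z = [(s - s.mean()) / s.std() for s in scores]
    return float(np.mean(np.prod(z, axis=0)))


rng = np.random.RandomState(0)
a = rng.standard_normal(50)
b = a + 0.5 * rng.standard_normal(50)
c = rng.standard_normal(50)

two_view = higher_order_correlation([a, b])       # reduces to the Pearson correlation
three_view = higher_order_correlation([a, b, c])  # genuinely third-order quantity
```

With two views this quantity equals the ordinary Pearson correlation. With three or more views it is no longer confined to the interval [-1, 1].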
-
fit_transform
(views, y=None, **kwargs)¶ Fits and then transforms the training data
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
-
get_loadings
(views, y=None, **kwargs)¶ Returns the model loadings for each view for the given data
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
-
transform
(views, y=None, **kwargs)¶ Transforms data given a fit model
Parameters: - views (
Iterable
[ndarray
]) – numpy arrays with the same number of rows (samples) separated by commas - kwargs – any additional keyword arguments required by the given model
Kernel Tensor Canonical Correlation Analysis¶
-
class
cca_zoo.models.tcca.
KTCCA
(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, eps=0.001, c=None, kernel=None, gamma=None, degree=None, coef0=None, kernel_params=None)[source]¶ Bases:
cca_zoo.models.tcca.TCCA
Fits a Kernel Tensor CCA model. Tensor CCA maximises higher order correlations
Maths: \[ \begin{align}\begin{aligned}\begin{split}\alpha_{opt}=\underset{\alpha}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \alpha_i^TK_i^TK_j\alpha_j \}\\\end{split}\\\text{subject to:}\\\alpha_i^TK_i^TK_i\alpha_i=1\end{aligned}\end{align} \] Citation: Kim, Tae-Kyun, Shu-Fai Wong, and Roberto Cipolla. “Tensor canonical correlation analysis for action classification.” 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2007
Example: >>> from cca_zoo.models import KTCCA >>> import numpy as np >>> rng=np.random.RandomState(0) >>> X1 = rng.random((10,5)) >>> X2 = rng.random((10,5)) >>> X3 = rng.random((10,5)) >>> model = KTCCA() >>> model.fit((X1,X2,X3)).score((X1,X2,X3)) array([1.69896269])
Constructor for KTCCA
Parameters: - latent_dims (
int
) – number of latent dimensions to fit - scale (
bool
) – normalize variance in each column before fitting - centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
- c (
Union
[Iterable
[float
],float
,None
]) – Iterable of regularisation parameters for each view (between 0:CCA and 1:PLS) - kernel (
Optional
[Iterable
[Union
[float
, <built-in function callable>]]]) – Iterable of kernel mappings used internally. This parameter is directly passed to pairwise_kernel
. If an element of kernel is a string, it must be one of the metrics in pairwise.PAIRWISE_KERNEL_FUNCTIONS. Alternatively, if an element of kernel is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two rows from X as input and return the corresponding kernel value as a single number. This means that callables from sklearn.metrics.pairwise
are not allowed, as they operate on matrices, not single samples. Use the string identifying the kernel instead. - gamma (
Optional
[Iterable
[float
]]) – Iterable of gamma parameters for the RBF, laplacian, polynomial, exponential chi2 and sigmoid kernels. Interpretation of the default value is left to the kernel; see the documentation for sklearn.metrics.pairwise. Ignored by other kernels. - degree (
Optional
[Iterable
[float
]]) – Iterable of degree parameters of the polynomial kernel. Ignored by other kernels. - coef0 (
Optional
[Iterable
[float
]]) – Iterable of zero coefficients for polynomial and sigmoid kernels. Ignored by other kernels. - kernel_params (
Optional
[Iterable
[dict
]]) – Iterable of additional parameters (keyword arguments) for kernel function passed as callable object. - eps – epsilon value to ensure stability
-
transform
(views, y=None, **kwargs)[source]¶ Transforms data given a fit KTCCA model
Parameters: - views (
ndarray
) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
-
correlations
(views, y=None, **kwargs)¶ Predicts the correlation for the given data using the fit model
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
-
fit
(views, y=None, **kwargs)¶ Parameters: views ( Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
-
fit_transform
(views, y=None, **kwargs)¶ Fits and then transforms the training data
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
-
get_loadings
(views, y=None, **kwargs)¶ Returns the model loadings for each view for the given data
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
-
score
(views, y=None, **kwargs)¶ Returns the higher order correlations in each dimension
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
More Complex Regularisation using Iterative Models¶
Normal CCA and PLS by alternating least squares¶
Quicker and more memory efficient for very large data
CCA by Alternating Least Squares¶
-
class
cca_zoo.models.
CCA_ALS
(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, deflation='cca', max_iter=100, initialization='random', tol=1e-09, stochastic=True, positive=None)[source]¶ Bases:
cca_zoo.models.iterative.ElasticCCA
Fits a CCA model with CCA deflation by NIPALS algorithm. Implemented by ElasticCCA with 0 regularisation
\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmin}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 \}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=1\end{aligned}\end{align} \] Citation: Golub, Gene H., and Hongyuan Zha. “The canonical correlations of matrix pairs and their numerical computation.” Linear algebra for signal processing. Springer, New York, NY, 1995. 27-49.
Example: >>> from cca_zoo.models import CCA_ALS >>> import numpy as np >>> rng=np.random.RandomState(0) >>> X1 = rng.random((10,3)) >>> X2 = rng.random((10,3)) >>> model = CCA_ALS(random_state=0) >>> model.fit((X1,X2)).score((X1,X2)) array([0.858906])
Constructor for CCA_ALS
Parameters: - latent_dims (
int
) – number of latent dimensions to fit - scale (
bool
) – normalize variance in each column before fitting - centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
- max_iter (
int
) – the maximum number of iterations to perform in the inner optimization loop - initialization (
str
) – initialization for optimisation. ‘unregularized’ uses CCA or PLS solution, ‘random’ uses random initialization, ‘uniform’ uses uniform initialization of weights and scores - tol (
float
) – tolerance value used for early stopping - stochastic – use stochastic regression optimisers for subproblems
- positive (
Union
[Iterable
[bool
],bool
,None
]) – constrain model weights to be positive
-
correlations
(views, y=None, **kwargs)¶ Predicts the correlation for the given data using the fit model
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
Returns: all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
Return type: np.ndarray
-
fit
(views, y=None, **kwargs)¶ Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation
Parameters: views ( Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
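The inner loop of an alternating least squares scheme for two-view CCA can be sketched in NumPy as below. This is a minimal illustration for one latent dimension, assuming plain least squares updates and unit-norm scores; `cca_als` is a hypothetical function and does not reproduce the CCA_ALS class (no deflation, no stochastic solvers, no positivity constraint).

```python
import numpy as np


def cca_als(X1, X2, max_iter=500, tol=1e-12, seed=0):
    """Alternate least squares regressions between two centred views."""
    rng = np.random.RandomState(seed)
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    w2 = rng.standard_normal(X2.shape[1])
    w2 /= np.linalg.norm(X2 @ w2)
    previous = np.inf
    for _ in range(max_iter):
        # Regress the other view's score onto this view, then rescale
        # so that the score X_i w_i has unit norm.
        w1 = np.linalg.lstsq(X1, X2 @ w2, rcond=None)[0]
        w1 /= np.linalg.norm(X1 @ w1)
        w2 = np.linalg.lstsq(X2, X1 @ w1, rcond=None)[0]
        w2 /= np.linalg.norm(X2 @ w2)
        objective = np.sum((X1 @ w1 - X2 @ w2) ** 2)
        if abs(previous - objective) < tol:  # early stopping on the objective
            break
        previous = objective
    return w1, w2


# Two views sharing one strong latent signal plus small noise.
rng = np.random.RandomState(0)
shared = rng.standard_normal((200, 1))
X1 = np.hstack([shared, np.zeros((200, 2))]) + 0.1 * rng.standard_normal((200, 3))
X2 = np.hstack([np.zeros((200, 2)), shared]) + 0.1 * rng.standard_normal((200, 3))
w1, w2 = cca_als(X1, X2)
als_corr = np.corrcoef(X1 @ w1, X2 @ w2)[0, 1]

# Reference value: top canonical correlation from the singular values of Q1^T Q2.
q1, _ = np.linalg.qr(X1 - X1.mean(axis=0))
q2, _ = np.linalg.qr(X2 - X2.mean(axis=0))
exact_corr = np.linalg.svd(q1.T @ q2, compute_uv=False)[0]
```

On this toy data the iterative estimate agrees with the closed-form canonical correlation. Because the loop only needs regressions against one view at a time, it avoids forming and decomposing the full covariance matrices.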
-
fit_transform
(views, y=None, **kwargs)¶ Fits and then transforms the training data
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
-
get_loadings
(views, y=None, **kwargs)¶ Returns the model loadings for each view for the given data
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
-
score
(views, y=None, **kwargs)¶ Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
Parameters: - X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator. - y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns: score – \(R^2\) of self.predict(X) wrt. y.
Return type: float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
-
transform
(views, y=None, **kwargs)¶ Transforms data given a fit model
Parameters: - views (
Iterable
[ndarray
]) – numpy arrays with the same number of rows (samples) separated by commas - kwargs – any additional keyword arguments required by the given model
PLS by Alternating Least Squares¶
-
class
cca_zoo.models.
PLS_ALS
(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, deflation='cca', max_iter=100, initialization='unregularized', tol=1e-09)[source]¶ Bases:
cca_zoo.models.iterative._Iterative
A class used to fit a PLS model
Fits a partial least squares model with CCA deflation by NIPALS algorithm
\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2\}\\\end{split}\\\text{subject to:}\\w_i^Tw_i=1\end{aligned}\end{align} \] Example: >>> from cca_zoo.models import PLS_ALS >>> import numpy as np >>> rng=np.random.RandomState(0) >>> X1 = rng.random((10,5)) >>> X2 = rng.random((10,5)) >>> model = PLS_ALS(random_state=0) >>> model.fit((X1,X2)).score((X1,X2)) array([0.81796873])
Constructor for PLS_ALS
Parameters: - latent_dims (
int
) – number of latent dimensions to fit - scale (
bool
) – normalize variance in each column before fitting - centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
- max_iter (
int
) – the maximum number of iterations to perform in the inner optimization loop - initialization (
str
) – initialization for optimisation. ‘unregularized’ uses CCA or PLS solution, ‘random’ uses random initialization, ‘uniform’ uses uniform initialization of weights and scores - tol (
float
) – tolerance value used for early stopping
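For two views, a NIPALS-style PLS iteration with unit-norm weights amounts to a power iteration on the cross-covariance matrix. The sketch below illustrates that idea under this assumption; `pls_power_iteration` is a hypothetical function, not the PLS_ALS implementation, and it fits a single latent dimension without deflation.

```python
import numpy as np


def pls_power_iteration(X1, X2, max_iter=500, tol=1e-12, seed=0):
    """Alternate w1 <- X1^T X2 w2 and w2 <- X2^T X1 w1, keeping each at unit l2 norm."""
    rng = np.random.RandomState(seed)
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    C = X1.T @ X2  # unnormalised cross-covariance
    w2 = rng.standard_normal(X2.shape[1])
    w2 /= np.linalg.norm(w2)
    previous = -np.inf
    for _ in range(max_iter):
        w1 = C @ w2
        w1 /= np.linalg.norm(w1)
        w2 = C.T @ w1
        w2 /= np.linalg.norm(w2)
        covariance = w1 @ C @ w2
        if abs(covariance - previous) < tol:  # early stopping
            break
        previous = covariance
    return w1, w2, covariance


rng = np.random.RandomState(0)
shared = rng.standard_normal((200, 1))
X1 = np.hstack([shared, np.zeros((200, 2))]) + 0.1 * rng.standard_normal((200, 3))
X2 = np.hstack([np.zeros((200, 2)), shared]) + 0.1 * rng.standard_normal((200, 3))
w1, w2, covariance = pls_power_iteration(X1, X2)

# Reference value: largest singular value of the centred cross-covariance.
X1c, X2c = X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)
top_sv = np.linalg.svd(X1c.T @ X2c, compute_uv=False)[0]
```

The difference from the CCA case is the constraint: the weights themselves have unit norm, rather than the projected scores.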
-
correlations
(views, y=None, **kwargs)¶ Predicts the correlation for the given data using the fit model
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
Returns: all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
Return type: np.ndarray
-
fit
(views, y=None, **kwargs)¶ Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation
Parameters: views ( Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
-
fit_transform
(views, y=None, **kwargs)¶ Fits and then transforms the training data
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
-
get_loadings
(views, y=None, **kwargs)¶ Returns the model loadings for each view for the given data
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
-
score
(views, y=None, **kwargs)¶ Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
Parameters: - X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator. - y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns: score – \(R^2\) of self.predict(X) wrt. y.
Return type: float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
-
transform
(views, y=None, **kwargs)¶ Transforms data given a fit model
Parameters: - views (
Iterable
[ndarray
]) – numpy arrays with the same number of rows (samples) separated by commas - kwargs – any additional keyword arguments required by the given model
Sparsity Inducing Models¶
Penalized Matrix Decomposition (Sparse PLS)¶
-
class
cca_zoo.models.
PMD
(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, deflation='cca', c=None, max_iter=100, initialization='unregularized', tol=1e-09, positive=None)[source]¶ Bases:
cca_zoo.models.iterative._Iterative
Fits a Sparse CCA (Penalized Matrix Decomposition) model.
\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \}\\\end{split}\\\text{subject to:}\\w_i^Tw_i=1\\\|w_i\|_1\leq c_i\end{aligned}\end{align} \] Citation: Witten, Daniela M., Robert Tibshirani, and Trevor Hastie. “A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.” Biostatistics 10.3 (2009): 515-534.
Example: >>> from cca_zoo.models import PMD >>> import numpy as np >>> rng=np.random.RandomState(0) >>> X1 = rng.random((10,5)) >>> X2 = rng.random((10,5)) >>> model = PMD(c=[1,1],random_state=0) >>> model.fit((X1,X2)).score((X1,X2)) array([0.69792082])
Constructor for PMD
Parameters: - latent_dims (
int
) – number of latent dimensions to fit - scale (
bool
) – normalize variance in each column before fitting - centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
- c (
Union
[Iterable
[float
],float
,None
]) – l1 regularisation parameter between 1 and sqrt(number of features) for each view - max_iter (
int
) – the maximum number of iterations to perform in the inner optimization loop - initialization (
str
) – initialization for optimisation. ‘unregularized’ uses CCA or PLS solution, ‘random’ uses random initialization, ‘uniform’ uses uniform initialization of weights and scores - tol (
float
) – tolerance value used for early stopping - positive (
Union
[Iterable
[bool
],bool
,None
]) – constrain model weights to be positive
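The role of c as an l1 bound on a unit-length weight vector can be illustrated with soft thresholding plus a bisection search for the threshold, in the spirit of the cited Witten et al. paper. This is a sketch under those assumptions; `soft_threshold` and `l1_bounded_unit_vector` are hypothetical helpers and not the PMD class's internal code.

```python
import numpy as np


def soft_threshold(a, delta):
    """Shrink every entry of a towards zero by delta, clipping at zero."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def l1_bounded_unit_vector(a, c, n_steps=60):
    """Soft-threshold a and rescale to unit l2 norm so that the l1 norm is <= c."""
    if c < 1.0:
        raise ValueError("a unit l2-norm vector always has l1 norm >= 1")
    w = a / np.linalg.norm(a)
    if np.abs(w).sum() <= c:
        return w  # constraint already satisfied, no thresholding needed
    low, high = 0.0, np.abs(a).max()
    best = None
    for _ in range(n_steps):
        delta = 0.5 * (low + high)
        candidate = soft_threshold(a, delta)
        candidate = candidate / np.linalg.norm(candidate)
        if np.abs(candidate).sum() <= c:
            best, high = candidate, delta  # feasible: try a smaller threshold
        else:
            low = delta                    # still too dense: threshold harder
    if best is None:
        raise ValueError("no feasible threshold found (tied largest entries?)")
    return best


a = np.array([3.0, -2.0, 1.0, 0.5, -0.25])
w = l1_bounded_unit_vector(a, c=1.2)
```

Smaller values of c force more entries to exactly zero, while c equal to the square root of the number of features leaves the normalised vector untouched, which matches the stated range for the parameter.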
-
correlations
(views, y=None, **kwargs)¶ Predicts the correlation for the given data using the fit model
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
Returns: all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
Return type: np.ndarray
-
fit
(views, y=None, **kwargs)¶ Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation
Parameters: views ( Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
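After each latent dimension converges, deflation removes what has already been explained so that the next dimension finds something new. The sketch below shows one common form, projection deflation against the score vector. Whether this matches the library's 'cca' or 'pls' deflation option exactly is an assumption, and `projection_deflation` is a hypothetical helper.

```python
import numpy as np


def projection_deflation(X, w):
    """Remove from every column of X its component along the score z = X w."""
    z = X @ w
    return X - np.outer(z, z @ X) / (z @ z)


rng = np.random.RandomState(0)
X = rng.standard_normal((30, 4))
w = rng.standard_normal(4)
X_deflated = projection_deflation(X, w)
```

After this step the deflated data are orthogonal to the extracted score, and the weight vector w lies in the null space of the deflated matrix.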
-
fit_transform
(views, y=None, **kwargs)¶ Fits and then transforms the training data
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
-
get_loadings
(views, y=None, **kwargs)¶ Returns the model loadings for each view for the given data
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
-
score
(views, y=None, **kwargs)¶ Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
Parameters: - X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator. - y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns: score – \(R^2\) of self.predict(X) wrt. y.
Return type: float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
-
transform
(views, y=None, **kwargs)¶ Transforms data given a fit model
Parameters: - views (
Iterable
[ndarray
]) – numpy arrays with the same number of rows (samples) separated by commas - kwargs – any additional keyword arguments required by the given model
Sparse CCA by iterative lasso regression¶
-
class
cca_zoo.models.
SCCA
(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, deflation='cca', c=None, max_iter=100, maxvar=False, initialization='unregularized', tol=1e-09, stochastic=False, positive=None)[source]¶ Bases:
cca_zoo.models.iterative.ElasticCCA
Fits a sparse CCA model by iterative rescaled lasso regression. Implemented by ElasticCCA with l1 ratio=1
\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmin}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=1\end{aligned}\end{align} \] Citation: Mai, Qing, and Xin Zhang. “An iterative penalized least squares approach to sparse canonical correlation analysis.” Biometrics 75.3 (2019): 734-744.
Example: >>> from cca_zoo.models import SCCA >>> import numpy as np >>> rng=np.random.RandomState(0) >>> X1 = rng.random((10,5)) >>> X2 = rng.random((10,5)) >>> model = SCCA(c=[0.001,0.001], random_state=0) >>> model.fit((X1,X2)).score((X1,X2)) array([0.99998919])
Constructor for SCCA
Parameters: - latent_dims (
int
) – number of latent dimensions to fit - scale (
bool
) – normalize variance in each column before fitting - centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
- max_iter (
int
) – the maximum number of iterations to perform in the inner optimization loop - maxvar (
bool
) – use auxiliary variable “maxvar” form - initialization (
str
) – initialization for optimisation. ‘unregularized’ uses CCA or PLS solution, ‘random’ uses random initialization, ‘uniform’ uses uniform initialization of weights and scores - tol (
float
) – tolerance value used for early stopping - c (
Union
[Iterable
[float
],float
,None
]) – lasso alpha - stochastic – use stochastic regression optimisers for subproblems
- positive (
Union
[Iterable
[bool
],bool
,None
]) – constrain model weights to be positive
-
correlations
(views, y=None, **kwargs)¶ Predicts the correlation for the given data using the fit model
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
Returns: all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
Return type: np.ndarray
-
fit
(views, y=None, **kwargs)¶ Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation
Parameters: views ( Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
-
fit_transform
(views, y=None, **kwargs)¶ Fits and then transforms the training data
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
-
get_loadings
(views, y=None, **kwargs)¶ Returns the model loadings for each view for the given data
Parameters: - views (
Iterable
[ndarray
]) – list/tuple of numpy arrays or array likes with the same number of rows (samples) - kwargs – any additional keyword arguments required by the given model
-
score
(views, y=None, **kwargs)¶ Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
Parameters: - X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator. - y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns: score – \(R^2\) of self.predict(X) wrt. y.
Return type: float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
-
transform
(views, y=None, **kwargs)¶ Transforms data given a fit model
Parameters: - views (
Iterable
[ndarray
]) – numpy arrays with the same number of rows (samples) separated by commas - kwargs – any additional keyword arguments required by the given model
Elastic CCA by MAXVAR¶
-
class
cca_zoo.models.
ElasticCCA
(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, deflation='cca', max_iter=100, initialization='unregularized', tol=1e-09, c=None, l1_ratio=None, maxvar=True, stochastic=False, positive=None)[source]¶ Bases:
cca_zoo.models.iterative._Iterative
Fits an elastic CCA by iterating elastic net regression
\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmin}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + c\|w_i\|^2_2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=1\end{aligned}\end{align} \] Citation: Fu, Xiao, et al. “Scalable and flexible multiview MAX-VAR canonical correlation analysis.” IEEE Transactions on Signal Processing 65.16 (2017): 4150-4165.
Example: >>> from cca_zoo.models import ElasticCCA >>> import numpy as np >>> rng=np.random.RandomState(0) >>> X1 = rng.random((10,5)) >>> X2 = rng.random((10,5)) >>> model = ElasticCCA(c=[1e-1,1e-1],l1_ratio=[0.5,0.5], random_state=0) >>> model.fit((X1,X2)).score((X1,X2)) array([0.9316638])
Constructor for ElasticCCA
Parameters: - latent_dims (
int
) – number of latent dimensions to fit - scale (
bool
) – normalize variance in each column before fitting - centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
- deflation – the type of deflation.
- max_iter (
int
) – the maximum number of iterations to perform in the inner optimization loop - initialization (
str
) – intialization for optimisation. ‘unregularized’ uses CCA or PLS solution,’random’ uses random initialization,’uniform’ uses uniform initialization of weights and scores - tol (
float
) – tolerance value used for early stopping - c (
Union
[Iterable
[float
],float
,None
]) – lasso alpha - l1_ratio (
Union
[Iterable
[float
],float
,None
]) – l1 ratio in lasso subproblems - maxvar (
bool
) – use auxiliary variable “maxvar” formulation - stochastic – use stochastic regression optimisers for subproblems
- positive (
Union
[Iterable
[bool
],bool
,None
]) – constrain model weights to be positive
-
correlations
(views, y=None, **kwargs)¶ Predicts the correlation for the given data using the fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
Returns: all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
Return type: np.ndarray
-
fit
(views, y=None, **kwargs)¶ Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation
Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
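The fit description mentions a deflation step between successive latent dimensions. As a rough illustration only, here is one way a projection-style ("cca") deflation could look: each column of the residual data has its component along the current score vector removed. The helper name `deflate_cca` and the exact update are assumptions made for this sketch, not taken from the library source.

```python
def deflate_cca(X, score):
    """Remove from every column of X its projection onto `score`.

    X is a list of rows (n_samples x n_features); `score` has n_samples entries.
    Hypothetical helper, for illustration only.
    """
    n, p = len(X), len(X[0])
    norm_sq = sum(s * s for s in score)
    # coef[j] = <score, X[:, j]> / <score, score>
    coef = [sum(score[i] * X[i][j] for i in range(n)) / norm_sq for j in range(p)]
    return [[X[i][j] - score[i] * coef[j] for j in range(p)] for i in range(n)]


X = [[1.0, 2.0], [0.0, 1.0], [-1.0, 0.0]]
w = [1.0, 0.0]  # weight vector assumed to come from the inner loop
score = [sum(x * wi for x, wi in zip(row, w)) for row in X]  # X @ w
X_res = deflate_cca(X, score)

# Every column of the residual is now orthogonal to the score vector.
overlap = [sum(score[i] * X_res[i][j] for i in range(3)) for j in range(2)]
```

The next latent dimension would then be fitted on `X_res`, so it cannot reuse the direction that has already been extracted.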
-
fit_transform
(views, y=None, **kwargs)¶ Fits and then transforms the training data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
-
get_loadings
(views, y=None, **kwargs)¶ Returns the model loadings for each view for the given data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
-
score
(views, y=None, **kwargs)¶ Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
Parameters:
- X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns: score – \(R^2\) of self.predict(X) wrt. y.
Return type: float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
-
transform
(views, y=None, **kwargs)¶ Transforms data given a fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
Span CCA¶
-
class
cca_zoo.models.
SpanCCA
(latent_dims=1, scale=True, centre=True, copy_data=True, max_iter=100, initialization='uniform', tol=1e-09, regularisation='l0', c=None, rank=1, positive=None, random_state=None, deflation='cca')[source]¶ Bases:
cca_zoo.models.iterative._Iterative
Fits a Sparse CCA model using SpanCCA.
\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmin}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=1\end{aligned}\end{align} \]
Citation: Asteris, Megasthenis, et al. “A simple and provable algorithm for sparse diagonal CCA.” International Conference on Machine Learning. PMLR, 2016.
Example:
>>> from cca_zoo.models import SpanCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SpanCCA(regularisation="l0", c=[2, 2])
>>> model.fit((X1,X2)).score((X1,X2))
array([0.84556666])
Parameters:
- latent_dims (int) – number of latent dimensions to fit
- scale (bool) – normalize variance in each column before fitting
- centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
- max_iter (int) – the maximum number of iterations to perform in the inner optimization loop
- initialization (str) – initialization for optimisation. ‘unregularized’ uses CCA or PLS solution, ‘random’ uses random initialization, ‘uniform’ uses uniform initialization of weights and scores
- tol (float) – tolerance value used for early stopping
- regularisation –
- c (Union[Iterable[Union[float, int]], float, int, None]) – regularisation parameter
- rank – rank of the approximation
- positive (Union[Iterable[bool], bool, None]) – constrain weights to be positive
-
correlations
(views, y=None, **kwargs)¶ Predicts the correlation for the given data using the fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
Returns: all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
Return type: np.ndarray
-
fit
(views, y=None, **kwargs)¶ Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation
Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
-
fit_transform
(views, y=None, **kwargs)¶ Fits and then transforms the training data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
-
get_loadings
(views, y=None, **kwargs)¶ Returns the model loadings for each view for the given data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
-
score
(views, y=None, **kwargs)¶ Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
Parameters:
- X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns: score – \(R^2\) of self.predict(X) wrt. y.
Return type: float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
-
transform
(views, y=None, **kwargs)¶ Transforms data given a fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
Parkhomenko (penalized) CCA¶
-
class
cca_zoo.models.
ParkhomenkoCCA
(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, deflation='cca', c=None, max_iter=100, initialization='unregularized', tol=1e-09)[source]¶ Bases:
cca_zoo.models.iterative._Iterative
Fits a sparse CCA (penalized CCA) model
\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmax}}\{ w_1^TX_1^TX_2w_2 \} + c_i\|w_i\|\\\end{split}\\\text{subject to:}\\w_i^Tw_i=1\end{aligned}\end{align} \]Citation: Parkhomenko, Elena, David Tritchler, and Joseph Beyene. “Sparse canonical correlation analysis with application to genomic data integration.” Statistical applications in genetics and molecular biology 8.1 (2009).
Example:
>>> from cca_zoo.models import ParkhomenkoCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = ParkhomenkoCCA(c=[0.001,0.001],random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.81803543])
Constructor for ParkhomenkoCCA
Parameters:
- latent_dims (int) – number of latent dimensions to fit
- scale (bool) – normalize variance in each column before fitting
- centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
- c (Union[Iterable[float], float, None]) – l1 regularisation parameter
- max_iter (int) – the maximum number of iterations to perform in the inner optimization loop
- initialization (str) – initialization for optimisation. ‘unregularized’ uses CCA or PLS solution, ‘random’ uses random initialization, ‘uniform’ uses uniform initialization of weights and scores
- tol (float) – tolerance value used for early stopping
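To make the role of the l1 parameter c more concrete, the following sketch alternates soft-thresholded updates of the two weight vectors, in the spirit of the penalized CCA paper cited above. It is an illustration under simplifying assumptions (a tiny hand-written cross-product matrix `K` standing in for X1^T X2, a fixed number of iterations), and the helper names are hypothetical; it is not the library's implementation.

```python
import math


def soft_threshold(v, c):
    """Shrink each entry towards zero by c; entries within [-c, c] become zero."""
    return [math.copysign(max(abs(x) - c, 0.0), x) for x in v]


def normalise(v):
    """Scale v to unit Euclidean length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def penalised_update(K, w_other, c):
    """Multiply by K, normalise, soft-threshold, then renormalise."""
    w = normalise([sum(k * o for k, o in zip(row, w_other)) for row in K])
    return normalise(soft_threshold(w, c))


# K stands in for X1^T X2: 3 features in view 1, 2 features in view 2.
K = [[2.0, 0.0], [0.1, 0.0], [0.0, 1.0]]
K_T = [list(col) for col in zip(*K)]

w2 = normalise([1.0, 1.0])
for _ in range(20):
    w1 = penalised_update(K, w2, c=0.1)
    w2 = penalised_update(K_T, w1, c=0.1)
```

With this toy `K`, the weak second feature of view 1 is thresholded to exactly zero, which is the sparsity effect a larger c is meant to encourage.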
-
correlations
(views, y=None, **kwargs)¶ Predicts the correlation for the given data using the fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
Returns: all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
Return type: np.ndarray
-
fit
(views, y=None, **kwargs)¶ Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation
Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
-
fit_transform
(views, y=None, **kwargs)¶ Fits and then transforms the training data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
-
get_loadings
(views, y=None, **kwargs)¶ Returns the model loadings for each view for the given data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
-
score
(views, y=None, **kwargs)¶ Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
Parameters:
- X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns: score – \(R^2\) of self.predict(X) wrt. y.
Return type: float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
-
transform
(views, y=None, **kwargs)¶ Transforms data given a fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
Sparse CCA by ADMM¶
-
class
cca_zoo.models.
SCCA_ADMM
(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, deflation='cca', c=None, mu=None, lam=None, eta=None, max_iter=100, initialization='unregularized', tol=1e-09)[source]¶ Bases:
cca_zoo.models.iterative._Iterative
Fits a sparse CCA model by alternating ADMM
\[ \begin{align}\begin{aligned}\begin{split}w_{opt}=\underset{w}{\mathrm{argmin}}\{\sum_i\sum_{j\neq i} \|X_iw_i-X_jw_j\|^2 + \text{l1_ratio}\|w_i\|_1\}\\\end{split}\\\text{subject to:}\\w_i^TX_i^TX_iw_i=1\end{aligned}\end{align} \]
Citation: Suo, Xiaotong, et al. “Sparse canonical correlation analysis.” arXiv preprint arXiv:1705.10865 (2017).
Example:
>>> from cca_zoo.models import SCCA_ADMM
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SCCA_ADMM(random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.99999997])
Constructor for SCCA_ADMM
Parameters:
- latent_dims (int) – number of latent dimensions to fit
- scale (bool) – normalize variance in each column before fitting
- centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
- c (Union[Iterable[float], float, None]) – l1 regularisation parameter
- max_iter (int) – the maximum number of iterations to perform in the inner optimization loop
- initialization (str) – initialization for optimisation. ‘unregularized’ uses CCA or PLS solution, ‘random’ uses random initialization, ‘uniform’ uses uniform initialization of weights and scores
- tol (float) – tolerance value used for early stopping
- mu (Union[Iterable[float], float, None]) –
- lam (Union[Iterable[float], float, None]) –
- eta –
-
correlations
(views, y=None, **kwargs)¶ Predicts the correlation for the given data using the fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
Returns: all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
Return type: np.ndarray
-
fit
(views, y=None, **kwargs)¶ Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation
Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
-
fit_transform
(views, y=None, **kwargs)¶ Fits and then transforms the training data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
-
get_loadings
(views, y=None, **kwargs)¶ Returns the model loadings for each view for the given data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
-
score
(views, y=None, **kwargs)¶ Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
Parameters:
- X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns: score – \(R^2\) of self.predict(X) wrt. y.
Return type: float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
-
transform
(views, y=None, **kwargs)¶ Transforms data given a fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
Miscellaneous¶
Sparse Weighted CCA¶
-
class
cca_zoo.models.
SWCCA
(latent_dims=1, scale=True, centre=True, copy_data=True, random_state=None, max_iter=500, initialization='uniform', tol=1e-09, regularisation='l0', c=None, sample_support=None, positive=False)[source]¶ Bases:
cca_zoo.models.iterative._Iterative
A class used to fit SWCCA model
Citation:
Example:
>>> from cca_zoo.models import SWCCA
>>> import numpy as np
>>> rng=np.random.RandomState(0)
>>> X1 = rng.random((10,5))
>>> X2 = rng.random((10,5))
>>> model = SWCCA(regularisation='l0',c=[2, 2], sample_support=5, random_state=0)
>>> model.fit((X1,X2)).score((X1,X2))
array([0.61620969])
Parameters:
- latent_dims (int) – number of latent dimensions to fit
- scale (bool) – normalize variance in each column before fitting
- centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- random_state – Pass for reproducible output across multiple function calls
- max_iter (int) – the maximum number of iterations to perform in the inner optimization loop
- initialization (str) – initialization for optimisation. ‘unregularized’ uses CCA or PLS solution, ‘random’ uses random initialization, ‘uniform’ uses uniform initialization of weights and scores
- tol (float) – tolerance value used for early stopping
- regularisation – the type of regularisation on the weights, either ‘l0’ or ‘l1’
- c (Union[Iterable[Union[float, int]], float, int, None]) – regularisation parameter
- sample_support – the l0 norm of the sample weights
- positive – constrain weights to be positive
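The 'l0' option and the sample_support parameter both refer to an l0 norm, that is, a count of non-zero entries. A minimal sketch of the corresponding hard-thresholding operation is shown below. It assumes that under 'l0' the value of c is read as the number of weights allowed to stay non-zero (as the integer values in the example suggest) and that sample_support plays the same role for the sample weights; the helper `hard_threshold` is hypothetical and not part of the library.

```python
def hard_threshold(values, k):
    """Keep the k largest-magnitude entries and set all others to zero."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


# Feature weights for one view, restricted to 2 non-zero entries (c=2).
weights = [0.5, -1.2, 0.1, 0.9, -0.3]
sparse_weights = hard_threshold(weights, k=2)

# Sample weights restricted to 3 contributing samples (sample_support=3).
sample_weights = [0.2, 0.9, 0.4, 0.7, 0.1, 0.8]
sparse_samples = hard_threshold(sample_weights, k=3)
```

Unlike soft-thresholding under an l1 penalty, the surviving entries keep their original magnitude here; only the count of non-zeros is controlled.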
-
correlations
(views, y=None, **kwargs)¶ Predicts the correlation for the given data using the fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
Returns: all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
Return type: np.ndarray
-
fit
(views, y=None, **kwargs)¶ Fits the model by running an inner loop to convergence and then using either CCA or PLS deflation
Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
-
fit_transform
(views, y=None, **kwargs)¶ Fits and then transforms the training data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
-
get_loadings
(views, y=None, **kwargs)¶ Returns the model loadings for each view for the given data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
-
score
(views, y=None, **kwargs)¶ Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
Parameters:
- X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns: score – \(R^2\) of self.predict(X) wrt. y.
Return type: float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
-
transform
(views, y=None, **kwargs)¶ Transforms data given a fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
Base Class¶
-
class
cca_zoo.models.cca_base.
_CCA_Base
(latent_dims=1, scale=True, centre=True, copy_data=True, accept_sparse=False, random_state=None)[source]¶ Bases:
sklearn.base.BaseEstimator
,sklearn.base.MultiOutputMixin
,sklearn.base.RegressorMixin
A class used as the base for methods in the package. Allows methods to inherit fit_transform, predict_corr, and gridsearch_fit when only fit (and transform where it is different to the default) is provided.
-
weights
¶ Type: list of weights for each view
Constructor for _CCA_Base
Parameters:
- latent_dims (int) – number of latent dimensions to fit
- scale – normalize variance in each column before fitting
- centre – demean data by column before fitting (and before transforming out of sample)
- copy_data – If True, X will be copied; else, it may be overwritten
- accept_sparse – Whether model can take sparse data as input
- random_state (Union[int, RandomState, None]) – Pass for reproducible output across multiple function calls
-
fit
(views, y=None, **kwargs)[source]¶ Fits a given model
Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
-
transform
(views, y=None, **kwargs)[source]¶ Transforms data given a fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
-
fit_transform
(views, y=None, **kwargs)[source]¶ Fits and then transforms the training data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
-
get_loadings
(views, y=None, **kwargs)[source]¶ Returns the model loadings for each view for the given data
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
-
correlations
(views, y=None, **kwargs)[source]¶ Predicts the correlation for the given data using the fit model
Parameters:
- views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
- kwargs – any additional keyword arguments required by the given model
Returns: all_corrs: an array of the pairwise correlations (k,k,self.latent_dims) where k is the number of views
Return type: np.ndarray
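The (k, k, latent_dims) return shape may be easier to picture with a small stand-alone sketch. The code below computes a Pearson correlation for every pair of views and every latent dimension from already-transformed scores, using nested lists in place of numpy arrays. The function names and the data layout are assumptions for illustration; the library may compute this differently.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def pairwise_correlations(scores):
    """scores[v][d] holds the sample scores of view v in latent dimension d.

    Returns a nested list indexed as [view_i][view_j][dimension].
    """
    k, dims = len(scores), len(scores[0])
    return [[[pearson(scores[i][d], scores[j][d]) for d in range(dims)]
             for j in range(k)] for i in range(k)]


z1 = [[1.0, 2.0, 3.0, 4.0]]  # view 1, one latent dimension, four samples
z2 = [[2.0, 4.0, 6.0, 8.5]]  # view 2
corrs = pairwise_correlations([z1, z2])
```

Here `corrs[0][1][0]` is the correlation between the two views in the first latent dimension, and the diagonal entries `corrs[i][i][d]` are 1 by construction.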
-
score
(views, y=None, **kwargs)[source]¶ Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
Parameters:
- X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
- sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns: score – \(R^2\) of self.predict(X) wrt. y.
Return type: float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
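The \(R^2\) definition quoted in this docstring can be checked by hand. The sketch below is a direct transcription of the formula \(1 - u/v\) for a single output, with a hypothetical helper name `r_squared`; it only illustrates the three cases the text mentions (perfect, constant-mean, and worse-than-constant predictions).

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - u / v, for a single output."""
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y_true, y_true)                # predictions match exactly
constant = r_squared(y_true, [2.5] * 4)            # always predict the mean of y
worse = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])    # predictions in reverse order
```

The three values come out as 1.0, 0.0 and -3.0, matching the statement that the score is at most 1.0 and can be negative.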
-
_centre_scale
(views)[source]¶ Removes the mean of the training data and standardizes for each view and stores mean and standard deviation during training
Parameters: views (Iterable[ndarray]) – list/tuple of numpy arrays or array likes with the same number of rows (samples)
Returns: train_views: the demeaned numpy arrays to be used to fit the model
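As a rough picture of what centring and scaling with stored statistics involves, the sketch below demeans and standardises one view column by column, keeps the means and standard deviations, and reuses them on out-of-sample data. The function names, the use of plain lists, and the choice of the sample standard deviation (dividing by n - 1) are assumptions of this sketch rather than details confirmed by the library.

```python
import math


def centre_scale(view):
    """Demean and standardise each column; also return the statistics used."""
    n, p = len(view), len(view[0])
    means = [sum(row[j] for row in view) / n for j in range(p)]
    # Sample standard deviation (n - 1 in the denominator) is an assumption here.
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in view) / (n - 1))
            for j in range(p)]
    scaled = [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in view]
    return scaled, means, stds


def apply_stored(view, means, stds):
    """Transform new data with the statistics stored during training."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in view]


train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled, means, stds = centre_scale(train)
test_scaled = apply_stored([[4.0, 40.0]], means, stds)
```

Reusing the training means and standard deviations, rather than recomputing them on new data, keeps out-of-sample transforms on the same scale as the fitted weights.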
-