Data

Simulated Data

cca_zoo.data.simulated.generate_covariance_data(n, view_features, latent_dims=1, view_sparsity=None, correlation=1, structure=None, sigma=None, decay=0.5, positive=None, random_state=None)

Generate a CCA dataset with defined population correlations.

Parameters
  • n (int) – number of samples

  • view_sparsity (Optional[List[Union[int, float]]]) – sparsity of the features in each view, given either as the number of active variables or as the percentage of variables that are active

  • view_features (List[int]) – number of features in each view

  • latent_dims (int) – number of latent dimensions

  • correlation (Union[List[float], float]) – population correlation, given either as a list with one element per latent dimension or as a single float that is scaled by ‘decay’

  • structure (Union[str, List[str], None]) – within-view covariance structure (‘identity’, ‘gaussian’, ‘toeplitz’, ‘random’)

  • sigma (Union[float, List[float], None]) – sigma used by the ‘gaussian’ covariance structure

  • decay (float) – ratio of second signal to first signal

Returns

tuple of numpy arrays: view_1, view_2, true weights from view 1, true weights from view 2, overall covariance structure

Example

>>> from cca_zoo.data import generate_covariance_data
>>> [train_view_1,train_view_2],[true_weights_1,true_weights_2]=generate_covariance_data(200,[10,10],latent_dims=1,correlation=1)
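
The handling of `correlation`, `decay` and `view_sparsity` described in the parameter list can be illustrated with a small standalone sketch. This is not the library's implementation: the helper names `expand_correlations` and `active_count` are hypothetical, and the rules used here (a scalar correlation shrinking geometrically by `decay` per latent dimension; integers read as counts and floats as fractions, rounded up) are assumptions drawn from the parameter descriptions.

```python
import math


def expand_correlations(correlation, latent_dims, decay=0.5):
    """Return one correlation per latent dimension (assumed behaviour)."""
    if isinstance(correlation, (list, tuple)):
        if len(correlation) != latent_dims:
            raise ValueError("need one correlation per latent dimension")
        return [float(c) for c in correlation]
    # Assumption: dimension k receives correlation * decay**k, so `decay`
    # is the ratio between consecutive signals.
    return [float(correlation) * decay ** k for k in range(latent_dims)]


def active_count(sparsity, n_features):
    """Translate one view's sparsity setting into a number of active variables."""
    if sparsity is None:
        return n_features  # no sparsity requested: every variable is active
    if isinstance(sparsity, float):
        # Assumption: a float is the fraction of active variables, rounded up.
        return min(n_features, math.ceil(sparsity * n_features))
    # Assumption: an int is the number of active variables directly.
    return min(n_features, int(sparsity))


print(expand_correlations(1, 3, decay=0.5))          # [1.0, 0.5, 0.25]
print(active_count(0.25, 10), active_count(4, 10))   # 3 4
```

With these assumptions, `correlation=1, latent_dims=3, decay=0.5` would correspond to population correlations of 1, 0.5 and 0.25 across the three latent dimensions.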

cca_zoo.data.simulated.generate_simple_data(n, view_features, view_sparsity=None, eps=0, transform=False, random_state=None)

Simple latent variable model to generate data with one latent factor

Parameters
  • n (int) – number of samples

  • view_features (List[int]) – number of features in each view

  • view_sparsity (Optional[List[Union[int, float]]]) – level of sparsity in features in each view either as number of active variables or percentage active

  • eps (float) – gaussian noise std

Returns

view1 matrix, view2 matrix, true weights view 1, true weights view 2

Example

>>> from cca_zoo.data import generate_simple_data
>>> [train_view_1,train_view_2],[true_weights_1,true_weights_2]=generate_simple_data(200,[10,10])
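
A one-factor latent variable model of the kind described above can be sketched in plain Python. The function `simple_latent_data` below is a hypothetical illustration, not the code behind `generate_simple_data`: it assumes each view is the shared latent variable multiplied by a per-view weight vector, plus Gaussian noise with standard deviation `eps`.

```python
import random


def simple_latent_data(n, view_features, eps=0.0, seed=None):
    """Sample views that share a single latent factor (illustrative only)."""
    rng = random.Random(seed)
    # One latent value per sample, shared by every view.
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # One weight vector per view, with one weight per feature.
    weights = [[rng.gauss(0.0, 1.0) for _ in range(p)] for p in view_features]
    views = []
    for w in weights:
        # Row i of a view is z[i] * w plus independent noise scaled by eps.
        view = [[z_i * w_j + eps * rng.gauss(0.0, 1.0) for w_j in w] for z_i in z]
        views.append(view)
    return views, weights


views, weights = simple_latent_data(200, [10, 10], eps=0.1, seed=0)
print(len(views), len(views[0]), len(views[0][0]))  # 2 200 10
```

With `eps=0` every column of every view is an exact multiple of the same latent variable, so the two views are perfectly correlated along the weight directions; increasing `eps` weakens that relationship.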

Utils

class cca_zoo.data.utils.CCA_Dataset(views)

Class that turns numpy arrays into a torch dataset

Parameters

views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
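
To show what wrapping paired views in a dataset involves, here is a minimal sketch of the map-style protocol (`__len__` and `__getitem__`) that a torch dataset implements. `PairedViewsDataset` is a hypothetical stand-in written without torch so that it runs anywhere; the real `CCA_Dataset` may differ in what each item contains, which this page does not specify.

```python
class PairedViewsDataset:
    """Map-style dataset: item i holds row i of every view (illustrative only)."""

    def __init__(self, views):
        # Accept any sequence of array-likes and check the row counts agree.
        views = [list(view) for view in views]
        row_counts = {len(view) for view in views}
        if len(row_counts) != 1:
            raise ValueError("all views must have the same number of rows")
        self.views = views

    def __len__(self):
        # Number of samples, shared by every view.
        return len(self.views[0])

    def __getitem__(self, index):
        # One sample: the matching row from each view, in view order.
        return tuple(view[index] for view in self.views)


dataset = PairedViewsDataset([[[1, 2], [3, 4]], [[5], [6]]])
print(len(dataset))  # 2
print(dataset[1])    # ([3, 4], [6])
```

A torch version would typically subclass `torch.utils.data.Dataset` and convert each row to a tensor, so that a `DataLoader` can batch samples from all views together.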