Data

Simulated Data

cca_zoo.data.simulated.generate_covariance_data(n, view_features, latent_dims=1, view_sparsity=None, correlation=1, structure=None, sigma=None, decay=0.5, positive=None, random_state=None)

Generates a CCA dataset with defined population correlations.

Parameters:
    n (`int`) – number of samples
    view_features (`List`[`int`]) – number of features in each view
    latent_dims (`int`) – number of latent dimensions
    view_sparsity (`Optional`[`List`[`Union`[`int`, `float`]]]) – level of sparsity in the features of each view, either as number of active variables or percentage active
    correlation (`Union`[`List`[`float`], `float`]) – correlation, either as a list with one element for each latent dimension or as a float which is scaled by ‘decay’
    structure (`Union`[`str`, `List`[`str`], `None`]) – within-view covariance structure (‘identity’, ‘gaussian’, ‘toeplitz’, ‘random’)
    sigma (`Union`[`List`[`float`], `float`, `None`]) – gaussian sigma
    decay (`float`) – ratio of second signal to first signal
Returns:
    tuple of numpy arrays: view_1, view_2, true weights from view 1, true weights from view 2, overall covariance structure
Example:

>>> from cca_zoo.data import generate_covariance_data
>>> [train_view_1, train_view_2], [true_weights_1, true_weights_2] = generate_covariance_data(200, [10, 10], latent_dims=1, correlation=1)
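
The `correlation` and `view_sparsity` arguments each accept more than one form. The sketch below shows one way such arguments could be normalised before data is generated. Both helper names are hypothetical. The geometric rule (each latent dimension's correlation is `decay` times the previous one) and the reading of a float sparsity as a fraction of features are assumptions drawn from the parameter descriptions above, not a copy of the library's code.

```python
from typing import List, Optional, Union


def expand_correlations(correlation: Union[float, List[float]],
                        latent_dims: int,
                        decay: float = 0.5) -> List[float]:
    """Return one target correlation per latent dimension (hypothetical helper)."""
    if isinstance(correlation, (list, tuple)):
        if len(correlation) != latent_dims:
            raise ValueError("need one correlation per latent dimension")
        return [float(c) for c in correlation]
    # Assumption: a scalar is shrunk geometrically, so the ratio of each
    # signal to the previous one equals `decay`.
    return [float(correlation) * decay ** k for k in range(latent_dims)]


def active_feature_count(sparsity: Optional[Union[int, float]],
                         n_features: int) -> int:
    """Interpret sparsity as a count (int) or a fraction of features (float)."""
    if sparsity is None:
        return n_features  # no sparsity requested: every feature is active
    if isinstance(sparsity, int):
        return min(sparsity, n_features)
    # Assumption: a float is the fraction of features that are active.
    return round(sparsity * n_features)
```

With these assumptions, `expand_correlations(1, 3)` gives one correlation per dimension that halves each time, and `active_feature_count(0.5, 10)` treats half of the ten features as active.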

cca_zoo.data.simulated.generate_simple_data(n, view_features, view_sparsity=None, eps=0, transform=True, random_state=None)

Simple latent variable model to generate data with one latent factor

Parameters:
    n (`int`) – number of samples
    view_features (`List`[`int`]) – number of features in each view
    view_sparsity (`Optional`[`List`[`Union`[`int`, `float`]]]) – level of sparsity in each view, either as number of active variables or percentage active
    eps (`float`) – gaussian noise std
Returns:
    view1 matrix, view2 matrix, true weights view 1, true weights view 2
Example:

>>> from cca_zoo.data import generate_simple_data
>>> [train_view_1, train_view_2], [true_weights_1, true_weights_2] = generate_simple_data(200, [10, 10])
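
To make the "one latent factor" idea concrete, here is a minimal sketch of such a model using only the standard library. The function name `simple_latent_views`, the `seed` argument, and the choice of standard normal weights are assumptions for illustration. It is not the library's implementation, and it returns nested lists rather than numpy arrays.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def simple_latent_views(n: int, view_features: List[int], eps: float = 0.0,
                        seed: int = 0) -> Tuple[List[Matrix], List[List[float]]]:
    """Draw views that all share one latent factor (hypothetical sketch)."""
    rng = random.Random(seed)
    # One latent score per sample, shared by every view.
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    views, weights = [], []
    for p in view_features:
        # Assumed: the weights of each view are standard normal draws.
        w = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Each feature is the latent score times its weight, plus Gaussian
        # noise with standard deviation eps.
        x = [[z_i * w_j + eps * rng.gauss(0.0, 1.0) for w_j in w] for z_i in z]
        views.append(x)
        weights.append(w)
    return views, weights


views, weights = simple_latent_views(200, [10, 10], eps=0.1)
```

With `eps=0` every row of a view is an exact multiple of that view's weight vector, so the views are perfectly correlated through the shared factor. Larger `eps` values weaken that relationship.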

Toy Data

These toy datasets were written with help from https://github.com/bcdutton/AdversarialCanonicalCorrelationAnalysis (the author hopes to add an implementation of that work soon). See their paper at https://arxiv.org/abs/2005.10349

class cca_zoo.data.toy.Split_MNIST_Dataset(mnist_type='MNIST', train=True, flatten=True)

Bases: torch.utils.data.dataset.Dataset

Class to generate paired split MNIST data

Parameters:
    mnist_type (`str`) – “MNIST”, “FashionMNIST” or “KMNIST”
    train (`bool`) – whether this is train or test
    flatten (`bool`) – whether to flatten the data into array or use 2d images

to_numpy(indices=None)

Converts dataset to numpy array form

Parameters:	indices – indices of the samples to extract into numpy arrays

class cca_zoo.data.toy.Noisy_MNIST_Dataset(mnist_type='MNIST', train=True, flatten=True)

Bases: torch.utils.data.dataset.Dataset

Class to generate paired noisy MNIST data

Parameters:
    mnist_type (`str`) – “MNIST”, “FashionMNIST” or “KMNIST”
    train (`bool`) – whether this is train or test
    flatten (`bool`) – whether to flatten the data into array or use 2d images

class cca_zoo.data.toy.Tangled_MNIST_Dataset(mnist_type='MNIST', train=True, flatten=True)

Bases: torch.utils.data.dataset.Dataset

Class to generate paired tangled MNIST dataset

Parameters:
    mnist_type – “MNIST”, “FashionMNIST” or “KMNIST”
    train – whether this is train or test
    flatten – whether to flatten the data into array or use 2d images

Utils

class cca_zoo.data.utils.CCA_Dataset(views, labels=None)

Bases: torch.utils.data.dataset.Dataset

Class that turns numpy arrays into a torch dataset

Parameters:
    views – list/tuple of numpy arrays or array likes with the same number of rows (samples)
    labels – optional labels
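
The requirement that all views share the same number of rows can be illustrated with a torch-free stand-in that follows the map-style dataset protocol (`__len__` and `__getitem__`). The class name `PairedViews` is hypothetical, and the `(views, label)` return shape is an assumption. This sketch only shows the general idea and is not the code of `CCA_Dataset`.

```python
from typing import Optional, Sequence, Tuple


class PairedViews:
    """Torch-free stand-in for a multiview dataset (hypothetical name)."""

    def __init__(self, views: Sequence[Sequence],
                 labels: Optional[Sequence] = None):
        # Every view must describe the same samples, one row per sample.
        if len({len(v) for v in views}) != 1:
            raise ValueError("all views must have the same number of rows")
        if labels is not None and len(labels) != len(views[0]):
            raise ValueError("labels must have one entry per sample")
        self.views = list(views)
        self.labels = labels

    def __len__(self) -> int:
        return len(self.views[0])

    def __getitem__(self, idx: int) -> Tuple[tuple, object]:
        # Assumed return shape: the idx-th row of every view, then the label
        # (None when no labels were supplied).
        sample = tuple(v[idx] for v in self.views)
        label = self.labels[idx] if self.labels is not None else None
        return sample, label


dataset = PairedViews([[[1, 2], [3, 4]], [[5], [6]]], labels=[0, 1])
second_sample, second_label = dataset[1]
```

Checking the row counts in the constructor surfaces mismatched views immediately instead of failing later during iteration.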