Data
Simulated Data
cca_zoo.data.simulated.generate_covariance_data(n, view_features, latent_dims=1, view_sparsity=None, correlation=1, structure=None, sigma=None, decay=0.5, positive=None, random_state=None)

Generate a CCA dataset with defined population correlations.
Parameters:
- n (int) – number of samples
- view_sparsity (Optional[List[Union[int, float]]]) – level of sparsity in each view, given either as the number of active variables or the percentage of active variables
- view_features (List[int]) – number of features in each view
- latent_dims (int) – number of latent dimensions
- correlation (Union[List[float], float]) – correlation, either as a list with one element per latent dimension or as a float that is scaled by decay
- structure (Union[str, List[str], None]) – within-view covariance structure: 'identity', 'gaussian', 'toeplitz' or 'random'
- sigma (Union[List[float], float, None]) – Gaussian sigma
- decay (float) – ratio of the second signal to the first signal
Returns: tuple of numpy arrays: view_1, view_2, true weights from view 1, true weights from view 2, overall covariance structure
Example:
>>> from cca_zoo.data import generate_covariance_data
>>> [train_view_1, train_view_2], [true_weights_1, true_weights_2] = generate_covariance_data(200, [10, 10], latent_dims=1, correlation=1)
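To make "defined population correlations" concrete, here is a minimal NumPy sketch of one way such data can be generated. It is not cca_zoo's implementation: the function name sketch_covariance_data is hypothetical, it covers only the 'identity' within-view structure, and it assumes that a float correlation is expanded to correlation * decay**k for latent dimension k. The true weights are orthonormal, so each projected pair U[:, k] @ x and V[:, k] @ y has unit variance and population correlation rho_k.

```python
import numpy as np


def sketch_covariance_data(n, p, q, latent_dims=1, correlation=0.9, decay=0.5, seed=0):
    """Hypothetical sketch (not cca_zoo's code): two views with identity
    within-view covariance and known canonical correlations."""
    rng = np.random.default_rng(seed)
    # Assumed decay rule: rho_k = correlation * decay**k for latent dimension k
    rho = correlation * decay ** np.arange(latent_dims)
    # Orthonormal true weights, so U.T @ U = I and V.T @ V = I
    U, _ = np.linalg.qr(rng.standard_normal((p, latent_dims)))
    V, _ = np.linalg.qr(rng.standard_normal((q, latent_dims)))
    # Joint covariance [[I, U diag(rho) V^T], [V diag(rho) U^T, I]]
    cov = np.eye(p + q)
    cov[:p, p:] = U @ np.diag(rho) @ V.T
    cov[p:, :p] = cov[:p, p:].T
    data = rng.multivariate_normal(np.zeros(p + q), cov, size=n)
    return data[:, :p], data[:, p:], U, V, cov


X, Y, U, V, cov = sketch_covariance_data(10000, 10, 8, latent_dims=2)
for k in range(2):
    # Sample correlation of the k-th projected pair; population values are 0.9 and 0.45
    print(k, np.corrcoef(X @ U[:, k], Y @ V[:, k])[0, 1])
```

Building the full joint covariance and sampling from it directly makes the population correlations exact by construction. A non-identity structure would replace the identity blocks, and the weights would then need normalising so that each projection still has unit variance.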
cca_zoo.data.simulated.generate_simple_data(n, view_features, view_sparsity=None, eps=0, transform=True, random_state=None)

Simple latent variable model that generates data with one latent factor.
Parameters:
- n (int) – number of samples
- view_features (List[int]) – number of features in each view
- view_sparsity (Optional[List[Union[int, float]]]) – level of sparsity in each view, given either as the number of active variables or the percentage of active variables
- eps (float) – standard deviation of the Gaussian noise
Returns: view 1 matrix, view 2 matrix, true weights for view 1, true weights for view 2
Example:
>>> from cca_zoo.data import generate_simple_data
>>> [train_view_1, train_view_2], [true_weights_1, true_weights_2] = generate_simple_data(200, [10, 10])
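A one-factor latent variable model of this kind can be sketched as follows. This is an illustrative assumption, not the library's code: sketch_simple_data and its n_active argument (a hypothetical stand-in for view_sparsity that keeps only the first n_active[i] loadings of view i non-zero) are invented for this example, and the transform option is not modelled. Each view is the shared factor z times a unit-norm weight vector, plus Gaussian noise with standard deviation eps.

```python
import numpy as np


def sketch_simple_data(n, view_features, n_active=None, eps=0.0, seed=0):
    """Hypothetical sketch of a one-latent-factor model (not cca_zoo's code)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)  # single latent factor shared by all views
    views, weights = [], []
    for i, p in enumerate(view_features):
        w = rng.standard_normal(p)
        if n_active is not None:
            w[n_active[i]:] = 0.0  # hypothetical sparsity: only the first n_active[i] features load on z
        w /= np.linalg.norm(w)  # unit-norm true weights
        views.append(np.outer(z, w) + eps * rng.standard_normal((n, p)))
        weights.append(w)
    return views, weights


views, weights = sketch_simple_data(5000, [10, 8], eps=0.5)
# With unit-norm weights the population correlation is 1 / (1 + eps**2) = 0.8
print(np.corrcoef(views[0] @ weights[0], views[1] @ weights[1])[0, 1])
```

Setting eps=0 makes every view exactly rank one, so each projection views[i] @ weights[i] equals z.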
Toy Data
The toy datasets below were written with help from https://github.com/bcdutton/AdversarialCanonicalCorrelationAnalysis (hopefully I will have my own implementation of their work soon). Check out their paper at https://arxiv.org/abs/2005.10349.
class cca_zoo.data.toy.Split_MNIST_Dataset(mnist_type='MNIST', train=True, flatten=True)

Bases: torch.utils.data.dataset.Dataset

Class to generate paired noisy MNIST data.
Parameters:
- mnist_type (str) – “MNIST”, “FashionMNIST” or “KMNIST”
- train (bool) – whether to use the training split (True) or the test split (False)
- flatten (bool) – whether to flatten each image into a 1D array or keep it as a 2D image
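The docstring does not spell out how the two views are formed. One common construction for split MNIST, used here purely as an assumption, takes the left and right halves of each image as the two views; the flatten flag then decides whether each half is returned as a 1D array or as a 2D image. split_views is a hypothetical helper, and a random array stands in for a real 28x28 digit so the sketch runs without downloading MNIST.

```python
import numpy as np


def split_views(image, flatten=True):
    """Hypothetical helper: left and right halves of a 2D image as two views."""
    width = image.shape[1]
    left, right = image[:, : width // 2], image[:, width // 2 :]
    if flatten:
        return left.reshape(-1), right.reshape(-1)  # 1D feature vectors
    return left, right  # keep 2D images, e.g. for a CNN encoder


image = np.random.default_rng(0).random((28, 28))  # stand-in for one MNIST digit
view_1, view_2 = split_views(image)
print(view_1.shape, view_2.shape)  # (392,) (392,)
```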
class cca_zoo.data.toy.Noisy_MNIST_Dataset(mnist_type='MNIST', train=True, flatten=True)

Bases: torch.utils.data.dataset.Dataset

Class to generate paired noisy MNIST data.
Parameters:
- mnist_type (str) – “MNIST”, “FashionMNIST” or “KMNIST”
- train (bool) – whether to use the training split (True) or the test split (False)
- flatten (bool) – whether to flatten each image into a 1D array or keep it as a 2D image
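For paired noisy MNIST data, a widely used recipe (assumed here; the docstring does not describe it) pairs each image with a different image of the same digit, adds uniform noise in [0, 1] to every pixel of that second image and clips the result back to [0, 1]. The usual recipe also rotates the first view by a random angle; that step is omitted to keep the sketch dependency-free. noisy_pair is a hypothetical helper, and random arrays stand in for MNIST images and labels.

```python
import numpy as np


def noisy_pair(images, labels, idx, rng):
    """Hypothetical sketch: build (view_1, view_2) for sample idx."""
    view_1 = images[idx]
    # Candidate partners: other images carrying the same label
    candidates = np.flatnonzero(labels == labels[idx])
    candidates = candidates[candidates != idx]
    partner = rng.choice(candidates)
    # Per-pixel uniform noise, then clip to keep intensities in [0, 1]
    noise = rng.uniform(0.0, 1.0, size=images[partner].shape)
    view_2 = np.clip(images[partner] + noise, 0.0, 1.0)
    return view_1, view_2, partner


rng = np.random.default_rng(0)
images = rng.random((20, 28, 28))  # stand-in for MNIST images scaled to [0, 1]
labels = np.arange(20) % 10  # stand-in digit labels, two images per digit
view_1, view_2, partner = noisy_pair(images, labels, 3, rng)
print(labels[3], labels[partner])  # same digit, different image
```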
class cca_zoo.data.toy.Tangled_MNIST_Dataset(mnist_type='MNIST', train=True, flatten=True)

Bases: torch.utils.data.dataset.Dataset

Class to generate a paired tangled MNIST dataset.
Parameters:
- mnist_type – “MNIST”, “FashionMNIST” or “KMNIST”
- train – whether to use the training split (True) or the test split (False)
- flatten – whether to flatten each image into a 1D array or keep it as a 2D image