Data
Simulated Data
- cca_zoo.data.simulated.linear_simulated_data(n: int, view_features: List[int], latent_dims: int = 1, view_sparsity: Optional[List[Union[int, float]]] = None, correlation: Union[List[float], float] = 1, structure: Optional[Union[str, List[str]]] = None, sigma: Optional[Union[float, List[float]]] = None, decay: float = 0.5, positive=None, random_state: Optional[Union[int, RandomState]] = None)
Function to generate CCA dataset with defined population correlations
- Parameters
n – number of samples
view_features – number of features in each view
latent_dims – number of latent dimensions
view_sparsity – level of sparsity of the features in each view, given either as the number of active variables or as the percentage of active variables
correlation – population correlation, given either as a list with one element per latent dimension or as a single float that is scaled by 'decay'
structure – within-view covariance structure ('identity', 'gaussian', 'toeplitz', 'random')
sigma – sigma of the Gaussian covariance structure, as a float or a list of floats
decay – ratio of the second signal to the first signal
- Returns
tuple of numpy arrays: view_1, view_2, true weights from view 1, true weights from view 2, overall covariance structure
- Example
>>> from cca_zoo.data import linear_simulated_data
>>> [train_view_1, train_view_2], [true_weights_1, true_weights_2] = linear_simulated_data(200, [10, 10], latent_dims=1, correlation=1)
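To make "defined population correlations" concrete, the following is a minimal NumPy sketch of the idea for one latent dimension and an identity within-view covariance. It is not the cca_zoo implementation: the function name `sketch_correlated_views` and its arguments are hypothetical, and the cross-covariance construction is an assumption chosen so that the projections onto the true weights have population correlation `rho`.

```python
import numpy as np


def sketch_correlated_views(n, p, q, rho, seed=0):
    """Hypothetical helper (not part of cca_zoo): sample two views whose
    projections onto unit-norm weights w1, w2 have population correlation rho."""
    rng = np.random.default_rng(seed)

    # Random unit-norm "true" weights for each view.
    w1 = rng.standard_normal(p)
    w1 /= np.linalg.norm(w1)
    w2 = rng.standard_normal(q)
    w2 /= np.linalg.norm(w2)

    # Identity within-view covariance, rank-one cross-covariance rho * w1 w2^T.
    cross = rho * np.outer(w1, w2)
    joint = np.block([[np.eye(p), cross], [cross.T, np.eye(q)]])

    # Draw both views jointly, then split the columns back into two views.
    samples = rng.multivariate_normal(np.zeros(p + q), joint, size=n)
    return samples[:, :p], samples[:, p:], w1, w2


view_1, view_2, w1, w2 = sketch_correlated_views(5000, 10, 8, rho=0.8)

# The sample correlation of the two projections should be close to rho.
sample_corr = np.corrcoef(view_1 @ w1, view_2 @ w2)[0, 1]
print(round(sample_corr, 2))
```

With unit-norm weights and identity within-view covariance, each projection has unit variance, so the covariance between the projections equals `rho` directly. A value of `rho` strictly below 1 keeps the joint covariance matrix positive definite.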
- cca_zoo.data.simulated.simple_simulated_data(n: int, view_features: List[int], view_sparsity: Optional[List[Union[int, float]]] = None, eps: float = 0, transform=False, random_state=None)
Simple latent variable model to generate data with one latent factor
- Parameters
n – number of samples
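view_features – number of features in each view
view_sparsity – level of sparsity of the features in each view, given either as the number of active variables or as the percentage of active variables
eps – standard deviation of the Gaussian noise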
- Returns
view1 matrix, view2 matrix, true weights view 1, true weights view 2
- Example
>>> from cca_zoo.data import simple_simulated_data
>>> [train_view_1, train_view_2], [true_weights_1, true_weights_2] = simple_simulated_data(200, [10, 10])
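As an illustration of a latent variable model with one latent factor, here is a minimal NumPy sketch. It is not the cca_zoo implementation: the function `sketch_one_factor_views` and its `active` argument are hypothetical, and sparsity is modelled here simply by zeroing all but the first few weights in each view.

```python
import numpy as np


def sketch_one_factor_views(n, view_features, active=None, eps=0.0, seed=0):
    """Hypothetical helper (not part of cca_zoo): each view is a shared
    latent factor z times a weight vector, plus Gaussian noise of std eps."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, 1))  # one latent factor shared by all views

    views, weights = [], []
    for i, p in enumerate(view_features):
        w = rng.standard_normal(p)
        if active is not None:
            # Assumed sparsity scheme: keep the first active[i] weights only.
            w[active[i]:] = 0.0
        noise = eps * rng.standard_normal((n, p))
        views.append(z @ w[None, :] + noise)
        weights.append(w)
    return views, weights


views, weights = sketch_one_factor_views(200, [10, 10], active=[3, 5], eps=0.1)
print(views[0].shape, np.count_nonzero(weights[0]), np.count_nonzero(weights[1]))
```

With `eps=0` every view is an exact rank-one matrix, so the shared factor can be recovered from either view up to scale. Increasing `eps` lowers the correlation between the views.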