cca_zoo.datasets.load_split_cifar10_data

class cca_zoo.datasets.load_split_cifar10_data(data_home=None, cache=True)

Load the CIFAR-10 dataset and split each image into a left half and a right half, giving two views of the data.

Parameters:
- data_home (str or None, optional): The directory where the CIFAR-10 dataset will be cached. If None, the default Scikit-learn cache directory is used.
- cache (bool, optional): Whether to cache the dataset for faster access.

Returns:
- cifar_data (Bunch object): A Scikit-learn Bunch object containing the CIFAR-10 dataset. This object has ‘data’ and ‘target’ attributes.

The function fetches the CIFAR-10 dataset through Scikit-learn’s dataset utilities and splits every image into two halves:
- The first view, X1, contains the left 16x32 pixel region of each image, across the red, green, and blue channels.
- The second view, X2, contains the right 16x32 pixel region of each image, across the red, green, and blue channels.

Within each view, the pixels of each channel are reshaped into a flat vector, and the three channel vectors are concatenated to form the final feature matrices X1 and X2.
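The split described above can be sketched with NumPy on synthetic data. This is an illustration, not the library's implementation: the helper name split_left_right is hypothetical, and the sketch assumes each row stores 1024 red values, then 1024 green, then 1024 blue, with each channel in row-major order.

```python
import numpy as np


def split_left_right(X, side=32, channels=3):
    """Split flattened images into left-half and right-half feature matrices.

    Hypothetical helper for illustration. Assumes a channel-major layout:
    all values of channel 0, then channel 1, then channel 2.
    """
    n = X.shape[0]
    # View each row as (channel, row, column)
    imgs = X.reshape(n, channels, side, side)
    half = side // 2
    # Take the left / right columns of every channel, then flatten per image
    left = imgs[:, :, :, :half].reshape(n, -1)
    right = imgs[:, :, :, half:].reshape(n, -1)
    return left, right


rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(5, 3 * 32 * 32))  # 5 synthetic "images"
X1, X2 = split_left_right(X)
print(X1.shape, X2.shape)  # (5, 1536) (5, 1536)
```

Each view keeps half of the 3072 values per image, so both feature matrices have 1536 columns under these assumptions.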

The Bunch object cifar_data also stores these views in its ‘views’ attribute.

Note:
- CIFAR-10 is a dataset of 60,000 32x32 color images in 10 different classes, with 6,000 images per class.
- Each row of the data matrix holds the pixel values of one flattened 32x32 image, with the pixels of each channel stored in row-major order.
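To make the row layout concrete, the sketch below reshapes one flattened row back into an image. It uses a synthetic row rather than real data, and it assumes the channel-major ordering of the original CIFAR-10 release (red, then green, then blue); check this against the data you actually load.

```python
import numpy as np

# Stand-in for one flattened image: 3 channels x 32 rows x 32 columns
row = np.arange(3 * 32 * 32)

# Assumed layout: (channel, row, column), each channel in row-major order
image = row.reshape(3, 32, 32)

# Reorder to (row, column, channel), the layout most plotting tools expect
image_hwc = image.transpose(1, 2, 0)

# Under this layout, channel c, row r, column k sits at index c*1024 + r*32 + k
assert image[1, 2, 3] == 1 * 1024 + 2 * 32 + 3
print(image_hwc.shape)  # (32, 32, 3)
```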

Example usage:
>>> cifar_data = load_split_cifar10_data()
>>> X1, X2 = cifar_data.views
>>> print(X1.shape)  # Shape of the first view (left halves)
>>> print(X2.shape)  # Shape of the second view (right halves)