Mathematical Foundations
Canonical Correlation Analysis (CCA) and Partial Least Squares (PLS) models are effective ways of finding associations between multiple views of data.
PCA
It is helpful to start off by formulating PCA in its mathematical form. The first principal component can be written as the solution to the optimisation problem:
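One standard way to write this problem (supplied here as an assumption, taking \(X\) to be the centred data matrix and \(w\) the weight vector):

```latex
\begin{aligned}
w &= \underset{w}{\arg\max}\; w^T X^T X w \\
&\text{subject to } \|w\|_2 = 1
\end{aligned}
```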
This is solved by the leading eigenvector of the covariance matrix \(X^TX\), equivalently the leading right singular vector of \(X\).
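As a quick numerical check (a NumPy sketch; the data and variable names are illustrative), the top eigenvector of \(X^TX\) and the leading right singular vector of \(X\) agree up to sign:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X -= X.mean(axis=0)  # centre the data

# Top eigenvector of the covariance matrix X^T X
# (eigh returns eigenvalues in ascending order, so take the last column)
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
w_eig = eigvecs[:, -1]

# Leading right singular vector of X
_, _, Vt = np.linalg.svd(X, full_matrices=False)
w_svd = Vt[0]

# Both are unit vectors spanning the same direction, so |dot| = 1
agreement = abs(w_eig @ w_svd)
```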
PLS
Now consider two data matrices \(X_1\) and \(X_2\) with the same number of samples. It is tempting to write a slightly different optimisation problem:
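The PLS problem referred to above is typically written as (a standard formulation, stated here as an assumption since the equation is not shown):

```latex
\begin{aligned}
w_1, w_2 &= \underset{w_1, w_2}{\arg\max}\; w_1^T X_1^T X_2 w_2 \\
&\text{subject to } \|w_1\|_2 = \|w_2\|_2 = 1
\end{aligned}
```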
This is solved by the leading left and right singular vectors of the cross-covariance matrix \(X_1^TX_2\).
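A small NumPy check of this claim (a sketch with synthetic data): the singular vectors of the cross-covariance maximise the covariance between the projections, and the objective value is the top singular value.

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.standard_normal((200, 4))
X2 = rng.standard_normal((200, 3))
X1 -= X1.mean(axis=0)
X2 -= X2.mean(axis=0)

# Leading left and right singular vectors of the cross-covariance
U, s, Vt = np.linalg.svd(X1.T @ X2)
w1, w2 = U[:, 0], Vt[0]

# Objective w1^T X1^T X2 w2 equals the top singular value
objective = (X1 @ w1) @ (X2 @ w2)
```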
CCA
To arrive at Canonical Correlation Analysis we change the constraints slightly so that they depend on the variance of each view as well as the weights. This makes them data dependent.
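The resulting problem can be written as (the standard CCA formulation, supplied as an assumption):

```latex
\begin{aligned}
w_1, w_2 &= \underset{w_1, w_2}{\arg\max}\; w_1^T X_1^T X_2 w_2 \\
&\text{subject to } w_1^T X_1^T X_1 w_1 = w_2^T X_2^T X_2 w_2 = 1
\end{aligned}
```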
Despite these more complex constraints, CCA is the solution to a generalized eigenvalue problem.
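One common way to write this generalized eigenvalue problem (stated as an assumption, since the equation does not appear above) is in block form:

```latex
\begin{pmatrix} 0 & X_1^T X_2 \\ X_2^T X_1 & 0 \end{pmatrix}
\begin{pmatrix} w_1 \\ w_2 \end{pmatrix}
= \lambda
\begin{pmatrix} X_1^T X_1 & 0 \\ 0 & X_2^T X_2 \end{pmatrix}
\begin{pmatrix} w_1 \\ w_2 \end{pmatrix}
```

The top eigenvalue \(\lambda\) is the first canonical correlation.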
Regularized CCA
Notice that the constraints for PLS and CCA are identical when \(X_i^TX_i=I\). This leads naturally to mixing the two constraints; in particular, the PLS constraint acts in a similar way to the 'ridge' penalty in Ridge Regression.
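One common way to interpolate between the two (a standard ridge-regularised formulation, with interpolation parameters \(c_i\) introduced here as an assumption):

```latex
\begin{aligned}
w_1, w_2 &= \underset{w_1, w_2}{\arg\max}\; w_1^T X_1^T X_2 w_2 \\
&\text{subject to } (1-c_i)\, w_i^T X_i^T X_i w_i + c_i \|w_i\|_2^2 = 1, \quad i = 1, 2
\end{aligned}
```

With \(c_i \in [0, 1]\): setting \(c_i = 0\) recovers CCA and \(c_i = 1\) recovers PLS.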
Other regularized CCA and PLS
There has been much research into more general forms of regularisation for CCA, in particular penalties that induce sparsity on the weights. These can generally be written in a form similar to:
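For instance, with lasso-style penalties (the penalty weights \(\tau_i\) are illustrative assumptions, not notation from this document):

```latex
\begin{aligned}
w_1, w_2 &= \underset{w_1, w_2}{\arg\max}\; w_1^T X_1^T X_2 w_2 - \tau_1 \|w_1\|_1 - \tau_2 \|w_2\|_1 \\
&\text{subject to } w_i^T X_i^T X_i w_i \le 1, \quad i = 1, 2
\end{aligned}
```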
These problems usually have no closed-form solution and are typically solved by alternating optimisation: fix \(w_2\) and solve for \(w_1\), then fix \(w_1\) and solve for \(w_2\), repeating until convergence.
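The alternating scheme can be sketched as follows (a minimal NumPy sketch in the style of penalized matrix decomposition, using soft-thresholding for the \(\ell_1\) penalty; the function name, thresholds, and normalisation choice are all illustrative assumptions, not a full implementation):

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def sparse_cca(X1, X2, tau1=0.1, tau2=0.1, n_iter=100):
    """Alternately update w1 and w2: each step shrinks the other view's
    projection of the cross-covariance, then renormalises to unit norm."""
    C = X1.T @ X2  # cross-covariance matrix
    w2 = np.ones(X2.shape[1]) / np.sqrt(X2.shape[1])
    w1 = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        w1 = soft_threshold(C @ w2, tau1)
        w1 /= np.linalg.norm(w1) or 1.0  # guard against an all-zero vector
        w2 = soft_threshold(C.T @ w1, tau2)
        w2 /= np.linalg.norm(w2) or 1.0
    return w1, w2

rng = np.random.default_rng(3)
X1 = rng.standard_normal((100, 6))
X2 = rng.standard_normal((100, 4))
w1, w2 = sparse_cca(X1 - X1.mean(axis=0), X2 - X2.mean(axis=0))
```

With no penalty (\(\tau_i = 0\)) this reduces to the power method for the top singular vectors of the cross-covariance, i.e. PLS.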
Kernel CCA and PLS
By expressing the weights as a linear combination of the samples, \(w_i=X_i^T\alpha_i\), we can transform the CCA problem into its kernel form:
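Substituting into the CCA problem and writing \(K_i = X_i X_i^T\) gives (a standard derivation, stated here as an assumption):

```latex
\begin{aligned}
\alpha_1, \alpha_2 &= \underset{\alpha_1, \alpha_2}{\arg\max}\; \alpha_1^T K_1 K_2 \alpha_2 \\
&\text{subject to } \alpha_1^T K_1^2 \alpha_1 = \alpha_2^T K_2^2 \alpha_2 = 1
\end{aligned}
```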
Finally, we can replace the Gram matrices \(K_i\) with more general kernel matrices without loss of generality. A similar reparameterization exists for PLS, and therefore for ridge-regularized CCA.
Multiset CCA
Ridge Regularized CCA can be generalized to find correlations across more than two views with the formulation:
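One common way to write this for views \(X_1, \dots, X_m\) (a standard multiset CCA formulation, stated as an assumption; the choice of constraint varies across the literature):

```latex
\begin{aligned}
w_1, \dots, w_m &= \underset{w_1, \dots, w_m}{\arg\max}\; \sum_{i \neq j} w_i^T X_i^T X_j w_j \\
&\text{subject to } \sum_{i=1}^m w_i^T X_i^T X_i w_i = 1
\end{aligned}
```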
This form is often referred to as SUMCOR CCA because it optimizes the sum of pairwise correlations between the views. It can be formulated as a generalized eigenvalue problem and therefore solved efficiently.
Generalized CCA
Ridge Regularized CCA can also be generalized to more than two views with the formulation:
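This can be written as (a standard MAXVAR formulation, supplied as an assumption, with views \(X_1, \dots, X_m\) and auxiliary vector \(T\)):

```latex
\begin{aligned}
T, w_1, \dots, w_m &= \underset{T, w_1, \dots, w_m}{\arg\min}\; \sum_{i=1}^m \|T - X_i w_i\|_2^2 \\
&\text{subject to } T^T T = 1
\end{aligned}
```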
This form is often referred to as MAXVAR CCA since it finds an auxiliary vector \(T\) of fixed unit norm that is as close as possible to the projection of every view, i.e. captures the maximum shared variance across views. This can also be formulated as a generalized eigenvalue problem and therefore solved efficiently.
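For the two-view case, the generalized eigenvalue problem can be reduced to an ordinary SVD by whitening each view. The following is a sketch (NumPy only; the function name is illustrative, and the small ridge term `reg` is an assumption added for numerical stability):

```python
import numpy as np

def cca_by_whitening(X1, X2, reg=1e-6):
    """Two-view CCA: whiten each view with C_ii^{-1/2}, then take the SVD
    of the whitened cross-covariance."""
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    C11 = X1.T @ X1 + reg * np.eye(X1.shape[1])
    C22 = X2.T @ X2 + reg * np.eye(X2.shape[1])
    C12 = X1.T @ X2

    def inv_sqrt(C):
        # Inverse matrix square root via eigendecomposition
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    M = inv_sqrt(C11) @ C12 @ inv_sqrt(C22)
    U, s, Vt = np.linalg.svd(M)
    w1 = inv_sqrt(C11) @ U[:, 0]
    w2 = inv_sqrt(C22) @ Vt[0]
    return w1, w2, s[0]  # s[0] is the top canonical correlation

# Views sharing a common latent column should be highly correlated
rng = np.random.default_rng(2)
z = rng.standard_normal((500, 1))
X1 = np.hstack([z, rng.standard_normal((500, 2))])
X2 = np.hstack([z, rng.standard_normal((500, 3))])
w1, w2, corr = cca_by_whitening(X1, X2)
```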
Deep CCA
The ideas behind CCA can be extended to a more general form where, instead of linear weights, we consider functions \(f(X_i)\). When these functions are parameterized by neural networks, we arrive at a 'Deep' CCA:
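One way to state the objective (following the usual Deep CCA formulation, supplied as an assumption, with network parameters \(\theta_1, \theta_2\)):

```latex
\theta_1^*, \theta_2^* = \underset{\theta_1, \theta_2}{\arg\max}\;
\operatorname{corr}\!\left( f_1(X_1; \theta_1),\; f_2(X_2; \theta_2) \right)
```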