{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Hyperparameter Selection\n\nThis script will show how to perform hyperparameter selection\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\nimport pandas as pd\nfrom sklearn.utils.fixes import loguniform\n\nfrom cca_zoo.data import generate_covariance_data\nfrom cca_zoo.model_selection import GridSearchCV, RandomizedSearchCV\nfrom cca_zoo.models import KCCA"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "np.random.seed(42)\nn = 200\np = 100\nq = 100\nlatent_dims = 1\ncv = 3\n\n(X, Y), (tx, ty) = generate_covariance_data(\n    n, view_features=[p, q], latent_dims=latent_dims, correlation=[0.9]\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Grid Search\nHyperparameter selection works in a very similar way to in scikit-learn where the main difference is in how we enter the parameter grid.\nWe form a parameter grid with the search space for each view for each parameter.\nThis search space must be entered as a list but can be any of\n- a single value (as in \"kernel\") where this value will be used for each view\n- a list for each view\n- a mixture of a single value for one view and a distribution or list for the other\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "param_grid = {\"kernel\": [\"poly\"], \"c\": [[1e-1], [1e-1, 2e-1]], \"degree\": [[2], [2, 3]]}\nkernel_reg = GridSearchCV(\n    KCCA(latent_dims=latent_dims), param_grid=param_grid, cv=cv, verbose=True\n).fit([X, Y])\nprint(pd.DataFrame(kernel_reg.cv_results_))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Randomized Search\nWith Randomized Search we can additionally use distributions from scikit-learn to define the parameter search space\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "param_grid = {\n    \"kernel\": [\"poly\"],\n    \"c\": [loguniform(1e-1, 2e-1), [1e-1]],\n    \"degree\": [[2], [2, 3]],\n}\nkernel_reg = RandomizedSearchCV(\n    KCCA(latent_dims=latent_dims), param_distributions=param_grid, cv=cv, verbose=True\n).fit([X, Y])\nprint(pd.DataFrame(kernel_reg.cv_results_))"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}