{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Time-Based Aggregation with `window_size` and `stride`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this tutorial, we demonstrate how to use the time-based aggregation feature in the `interpreTS` library. This allows you to specify `window_size` and `stride` using time intervals (e.g., \"5min\", \"1h\", \"7d\") to extract features from time-series data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install and Import the Library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from interpreTS.core.feature_extractor import FeatureExtractor, Features\n",
    "from interpreTS.utils.data_conversion import convert_to_time_series\n",
    "from interpreTS.utils.data_validation import validate_time_series_data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "interpreTS version: 0.5.0\n"
     ]
    }
   ],
   "source": [
    "import interpreTS\n",
    "print(f\"interpreTS version: {interpreTS.__version__}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Sample Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's create a simple dataset with timestamps at 5-minute intervals and random values. This will serve as our input time-series data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sample data:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>timestamp</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2023-01-01 00:00:00</th>\n",
       "      <td>0.496714</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2023-01-01 00:05:00</th>\n",
       "      <td>-0.138264</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2023-01-01 00:10:00</th>\n",
       "      <td>0.647689</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2023-01-01 00:15:00</th>\n",
       "      <td>1.523030</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2023-01-01 00:20:00</th>\n",
       "      <td>-0.234153</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        value\n",
       "timestamp                    \n",
       "2023-01-01 00:00:00  0.496714\n",
       "2023-01-01 00:05:00 -0.138264\n",
       "2023-01-01 00:10:00  0.647689\n",
       "2023-01-01 00:15:00  1.523030\n",
       "2023-01-01 00:20:00 -0.234153"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Generate sample data\n",
    "np.random.seed(42)\n",
    "data = pd.DataFrame({\n",
    "    \"timestamp\": pd.date_range(start=\"2023-01-01\", periods=1000, freq=\"5min\"),\n",
    "    \"value\": np.random.randn(1000)\n",
    "})\n",
    "data.set_index(\"timestamp\", inplace=True)\n",
    "\n",
    "print(\"Sample data:\")\n",
    "display(data.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Validate and Convert the Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The library requires data to be in a specific time-series format. We use `validate_time_series_data` to validate the input and `convert_to_time_series` to prepare the data for feature extraction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    validate_time_series_data(data)\n",
    "except (TypeError, ValueError) as e:\n",
    "    print(f\"Validation error: {e}\")\n",
    "\n",
    "# Convert data to interpreTS time-series format\n",
    "ts_data = convert_to_time_series(data)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extract Features with Time-Based Aggregation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we can extract features using a time-based `window_size` and `stride`. This example uses:\n",
    "\n",
    "- `window_size` = `\"1h\"` (1-hour windows)\n",
    "- `stride` = `\"30min\"` (shift the window every 30 minutes).\n",
    "\n",
    "The selected features are:\n",
    "\n",
    "- `Features.LENGTH`: The number of observations in each window.\n",
    "- `Features.MEAN`: The average value of the data in the window.\n",
    "- `Features.VARIANCE`: The variance of the data in the window."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Extracted Features:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>length_value</th>\n",
       "      <th>mean_value</th>\n",
       "      <th>variance_value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>12</td>\n",
       "      <td>0.295955</td>\n",
       "      <td>0.507737</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>12</td>\n",
       "      <td>-0.263877</td>\n",
       "      <td>0.944084</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>12</td>\n",
       "      <td>-0.591232</td>\n",
       "      <td>0.916108</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12</td>\n",
       "      <td>-0.378230</td>\n",
       "      <td>0.629702</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>12</td>\n",
       "      <td>-0.193335</td>\n",
       "      <td>0.737572</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   length_value  mean_value  variance_value\n",
       "0            12    0.295955        0.507737\n",
       "1            12   -0.263877        0.944084\n",
       "2            12   -0.591232        0.916108\n",
       "3            12   -0.378230        0.629702\n",
       "4            12   -0.193335        0.737572"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Initialize the FeatureExtractor\n",
    "feature_extractor = FeatureExtractor(\n",
    "    features=[\n",
    "        Features.LENGTH, \n",
    "        Features.MEAN, \n",
    "        Features.VARIANCE\n",
    "    ],\n",
    "    window_size=\"1h\",\n",
    "    stride=\"30min\"\n",
    ")\n",
    "\n",
    "# Extract features\n",
    "features = feature_extractor.extract_features(ts_data.data)\n",
    "\n",
    "print(\"Extracted Features:\")\n",
    "display(features.head())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After running the above code, you will get a DataFrame containing extracted features for each window of data. Each row represents a window, and the columns correspond to the computed features."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}