{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Time-Based Aggregation with `window_size` and `stride`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial, we demonstrate how to use the time-based aggregation feature in the `interpreTS` library. This allows you to specify `window_size` and `stride` using time intervals (e.g., \"5min\", \"1h\", \"7d\") to extract features from time-series data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install and Import the Library" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "from interpreTS.core.feature_extractor import FeatureExtractor, Features\n", "from interpreTS.utils.data_conversion import convert_to_time_series\n", "from interpreTS.utils.data_validation import validate_time_series_data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "interpreTS version: 0.5.0\n" ] } ], "source": [ "import interpreTS\n", "print(f\"interpreTS version: {interpreTS.__version__}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create Sample Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a simple dataset with timestamps at 5-minute intervals and random values. This will serve as our input time-series data." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sample data:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
value
timestamp
2023-01-01 00:00:000.496714
2023-01-01 00:05:00-0.138264
2023-01-01 00:10:000.647689
2023-01-01 00:15:001.523030
2023-01-01 00:20:00-0.234153
\n", "
" ], "text/plain": [ " value\n", "timestamp \n", "2023-01-01 00:00:00 0.496714\n", "2023-01-01 00:05:00 -0.138264\n", "2023-01-01 00:10:00 0.647689\n", "2023-01-01 00:15:00 1.523030\n", "2023-01-01 00:20:00 -0.234153" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Generate sample data\n", "np.random.seed(42)\n", "data = pd.DataFrame({\n", " \"timestamp\": pd.date_range(start=\"2023-01-01\", periods=1000, freq=\"5min\"),\n", " \"value\": np.random.randn(1000)\n", "})\n", "data.set_index(\"timestamp\", inplace=True)\n", "\n", "print(\"Sample data:\")\n", "display(data.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Validate and Convert the Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The library requires data to be in a specific time-series format. We use `validate_time_series_data` to validate the input and `convert_to_time_series` to prepare the data for feature extraction." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "try:\n", " validate_time_series_data(data)\n", "except (TypeError, ValueError) as e:\n", " print(f\"Validation error: {e}\")\n", "\n", "# Convert data to interpreTS time-series format\n", "ts_data = convert_to_time_series(data)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extract Features with Time-Based Aggregation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we can extract features using a time-based `window_size` and `stride`. This example uses:\n", "\n", "- `window_size` = `\"1h\"` (1-hour windows)\n", "- `stride` = `\"30min\"` (shift the window every 30 minutes).\n", "\n", "The selected features are:\n", "\n", "- `Features.LENGTH`: The number of observations in each window.\n", "- `Features.MEAN`: The average value of the data in the window.\n", "- `Features.VARIANCE`: The variance of the data in the window." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Extracted Features:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
length_valuemean_valuevariance_value
0120.2959550.507737
112-0.2638770.944084
212-0.5912320.916108
312-0.3782300.629702
412-0.1933350.737572
\n", "
" ], "text/plain": [ " length_value mean_value variance_value\n", "0 12 0.295955 0.507737\n", "1 12 -0.263877 0.944084\n", "2 12 -0.591232 0.916108\n", "3 12 -0.378230 0.629702\n", "4 12 -0.193335 0.737572" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Initialize the FeatureExtractor\n", "feature_extractor = FeatureExtractor(\n", " features=[\n", " Features.LENGTH, \n", " Features.MEAN, \n", " Features.VARIANCE\n", " ],\n", " window_size=\"1h\",\n", " stride=\"30min\"\n", ")\n", "\n", "# Extract features\n", "features = feature_extractor.extract_features(ts_data.data)\n", "\n", "print(\"Extracted Features:\")\n", "display(features.head())\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After running the above code, you will get a DataFrame containing extracted features for each window of data. Each row represents a window, and the columns correspond to the computed features." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.2" } }, "nbformat": 4, "nbformat_minor": 2 }