{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Time-Based Aggregation with `window_size` and `stride`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this tutorial, we demonstrate how to use the time-based aggregation feature in the `interpreTS` library. This allows you to specify `window_size` and `stride` using time intervals (e.g., \"5min\", \"1h\", \"7d\") to extract features from time-series data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install and Import the Library"
]
},
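{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the library is not installed yet, it can typically be installed from PyPI (this assumes the package is published under the name `interpreTS`; adjust the command to your environment):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Uncomment to install the library (assumes the PyPI package name 'interpreTS')\n",
"# %pip install interpreTS"
]
},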
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"from interpreTS.core.feature_extractor import FeatureExtractor, Features\n",
"from interpreTS.utils.data_conversion import convert_to_time_series\n",
"from interpreTS.utils.data_validation import validate_time_series_data"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"interpreTS version: 0.5.0\n"
]
}
],
"source": [
"import interpreTS\n",
"print(f\"interpreTS version: {interpreTS.__version__}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Sample Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's create a simple dataset with timestamps at 5-minute intervals and random values. This will serve as our input time-series data."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sample data:\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" value | \n",
"
\n",
" \n",
" timestamp | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2023-01-01 00:00:00 | \n",
" 0.496714 | \n",
"
\n",
" \n",
" 2023-01-01 00:05:00 | \n",
" -0.138264 | \n",
"
\n",
" \n",
" 2023-01-01 00:10:00 | \n",
" 0.647689 | \n",
"
\n",
" \n",
" 2023-01-01 00:15:00 | \n",
" 1.523030 | \n",
"
\n",
" \n",
" 2023-01-01 00:20:00 | \n",
" -0.234153 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" value\n",
"timestamp \n",
"2023-01-01 00:00:00 0.496714\n",
"2023-01-01 00:05:00 -0.138264\n",
"2023-01-01 00:10:00 0.647689\n",
"2023-01-01 00:15:00 1.523030\n",
"2023-01-01 00:20:00 -0.234153"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Generate sample data\n",
"np.random.seed(42)\n",
"data = pd.DataFrame({\n",
" \"timestamp\": pd.date_range(start=\"2023-01-01\", periods=1000, freq=\"5min\"),\n",
" \"value\": np.random.randn(1000)\n",
"})\n",
"data.set_index(\"timestamp\", inplace=True)\n",
"\n",
"print(\"Sample data:\")\n",
"display(data.head())"
]
},
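{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick, optional check (plain pandas, not part of the `interpreTS` API), we can confirm that the index is a `DatetimeIndex` with a regular 5-minute spacing, since the time-based `window_size` and `stride` used later rely on it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check: time-based windows rely on a datetime index\n",
"# with a (roughly) regular sampling interval.\n",
"print(type(data.index).__name__)   # DatetimeIndex\n",
"print(data.index.inferred_freq)    # '5min' (older pandas may report '5T')\n",
"print(f\"Span: {data.index.min()} -> {data.index.max()}\")"
]
},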
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Validate and Convert the Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The library requires data to be in a specific time-series format. We use `validate_time_series_data` to validate the input and `convert_to_time_series` to prepare the data for feature extraction."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
" validate_time_series_data(data)\n",
"except (TypeError, ValueError) as e:\n",
" print(f\"Validation error: {e}\")\n",
"\n",
"# Convert data to interpreTS time-series format\n",
"ts_data = convert_to_time_series(data)\n"
]
},
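{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, we can peek at the converted object. Its exact structure depends on the `interpreTS` version, but it exposes the underlying pandas data through its `.data` attribute, which is what we pass to the extractor below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Inspect the converted object; `.data` holds the underlying pandas data\n",
"# and is what we pass to `extract_features` in the next step.\n",
"print(type(ts_data).__name__)\n",
"ts_data.data.head()"
]
},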
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extract Features with Time-Based Aggregation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we can extract features using a time-based `window_size` and `stride`. This example uses:\n",
"\n",
"- `window_size` = `\"1h\"` (1-hour windows)\n",
"- `stride` = `\"30min\"` (shift the window every 30 minutes).\n",
"\n",
"The selected features are:\n",
"\n",
"- `Features.LENGTH`: The number of observations in each window.\n",
"- `Features.MEAN`: The average value of the data in the window.\n",
"- `Features.VARIANCE`: The variance of the data in the window."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Extracted Features:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" length_value | \n",
" mean_value | \n",
" variance_value | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 12 | \n",
" 0.295955 | \n",
" 0.507737 | \n",
"
\n",
" \n",
" 1 | \n",
" 12 | \n",
" -0.263877 | \n",
" 0.944084 | \n",
"
\n",
" \n",
" 2 | \n",
" 12 | \n",
" -0.591232 | \n",
" 0.916108 | \n",
"
\n",
" \n",
" 3 | \n",
" 12 | \n",
" -0.378230 | \n",
" 0.629702 | \n",
"
\n",
" \n",
" 4 | \n",
" 12 | \n",
" -0.193335 | \n",
" 0.737572 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" length_value mean_value variance_value\n",
"0 12 0.295955 0.507737\n",
"1 12 -0.263877 0.944084\n",
"2 12 -0.591232 0.916108\n",
"3 12 -0.378230 0.629702\n",
"4 12 -0.193335 0.737572"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Initialize the FeatureExtractor\n",
"feature_extractor = FeatureExtractor(\n",
" features=[\n",
" Features.LENGTH, \n",
" Features.MEAN, \n",
" Features.VARIANCE\n",
" ],\n",
" window_size=\"1h\",\n",
" stride=\"30min\"\n",
")\n",
"\n",
"# Extract features\n",
"features = feature_extractor.extract_features(ts_data.data)\n",
"\n",
"print(\"Extracted Features:\")\n",
"display(features.head())\n"
]
},
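{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check on the window arithmetic (plain pandas, assuming windows are aligned to the 5-minute sampling grid), a 1-hour window over 5-minute data should contain 12 observations, matching the `length_value` column above, and a 30-minute stride corresponds to 6 observations between window starts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Window arithmetic: observations per window and per stride step.\n",
"points_per_window = pd.Timedelta(\"1h\") / pd.Timedelta(\"5min\")\n",
"points_per_stride = pd.Timedelta(\"30min\") / pd.Timedelta(\"5min\")\n",
"print(f\"Observations per 1h window: {points_per_window:.0f}\")     # 12\n",
"print(f\"Observations per 30min stride: {points_per_stride:.0f}\")  # 6\n",
"\n",
"# Number of windows actually produced by the extractor (the exact count\n",
"# depends on how interpreTS handles the final, possibly partial, window).\n",
"print(f\"Number of extracted windows: {len(features)}\")"
]
},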
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After running the above code, you will get a DataFrame containing extracted features for each window of data. Each row represents a window, and the columns correspond to the computed features."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}