Time-Based Aggregation with `window_size` and `stride`#

This tutorial demonstrates the time-based aggregation feature of the interpreTS library, which lets you specify `window_size` and `stride` as time intervals (e.g., "5min", "1h", "7d") when extracting features from time-series data.
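To make the interval strings concrete, the short sketch below parses them with pandas and prints their durations. It only illustrates how pandas reads such strings; that interpreTS resolves `window_size` and `stride` with the same parser is an assumption, not something this tutorial states.

```python
import pandas as pd

# Parse the interval strings mentioned above into pandas Timedelta objects.
# Assumption: interpreTS interprets these strings the way pandas does.
intervals = {label: pd.Timedelta(label) for label in ["5min", "1h", "7d"]}

for label, delta in intervals.items():
    print(f"{label!r} -> {delta} ({delta.total_seconds():.0f} seconds)")
```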

Install and Import the Library#

[ ]:

import pandas as pd
import numpy as np
from interpreTS.core.feature_extractor import FeatureExtractor, Features
from interpreTS.utils.data_conversion import convert_to_time_series
from interpreTS.utils.data_validation import validate_time_series_data

[2]:

import interpreTS
print(f"interpreTS version: {interpreTS.__version__}")

interpreTS version: 0.5.0

Create Sample Data#

Let's create a simple dataset of 1,000 random values (drawn from a standard normal distribution) with timestamps at 5-minute intervals. The timestamps become the index of the DataFrame, which serves as our input time-series data.

[3]:

# Generate sample data
np.random.seed(42)
data = pd.DataFrame({
    "timestamp": pd.date_range(start="2023-01-01", periods=1000, freq="5min"),
    "value": np.random.randn(1000)
})
data.set_index("timestamp", inplace=True)

print("Sample data:")
display(data.head())

Sample data:

	value
timestamp
2023-01-01 00:00:00	0.496714
2023-01-01 00:05:00	-0.138264
2023-01-01 00:10:00	0.647689
2023-01-01 00:15:00	1.523030
2023-01-01 00:20:00	-0.234153
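Because the windows used later are defined in units of time, it can help to confirm that the sample data really is evenly spaced. The following is a minimal sketch using plain pandas only; it rebuilds the same dataset so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data so this snippet is self-contained.
np.random.seed(42)
data = pd.DataFrame(
    {"value": np.random.randn(1000)},
    index=pd.date_range(start="2023-01-01", periods=1000, freq="5min", name="timestamp"),
)

# Differences between consecutive timestamps; a regular series has one unique step.
steps = data.index.to_series().diff().dropna().unique()
is_regular = len(steps) == 1
step = pd.Timedelta(steps[0])

print(f"regular sampling: {is_regular}, step: {step}")
print(f"time span covered: {data.index[-1] - data.index[0]}")
```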

Validate and Convert the Data#

The library requires data to be in a specific time-series format. We use `validate_time_series_data` to validate the input and `convert_to_time_series` to prepare the data for feature extraction.

[4]:

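# Validate the input; report the problem and stop if the data is not usable
try:
    validate_time_series_data(data)
except (TypeError, ValueError) as e:
    print(f"Validation error: {e}")
    raise  # do not continue to the conversion step with invalid data

# Convert data to interpreTS time-series format
ts_data = convert_to_time_series(data)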
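The tutorial does not list what the validation step checks. As a rough illustration of the kind of checks that matter for time-based windows, here is a hypothetical helper written with plain pandas. The function name `basic_time_series_checks` and its rules are assumptions made for this sketch; it is not the implementation of `validate_time_series_data`.

```python
import numpy as np
import pandas as pd


def basic_time_series_checks(df: pd.DataFrame) -> None:
    """Hypothetical pre-flight checks; NOT the interpreTS implementation."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("expected a pandas DataFrame")
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("time-based windows need a DatetimeIndex")
    if not df.index.is_monotonic_increasing:
        raise ValueError("timestamps must be sorted in ascending order")
    if df.isna().any().any():
        raise ValueError("data contains missing values")


good = pd.DataFrame(
    {"value": np.arange(4.0)},
    index=pd.date_range("2023-01-01", periods=4, freq="5min"),
)
basic_time_series_checks(good)  # passes silently

try:
    # Dropping the timestamps leaves a plain integer index.
    basic_time_series_checks(good.reset_index(drop=True))
except TypeError as exc:
    print(f"Rejected: {exc}")
```

The sketch raises the same two exception types that the cell above catches, so it could be wrapped in the same `try`/`except` pattern.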

Extract Features with Time-Based Aggregation#

Now we can extract features using a time-based `window_size` and `stride`. This example uses:

`window_size = "1h"` (1-hour windows)
`stride = "30min"` (shift the window every 30 minutes)
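For the 5-minute sample data, these settings translate into a fixed number of observations per window and per shift. The arithmetic can be sketched as follows; the count of windows assumes that only windows lying entirely inside the data are produced, which this tutorial does not confirm for interpreTS.

```python
import pandas as pd

sampling_step = pd.Timedelta("5min")  # spacing of the sample data
window_size = pd.Timedelta("1h")
stride = pd.Timedelta("30min")
n_obs = 1000                          # rows in the sample data

# Integer division of two Timedeltas gives a plain integer.
obs_per_window = window_size // sampling_step
obs_per_stride = stride // sampling_step

# Assumption: only windows that fit entirely inside the data are counted.
n_full_windows = (n_obs - obs_per_window) // obs_per_stride + 1

print(f"observations per window: {obs_per_window}")
print(f"observations per stride: {obs_per_stride}")
print(f"full windows (under the stated assumption): {n_full_windows}")
```

Since the stride is half the window size, consecutive windows overlap by 50%.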

The selected features are:

Features.LENGTH: The number of observations in each window.
Features.MEAN: The average value of the data in the window.
Features.VARIANCE: The variance of the data in the window.

[5]:

# Initialize the FeatureExtractor
feature_extractor = FeatureExtractor(
    features=[
        Features.LENGTH,
        Features.MEAN,
        Features.VARIANCE
    ],
    window_size="1h",
    stride="30min"
)

# Extract features
features = feature_extractor.extract_features(ts_data.data)

print("Extracted Features:")
display(features.head())

Extracted Features:

	length_value	mean_value	variance_value
0	12	0.295955	0.507737
1	12	-0.263877	0.944084
2	12	-0.591232	0.916108
3	12	-0.378230	0.629702
4	12	-0.193335	0.737572

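The result is a DataFrame in which each row represents one window and each column holds one computed feature, named after the feature and the source column (for example, `mean_value`). Because the data is sampled every 5 minutes, each 1-hour window contains 12 observations, which is why `length_value` is 12 in every row shown.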
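As a cross-check, the first few windows can be recomputed with plain pandas. This is a minimal sketch and not how interpreTS computes the features internally. It assumes half-open windows `[start, start + window_size)` anchored at the first timestamp and the population variance (`ddof=0`); under those assumptions the first rows should agree with the output shown above.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data so this snippet is self-contained.
np.random.seed(42)
data = pd.DataFrame(
    {"value": np.random.randn(1000)},
    index=pd.date_range(start="2023-01-01", periods=1000, freq="5min", name="timestamp"),
)

window_size = pd.Timedelta("1h")
stride = pd.Timedelta("30min")

rows = []
start = data.index[0]
# Assumption: half-open windows [start, start + window_size); first five only.
for _ in range(5):
    in_window = (data.index >= start) & (data.index < start + window_size)
    values = data.loc[in_window, "value"]
    rows.append({
        "length_value": len(values),
        "mean_value": values.mean(),
        "variance_value": values.var(ddof=0),  # population variance (assumed)
    })
    start += stride

check = pd.DataFrame(rows)
print(check.round(6))
```

Note that pandas defaults to the sample variance (`ddof=1`), so `ddof=0` has to be passed explicitly for the comparison.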
