Time-Based Aggregation with window_size and stride

This tutorial demonstrates the time-based aggregation feature of the interpreTS library. It lets you specify window_size and stride as time intervals (e.g., "5min", "1h", "7d") when extracting features from time-series data.

Install and Import the Library

[ ]:
import pandas as pd
import numpy as np
from interpreTS.core.feature_extractor import FeatureExtractor, Features
from interpreTS.utils.data_conversion import convert_to_time_series
from interpreTS.utils.data_validation import validate_time_series_data
[2]:
import interpreTS
print(f"interpreTS version: {interpreTS.__version__}")
interpreTS version: 0.5.0

Create Sample Data

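Let's create a simple dataset of 1,000 observations at 5-minute intervals, with values drawn from a standard normal distribution. The random seed is fixed so that the output is reproducible. This will serve as our input time-series data.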

[3]:
# Generate sample data
np.random.seed(42)
data = pd.DataFrame({
    "timestamp": pd.date_range(start="2023-01-01", periods=1000, freq="5min"),
    "value": np.random.randn(1000)
})
data.set_index("timestamp", inplace=True)

print("Sample data:")
display(data.head())
Sample data:
                        value
timestamp
2023-01-01 00:00:00  0.496714
2023-01-01 00:05:00 -0.138264
2023-01-01 00:10:00  0.647689
2023-01-01 00:15:00  1.523030
2023-01-01 00:20:00 -0.234153

Validate and Convert the Data

The library requires data to be in a specific time-series format. We use validate_time_series_data to validate the input and convert_to_time_series to prepare the data for feature extraction.

[4]:
# Check the input before conversion; stop here if it is rejected
try:
    validate_time_series_data(data)
except (TypeError, ValueError) as e:
    print(f"Validation error: {e}")
    raise

# Convert data to interpreTS time-series format
ts_data = convert_to_time_series(data)
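
To make the idea of a "specific time-series format" more concrete, here is a minimal sketch of the kind of checks one might run before handing data to a time-based extractor. The helper check_time_series is hypothetical and written with plain pandas; it is not interpreTS's validation logic, and the conditions it enforces (DatetimeIndex, sorted and unique timestamps, no missing values) are assumptions about what a time-based window typically needs.

```python
import numpy as np
import pandas as pd


def check_time_series(df: pd.DataFrame) -> None:
    """Hypothetical pre-flight checks; NOT interpreTS's own validation."""
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("index must be a pandas DatetimeIndex")
    if not df.index.is_monotonic_increasing:
        raise ValueError("timestamps must be sorted in ascending order")
    if df.index.has_duplicates:
        raise ValueError("timestamps must be unique")
    if df.isna().any().any():
        raise ValueError("data contains missing values")


# Same shape as the tutorial's sample data
np.random.seed(42)
data = pd.DataFrame(
    {"value": np.random.randn(1000)},
    index=pd.date_range(start="2023-01-01", periods=1000, freq="5min", name="timestamp"),
)

check_time_series(data)  # raises nothing for the sample data

# Distinct gaps between consecutive timestamps; one entry means regular sampling
steps = data.index.to_series().diff().dropna().unique()
print(steps)
```

A single distinct gap confirms that the data is regularly sampled, which is what makes the number of observations per time window predictable.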

Extract Features with Time-Based Aggregation

Now, we can extract features using a time-based window_size and stride. This example uses:

  • window_size = "1h" (1-hour windows)

  • stride = "30min" (advance the window by 30 minutes at each step, so consecutive windows overlap by half)
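
To see what these two settings mean for data sampled every 5 minutes, the arithmetic can be sketched with pandas Timedelta objects. The variable names are illustrative only, and the window count assumes that only windows lying fully inside the data are kept; how interpreTS treats a trailing partial window is not shown in this tutorial.

```python
import pandas as pd

sampling = pd.Timedelta("5min")   # spacing of the sample data
window = pd.Timedelta("1h")       # window_size
stride = pd.Timedelta("30min")    # stride

# Number of observations covered by one window and by one stride step
points_per_window = window // sampling
points_per_stride = stride // sampling

# Fraction of each window shared with the next one
overlap = 1 - stride / window

# Count of windows that fit entirely within 1000 observations (assumption:
# partial windows at the end are dropped)
n_observations = 1000
n_full_windows = (n_observations - points_per_window) // points_per_stride + 1

print(points_per_window, points_per_stride, overlap, n_full_windows)
```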

The selected features are:

  • Features.LENGTH: The number of observations in each window.

  • Features.MEAN: The average value of the data in the window.

  • Features.VARIANCE: The variance of the data in the window.

[5]:
# Initialize the FeatureExtractor
feature_extractor = FeatureExtractor(
    features=[
        Features.LENGTH,
        Features.MEAN,
        Features.VARIANCE
    ],
    window_size="1h",
    stride="30min"
)

# Extract features
features = feature_extractor.extract_features(ts_data.data)

print("Extracted Features:")
display(features.head())

Extracted Features:
   length_value  mean_value  variance_value
0            12    0.295955        0.507737
1            12   -0.263877        0.944084
2            12   -0.591232        0.916108
3            12   -0.378230        0.629702
4            12   -0.193335        0.737572

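The result is a DataFrame with one row per window and one column per feature, each named after the feature and the source column (here length_value, mean_value, and variance_value). Because the data is sampled every 5 minutes, every 1-hour window contains 12 observations, which is the value reported in length_value.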
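
As a cross-check, the same three statistics can be recomputed with plain pandas. This is a sketch under stated assumptions, not the library's implementation: sliding_window_stats is a hypothetical helper, windows are taken as half-open intervals [start, start + window_size), only full windows are kept, and the variance uses ddof=0 (population variance). The ddof choice is inferred from the numbers: with it, the first row agrees with the output shown above (mean 0.295955, variance 0.507737).

```python
import numpy as np
import pandas as pd

# Rebuild the tutorial's sample data
np.random.seed(42)
data = pd.DataFrame(
    {"value": np.random.randn(1000)},
    index=pd.date_range(start="2023-01-01", periods=1000, freq="5min", name="timestamp"),
)


def sliding_window_stats(df, column, window_size, stride):
    """Hypothetical helper (not part of interpreTS): per-window length, mean, variance."""
    window = pd.Timedelta(window_size)
    step = pd.Timedelta(stride)
    sampling = df.index[1] - df.index[0]   # assumes regular sampling
    data_end = df.index[-1] + sampling     # exclusive end of the covered time span
    rows = []
    start = df.index[0]
    while start + window <= data_end:      # keep full windows only (assumption)
        in_window = (df.index >= start) & (df.index < start + window)
        values = df.loc[in_window, column]
        rows.append({
            f"length_{column}": len(values),
            f"mean_{column}": values.mean(),
            f"variance_{column}": values.var(ddof=0),  # population variance (inferred)
        })
        start += step
    return pd.DataFrame(rows)


stats = sliding_window_stats(data, "value", "1h", "30min")
print(stats.head())
```

Comparing this table with the extractor's output is a quick way to confirm that window_size and stride are being interpreted as intended.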