Time-Based Aggregation with window_size and stride#
In this tutorial, we demonstrate how to use the time-based aggregation feature in the interpreTS library. This allows you to specify window_size and stride using time intervals (e.g., “5min”, “1h”, “7d”) to extract features from time-series data.
Install and Import the Library#
[ ]:
import pandas as pd
import numpy as np
from interpreTS.core.feature_extractor import FeatureExtractor, Features
from interpreTS.utils.data_conversion import convert_to_time_series
from interpreTS.utils.data_validation import validate_time_series_data
[2]:
import interpreTS
print(f"interpreTS version: {interpreTS.__version__}")
interpreTS version: 0.5.0
Create Sample Data#
Let’s create a simple dataset with timestamps at 5-minute intervals and random values. This will serve as our input time-series data.
[3]:
# Generate sample data
np.random.seed(42)
data = pd.DataFrame({
    "timestamp": pd.date_range(start="2023-01-01", periods=1000, freq="5min"),
    "value": np.random.randn(1000)
})
data.set_index("timestamp", inplace=True)
print("Sample data:")
display(data.head())
Sample data:
| timestamp | value |
|---|---|
| 2023-01-01 00:00:00 | 0.496714 |
| 2023-01-01 00:05:00 | -0.138264 |
| 2023-01-01 00:10:00 | 0.647689 |
| 2023-01-01 00:15:00 | 1.523030 |
| 2023-01-01 00:20:00 | -0.234153 |
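Before moving on, you can optionally confirm that the index is a DatetimeIndex with a regular 5-minute frequency. This quick check uses only pandas and is not required by interpreTS:
[ ]:
# Optional sanity check (plain pandas): confirm a regular 5-minute DatetimeIndex
print(isinstance(data.index, pd.DatetimeIndex))  # True
print(pd.infer_freq(data.index))                 # "5min" (older pandas versions report "5T")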
Validate and Convert the Data#
The library requires data to be in a specific time-series format. We use validate_time_series_data to validate the input and convert_to_time_series to prepare the data for feature extraction.
[4]:
# Validate the input; validation errors are reported rather than raised
try:
    validate_time_series_data(data)
except (TypeError, ValueError) as e:
    print(f"Validation error: {e}")

# Convert data to interpreTS time-series format
ts_data = convert_to_time_series(data)
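If your own data comes from a less controlled source, it is also worth checking, with plain pandas and independently of interpreTS, that the timestamps are sorted and free of duplicates before validation:
[ ]:
# Optional pre-checks (plain pandas, not part of interpreTS):
# time-based windows assume a sorted index without duplicate timestamps
print(data.index.is_monotonic_increasing)  # True for this sample data
print(data.index.has_duplicates)           # False for this sample data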
Extract Features with Time-Based Aggregation#
Now, we can extract features using a time-based window_size and stride. This example uses:
window_size="1h"(1-hour windows)stride="30min"(shift the window every 30 minutes).
The selected features are:
- Features.LENGTH: the number of observations in each window.
- Features.MEAN: the average value of the data in the window.
- Features.VARIANCE: the variance of the data in the window.
[5]:
# Initialize the FeatureExtractor
feature_extractor = FeatureExtractor(
    features=[
        Features.LENGTH,
        Features.MEAN,
        Features.VARIANCE
    ],
    window_size="1h",
    stride="30min"
)
# Extract features
features = feature_extractor.extract_features(ts_data.data)
print("Extracted Features:")
display(features.head())
Extracted Features:
|   | length_value | mean_value | variance_value |
|---|---|---|---|
| 0 | 12 | 0.295955 | 0.507737 |
| 1 | 12 | -0.263877 | 0.944084 |
| 2 | 12 | -0.591232 | 0.916108 |
| 3 | 12 | -0.378230 | 0.629702 |
| 4 | 12 | -0.193335 | 0.737572 |
After running the code above, you get a DataFrame of extracted features with one row per window and one column per computed feature. Because the data is sampled every 5 minutes, each 1-hour window contains 12 observations (hence length_value = 12), and the 30-minute stride means consecutive windows overlap by 6 samples.
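To build intuition about what each row represents, you can reproduce window 0 (the 12 samples from 00:00 to 00:55) with plain pandas. This cross-check is independent of interpreTS; note that in this run the reported variance matches the population variance (ddof=0) of the window:
[ ]:
# Reproduce window 0 with plain pandas: the first 12 samples cover 00:00-00:55
first_window = data["value"].iloc[:12]
print(len(first_window))          # 12, matches length_value
print(first_window.mean())        # ~0.295955, matches mean_value
print(first_window.var(ddof=0))   # ~0.507737, matches variance_value in this run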