Time-Based Aggregation with window_size and stride
In this tutorial, we demonstrate how to use the time-based aggregation feature in the interpreTS library. This allows you to specify window_size and stride as time intervals (e.g., “5min”, “1h”, “7d”) when extracting features from time-series data.
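To make the interval strings concrete, the short sketch below uses plain pandas (not interpreTS) to show how many samples each interval covers, assuming the 5-minute sampling used later in this tutorial. The names `sampling_interval` and `samples_per_interval` are introduced only for this illustration.

```python
import pandas as pd

# Assumed sampling interval, matching the 5-minute data used in this tutorial.
sampling_interval = pd.Timedelta("5min")

# Number of whole samples that fit into each interval string.
samples_per_interval = {
    label: pd.Timedelta(label) // sampling_interval
    for label in ["5min", "30min", "1h"]
}

for label, n_samples in samples_per_interval.items():
    print(f"{label}: {n_samples} sample(s)")
```

With 5-minute data, a "1h" window therefore spans 12 observations and a "30min" stride advances by 6.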
Install and Import the Library
[ ]:
import pandas as pd
import numpy as np
from interpreTS.core.feature_extractor import FeatureExtractor, Features
from interpreTS.utils.data_conversion import convert_to_time_series
from interpreTS.utils.data_validation import validate_time_series_data
[2]:
import interpreTS
print(f"interpreTS version: {interpreTS.__version__}")
interpreTS version: 0.5.0
Create Sample Data
Let’s create a simple dataset with timestamps at 5-minute intervals and random values. This will serve as our input time-series data.
[3]:
# Generate sample data
np.random.seed(42)
data = pd.DataFrame({
    "timestamp": pd.date_range(start="2023-01-01", periods=1000, freq="5min"),
    "value": np.random.randn(1000)
})
data.set_index("timestamp", inplace=True)
print("Sample data:")
display(data.head())
Sample data:
timestamp | value |
---|---|
2023-01-01 00:00:00 | 0.496714 |
2023-01-01 00:05:00 | -0.138264 |
2023-01-01 00:10:00 | 0.647689 |
2023-01-01 00:15:00 | 1.523030 |
2023-01-01 00:20:00 | -0.234153 |
Validate and Convert the Data
The library requires data to be in a specific time-series format. We use validate_time_series_data to validate the input and convert_to_time_series to prepare the data for feature extraction.
[4]:
# Check that the input is a valid time series before converting it
try:
    validate_time_series_data(data)
except (TypeError, ValueError) as e:
    print(f"Validation error: {e}")
# Convert data to interpreTS time-series format
ts_data = convert_to_time_series(data)
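The tutorial does not spell out what the validation step checks. As a rough illustration only, the sketch below shows the kind of pre-checks one might run with plain pandas before handing data to a time-series tool. `basic_time_index_checks` is a hypothetical helper written for this example; it is not part of interpreTS and does not describe what validate_time_series_data actually does.

```python
import numpy as np
import pandas as pd


def basic_time_index_checks(df: pd.DataFrame) -> list[str]:
    """Hypothetical helper: return a list of problems found (empty if none)."""
    problems = []
    if not isinstance(df.index, pd.DatetimeIndex):
        # The remaining checks only make sense for a datetime index.
        problems.append("index is not a DatetimeIndex")
        return problems
    if not df.index.is_monotonic_increasing:
        problems.append("timestamps are not sorted in increasing order")
    if df.index.has_duplicates:
        problems.append("index contains duplicate timestamps")
    if df.isna().any().any():
        problems.append("data contains missing values")
    return problems


# Small frame with the same layout as the tutorial data.
sample = pd.DataFrame(
    {"value": np.arange(4, dtype=float)},
    index=pd.date_range("2023-01-01", periods=4, freq="5min"),
)

print(basic_time_index_checks(sample))             # well-formed frame
print(basic_time_index_checks(sample.iloc[::-1]))  # reversed timestamps
```

Returning a list of messages rather than raising on the first problem makes it easier to see everything that needs fixing in one pass.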
Extract Features with Time-Based Aggregation
Now we can extract features using a time-based window_size and stride. This example uses:
window_size="1h" (1-hour windows)
stride="30min" (shift the window every 30 minutes)
The selected features are:
Features.LENGTH: The number of observations in each window.
Features.MEAN: The average value of the data in the window.
Features.VARIANCE: The variance of the data in the window.
[5]:
# Initialize the FeatureExtractor
feature_extractor = FeatureExtractor(
    features=[
        Features.LENGTH,
        Features.MEAN,
        Features.VARIANCE
    ],
    window_size="1h",
    stride="30min"
)
# Extract features
features = feature_extractor.extract_features(ts_data.data)
print("Extracted Features:")
display(features.head())
Extracted Features:
 | length_value | mean_value | variance_value |
---|---|---|---|
0 | 12 | 0.295955 | 0.507737 |
1 | 12 | -0.263877 | 0.944084 |
2 | 12 | -0.591232 | 0.916108 |
3 | 12 | -0.378230 | 0.629702 |
4 | 12 | -0.193335 | 0.737572 |
Running the code above produces a DataFrame with one row per window and one column per computed feature. Because the data is sampled every 5 minutes, each 1-hour window holds 12 observations, and the 30-minute stride means consecutive windows overlap by half.
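As a sanity check, the windowing can be reproduced with plain pandas and NumPy. The sketch below is not how interpreTS computes its features internally. It converts the "1h" window and "30min" stride into row counts and keeps only full windows, which is an assumption of this sketch. Population variance (ddof=0) is used because it matches the values displayed for the first window.

```python
import numpy as np
import pandas as pd

# Recreate the tutorial's sample data.
np.random.seed(42)
data = pd.DataFrame(
    {"value": np.random.randn(1000)},
    index=pd.date_range(start="2023-01-01", periods=1000, freq="5min", name="timestamp"),
)

# Translate the time-based settings into row counts for 5-minute data.
step = data.index[1] - data.index[0]
window_rows = pd.Timedelta("1h") // step      # rows per window
stride_rows = pd.Timedelta("30min") // step   # rows to advance per window

rows = []
# Only full windows are kept (an assumption of this sketch).
for start in range(0, len(data) - window_rows + 1, stride_rows):
    window = data["value"].iloc[start:start + window_rows].to_numpy()
    rows.append({
        "length_value": len(window),
        "mean_value": window.mean(),
        "variance_value": window.var(),  # population variance (ddof=0)
    })

manual = pd.DataFrame(rows)
print(manual.head().round(6))
```

If the first rows printed here agree with the extracted features, the time-based settings were interpreted as 12-row windows advanced 6 rows at a time.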