Basic Usage of interpreTS Library#

This notebook demonstrates how to use the interpreTS library for feature extraction from time series data.

Step 1: Import Libraries#

[ ]:

import pandas as pd
import numpy as np
from interpreTS.core.feature_extractor import FeatureExtractor, Features

[2]:

import interpreTS
print(f"interpreTS version: {interpreTS.__version__}")

interpreTS version: 0.5.0

Step 2: Prepare Sample Time Series Data#

[3]:

# Create a sample time series DataFrame
np.random.seed(42)  # For reproducibility
data = pd.DataFrame({
    "timestamp": pd.date_range(start="2023-01-01", periods=100, freq="D"),
    "value": np.random.randn(100),
    "id": np.repeat([1, 2], 50)  # Two different time series (IDs 1 and 2)
})

# Display the first few rows of the data
print("Sample data:")
print(data.head())

Sample data:
   timestamp     value  id
0 2023-01-01  0.496714   1
1 2023-01-02 -0.138264   1
2 2023-01-03  0.647689   1
3 2023-01-04  1.523030   1
4 2023-01-05 -0.234153   1

Step 3: Initialize FeatureExtractor#

The FeatureExtractor class is the central component of the library. You can specify features to extract, the time window size, and other parameters.

[4]:

# Initialize the FeatureExtractor
extractor = FeatureExtractor(
    features=[
        Features.MEAN,
        Features.VARIANCE,
        Features.HETEROGENEITY,
        Features.SPIKENESS
    ],  # Specify features to extract
    window_size=10,  # Rolling window size of 10 samples
    stride=5,  # Step size of 5 samples
    id_column="id",  # Group by 'id' column
    feature_column="value"  # Extract features from the 'value' column
)

Step 4: Extract Features#

Use the extract_features method to calculate features for the specified rolling windows.

[5]:

# Extract features
features = extractor.extract_features(data)

# Display the extracted features
print("Extracted features:")
print(features.head())

Extracted features:
   mean_value  variance_value  heterogeneity_value  spikeness_value
0    0.448061        0.470467             1.613638         0.412307
1   -0.213979        1.032079             5.004545        -0.131802
2   -0.790658        0.513464             0.955311         0.015516
3   -0.424293        0.698953             2.077002         1.013266
4   -0.221844        0.596187             3.668792         0.630831

Step 5: Visualize Extracted Features#

Visualize the extracted features to understand the time series’ behavior better.

[6]:

import matplotlib.pyplot as plt
%matplotlib inline

# Plot one of the extracted features over time
plt.figure(figsize=(10, 6))
plt.plot(features.index, features['mean_value'], label="Mean (value)")
plt.title("Extracted Mean Feature Over Time")
plt.xlabel("Index")
plt.ylabel("Mean Value")
plt.legend()
plt.grid()
plt.show()

../_images/notebooks_basic_usage_11_0.png

Step 6: Add a Custom Feature#

You can also add a custom feature to the library.

[7]:

# Define a custom feature function
def calculate_range(data):
    return data.max() - data.min()

# Register the custom feature
extractor.add_custom_feature(
    name="RANGE",
    function=calculate_range,
    metadata={
        "level": "easy",
        "description": "Range of values in the window (max - min)."
    }
)

# Extract features again, including the custom feature
features_with_custom = extractor.extract_features(data)

# Display the features with the custom feature
print("Extracted features with custom feature:")
print(features_with_custom.head())

Custom feature 'RANGE' added successfully.
Extracted features with custom feature:
   mean_value  variance_value  heterogeneity_value  spikeness_value
0    0.448061        0.470467             1.613638         0.412307
1   -0.213979        1.032079             5.004545        -0.131802
2   -0.790658        0.513464             0.955311         0.015516
3   -0.424293        0.698953             2.077002         1.013266
4   -0.221844        0.596187             3.668792         0.630831

Step 7: Use the Library with Time-Based Windows#

[ ]:

data['timestamp'] = pd.to_datetime(data['timestamp'])
data.set_index('timestamp', inplace=True)
data.sort_index(inplace=True)
data = data.asfreq('1D')
data.fillna(method='ffill', inplace=True)

# Initialize the FeatureExtractor with time-based windows
time_based_extractor = FeatureExtractor(
    features=[Features.MEAN, Features.VARIANCE],
    window_size="10d",
    stride="5d",
    id_column="id",
    sort_column="timestamp",
    feature_column="value"
)

time_based_features = time_based_extractor.extract_features(data)

print("Extracted features with time-based windows:")
print(time_based_features.head())

Extracted features with time-based windows:
   mean_value  variance_value
0    0.448061        0.470467
1   -0.213979        1.032079
2   -0.790658        0.513464
3   -0.424293        0.698953
4   -0.221844        0.596187

C:\Users\nisia\AppData\Local\Temp\ipykernel_28680\2252922758.py:5: FutureWarning: DataFrame.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.
  data.fillna(method='ffill', inplace=True)

Step 8: Use Advanced Features (e.g., HETEROGENEITY)#

Heterogeneity measures the coefficient of variation, providing insights into variability.

[9]:

# Initialize the FeatureExtractor for heterogeneity
heterogeneity_extractor = FeatureExtractor(
    features=[Features.HETEROGENEITY],
    window_size=20,
    stride=10,
    id_column="id",
    feature_column="value"
)

# Extract heterogeneity
heterogeneity_features = heterogeneity_extractor.extract_features(data)

# Display the extracted heterogeneity feature
print("Extracted heterogeneity features:")
print(heterogeneity_features.head())

Extracted heterogeneity features:
   heterogeneity_value
0             5.604416
1             1.615857
2             3.639584
3             3.564077
4             8.506366