Data classification with interpreTS

First, we ensure that the required libraries are installed.

[ ]:
%pip install interpreTS sktime scikit-learn

In this tutorial, we show how you can use interpreTS for time series classification.

[ ]:
import numpy as np
import interpreTS as it
from sktime.datasets import load_arrow_head
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
[3]:
# Prepare the data: each id on the first index level identifies one series
X, y = load_arrow_head(return_type="pd-multiindex")
instance_ids = np.unique(X.index.get_level_values(0))
train_ids, test_ids = train_test_split(instance_ids, test_size=0.2, random_state=42)

X_train = X.loc[train_ids]
X_test = X.loc[test_ids]

# Map each instance id to its positional index in the sorted id array,
# which matches the ordering of the label vector y
train_indices = [np.where(instance_ids == id_)[0][0] for id_ in train_ids]
test_indices = [np.where(instance_ids == id_)[0][0] for id_ in test_ids]

y_train = y[train_indices]
y_test = y[test_indices]

print("Train set size:", X_train.shape, y_train.shape)
print("Test set size:", X_test.shape, y_test.shape)
X.head()
Train set size: (42168, 1) (168,)
Test set size: (10793, 1) (43,)
[3]:
           dim_0
0 0    -1.963009
  1    -1.957825
  2    -1.956145
  3    -1.938289
  4    -1.896657
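
As an optional sanity check, you can confirm that all three ArrowHead classes appear in both splits before extracting features; a minimal sketch using NumPy:

[ ]:
# Optional: class counts in each split
for name, labels in [("train", y_train), ("test", y_test)]:
    classes, counts = np.unique(labels, return_counts=True)
    print(name, dict(zip(classes, counts)))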
[4]:
# Create a feature extractor; window_size=251 matches the length of each
# ArrowHead series, so every instance is summarized by a single window
t = it.FeatureExtractor(window_size=251, stride=251, features="for-ml")
X_train_ts = t.extract_features(X_train)
X_test_ts = t.extract_features(X_test)
X_test_ts.head()
[4]:
absolute_energy_level_0 absolute_energy_level_1 absolute_energy_dim_0 binarize_mean_level_0 binarize_mean_level_1 binarize_mean_dim_0 dominant_level_0 dominant_level_1 dominant_dim_0 entropy_level_0 ... trough_dim_0 variance_level_0 variance_level_1 variance_dim_0 seasonality_strength_level_0 seasonality_strength_level_1 seasonality_strength_dim_0 trend_strength_level_0 trend_strength_level_1 trend_strength_dim_0
0 225900 5239625 250.000000 0.0 0.501992 0.498008 30.0 225.0 -0.705068 0.0 ... -2.168225 0.0 5250.0 0.996016 0.0 0.976096 0.952867 0.0 1.0 1.495348e-04
1 7512179 5239625 250.000001 0.0 0.501992 0.553785 173.0 225.0 0.906215 0.0 ... -1.628334 0.0 5250.0 0.996016 0.0 0.976096 0.973132 0.0 1.0 1.773299e-05
2 4919600 5239625 249.999999 0.0 0.501992 0.517928 140.0 225.0 1.005348 0.0 ... -1.981786 0.0 5250.0 0.996016 0.0 0.976096 0.962001 0.0 1.0 3.883215e-07
3 1411875 5239625 250.000000 0.0 0.501992 0.521912 75.0 225.0 0.141633 0.0 ... -2.048952 0.0 5250.0 0.996016 0.0 0.976096 0.955270 0.0 1.0 1.608024e-04
4 903600 5239625 250.000000 0.0 0.501992 0.537849 60.0 225.0 -0.837348 0.0 ... -1.886216 0.0 5250.0 0.996016 0.0 0.976096 0.964042 0.0 1.0 1.305618e-04

5 rows × 57 columns
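
The extracted features form an ordinary pandas DataFrame with one row per window and one named column per feature, so you can inspect it with the usual pandas tools before training; a minimal sketch (assuming the X_train_ts frame from the cell above):

[ ]:
# Inspect the extracted feature matrix
print(X_train_ts.shape)                     # (windows, features)
print(list(X_train_ts.columns[:5]))         # a few feature names
print(int(X_train_ts.isna().sum().sum()))   # total missing values, if any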

[5]:
# Initialize the classifier
clf = RandomForestClassifier(random_state=42)

# Train the classifier
clf.fit(X_train_ts, y_train)

# Predict on the held-out test set
y_pred = clf.predict(X_test_ts)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
Accuracy: 1.0
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        43
   macro avg       1.00      1.00      1.00        43
weighted avg       1.00      1.00      1.00        43
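
Because every column of the feature matrix is a named, human-readable feature, you can also ask the trained forest which features mattered most; a short sketch using scikit-learn's feature_importances_ attribute:

[ ]:
import pandas as pd

# Rank the interpreTS features by their importance in the trained forest
importances = pd.Series(clf.feature_importances_, index=X_train_ts.columns)
print(importances.sort_values(ascending=False).head(10))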