Data classification with interpreTS

First, we ensure that the required libraries are installed.

[ ]:
%pip install interpreTS sktime scikit-learn

In this tutorial, we show how you can use interpreTS for time series classification.

[ ]:
import numpy as np
import interpreTS as it
from sktime.datasets import load_arrow_head
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
[3]:
# Prepare the data: each id on the first index level identifies one series
X, y = load_arrow_head(return_type="pd-multiindex")
instance_ids = np.unique(X.index.get_level_values(0))
train_ids, test_ids = train_test_split(instance_ids, test_size=0.2, random_state=42)

X_train = X.loc[train_ids]
X_test = X.loc[test_ids]

# Map each instance id to its positional index in the sorted id array,
# which matches the ordering of the label vector y
train_indices = [np.where(instance_ids == id_)[0][0] for id_ in train_ids]
test_indices = [np.where(instance_ids == id_)[0][0] for id_ in test_ids]

y_train = y[train_indices]
y_test = y[test_indices]

print("Train set size:", X_train.shape, y_train.shape)
print("Test set size:", X_test.shape, y_test.shape)
X.head()
Train set size: (42168, 1) (168,)
Test set size: (10793, 1) (43,)
[3]:
           dim_0
0 0    -1.963009
  1    -1.957825
  2    -1.956145
  3    -1.938289
  4    -1.896657
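
As an optional sanity check, you can confirm that all three ArrowHead classes appear in both splits before extracting features; a minimal sketch using NumPy:

[ ]:
# Optional: class counts in each split
for name, labels in [("train", y_train), ("test", y_test)]:
    classes, counts = np.unique(labels, return_counts=True)
    print(name, dict(zip(classes, counts)))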
[4]:
# Create a feature extractor; window_size=251 matches the length of each
# ArrowHead series, so every instance is summarized by a single window
t = it.FeatureExtractor(window_size=251, stride=251, features="for-ml")
X_train_ts = t.extract_features(X_train)
X_test_ts = t.extract_features(X_test)
X_test_ts.head()
[4]:
absolute_energy_level_0 absolute_energy_level_1 absolute_energy_dim_0 binarize_mean_level_0 binarize_mean_level_1 binarize_mean_dim_0 dominant_level_0 dominant_level_1 dominant_dim_0 entropy_level_0 ... trough_dim_0 variance_level_0 variance_level_1 variance_dim_0 seasonality_strength_level_0 seasonality_strength_level_1 seasonality_strength_dim_0 trend_strength_level_0 trend_strength_level_1 trend_strength_dim_0
0 225900 5239625 250.000000 0.0 0.501992 0.498008 30.0 225.0 -0.705068 0.0 ... -2.168225 0.0 5250.0 0.996016 0.0 0.976096 0.952867 0.0 1.0 1.495348e-04
1 7512179 5239625 250.000001 0.0 0.501992 0.553785 173.0 225.0 0.906215 0.0 ... -1.628334 0.0 5250.0 0.996016 0.0 0.976096 0.973132 0.0 1.0 1.773299e-05
2 4919600 5239625 249.999999 0.0 0.501992 0.517928 140.0 225.0 1.005348 0.0 ... -1.981786 0.0 5250.0 0.996016 0.0 0.976096 0.962001 0.0 1.0 3.883215e-07
3 1411875 5239625 250.000000 0.0 0.501992 0.521912 75.0 225.0 0.141633 0.0 ... -2.048952 0.0 5250.0 0.996016 0.0 0.976096 0.955270 0.0 1.0 1.608024e-04
4 903600 5239625 250.000000 0.0 0.501992 0.537849 60.0 225.0 -0.837348 0.0 ... -1.886216 0.0 5250.0 0.996016 0.0 0.976096 0.964042 0.0 1.0 1.305618e-04

5 rows × 57 columns
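
The extracted features form an ordinary pandas DataFrame with one row per window and one named column per feature, so you can inspect it with the usual pandas tools before training; a minimal sketch (assuming the X_train_ts frame from the cell above):

[ ]:
# Inspect the extracted feature matrix
print(X_train_ts.shape)                     # (windows, features)
print(list(X_train_ts.columns[:5]))         # a few feature names
print(int(X_train_ts.isna().sum().sum()))   # total missing values, if any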

[5]:
# Initialize the classifier
clf = RandomForestClassifier(random_state=42)

# Train the classifier
clf.fit(X_train_ts, y_train)

# Predict on the held-out test set
y_pred = clf.predict(X_test_ts)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
Accuracy: 1.0
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        43
   macro avg       1.00      1.00      1.00        43
weighted avg       1.00      1.00      1.00        43
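
Because every column of the feature matrix is a named, human-readable feature, you can also ask the trained forest which features mattered most; a short sketch using scikit-learn's feature_importances_ attribute:

[ ]:
import pandas as pd

# Rank the interpreTS features by their importance in the trained forest
importances = pd.Series(clf.feature_importances_, index=X_train_ts.columns)
print(importances.sort_values(ascending=False).head(10))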