Data classification with interpreTS
First, we make sure that the supporting libraries are installed: sktime provides the example dataset and scikit-learn the classifier.
[ ]:
%pip install sktime scikit-learn
In this tutorial, we show how you can use interpreTS for time series classification: interpretable features are extracted from the raw series and passed to a standard scikit-learn classifier.
[ ]:
import numpy as np
import interpreTS as it
from sktime.datasets import load_arrow_head
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
[3]:
# Prepare the data: ArrowHead is a univariate time series classification dataset
X, y = load_arrow_head(return_type="pd-multiindex")

# Split at the instance level so each whole series lands in exactly one split
instance_ids = np.unique(X.index.get_level_values(0))
train_ids, test_ids = train_test_split(instance_ids, test_size=0.2, random_state=42)
X_train = X.loc[train_ids]
X_test = X.loc[test_ids]

# y is positionally aligned with the sorted instance ids, so map ids back to positions
train_indices = [np.where(instance_ids == id_)[0][0] for id_ in train_ids]
test_indices = [np.where(instance_ids == id_)[0][0] for id_ in test_ids]
y_train = y[train_indices]
y_test = y[test_indices]

print("Train set size:", X_train.shape, y_train.shape)
print("Test set size:", X_test.shape, y_test.shape)
X.head()
Train set size: (42168, 1) (168,)
Test set size: (10793, 1) (43,)
[3]:
| | | dim_0 |
|---|---|---|
| 0 | 0 | -1.963009 |
| | 1 | -1.957825 |
| | 2 | -1.956145 |
| | 3 | -1.938289 |
| | 4 | -1.896657 |
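Each ArrowHead instance is a univariate series indexed by (instance, time point), as the MultiIndex above shows. As a quick sanity check (added for this write-up, not part of the original notebook), you can confirm that every series has 251 points, which the window size chosen below relies on:

[ ]:
# Sanity check: all ArrowHead series should have the same length, 251
series_lengths = X.groupby(level=0).size()
print(series_lengths.min(), series_lengths.max())  # expected: 251 251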
[4]:
# Create a feature extractor; with window_size and stride both equal to the
# series length (251), each instance is summarized by a single window of features
t = it.FeatureExtractor(window_size=251, stride=251, features="for-ml")
X_train_ts = t.extract_features(X_train)
X_test_ts = t.extract_features(X_test)
X_test_ts.head()
[4]:
| | absolute_energy_level_0 | absolute_energy_level_1 | absolute_energy_dim_0 | binarize_mean_level_0 | binarize_mean_level_1 | binarize_mean_dim_0 | dominant_level_0 | dominant_level_1 | dominant_dim_0 | entropy_level_0 | ... | trough_dim_0 | variance_level_0 | variance_level_1 | variance_dim_0 | seasonality_strength_level_0 | seasonality_strength_level_1 | seasonality_strength_dim_0 | trend_strength_level_0 | trend_strength_level_1 | trend_strength_dim_0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 225900 | 5239625 | 250.000000 | 0.0 | 0.501992 | 0.498008 | 30.0 | 225.0 | -0.705068 | 0.0 | ... | -2.168225 | 0.0 | 5250.0 | 0.996016 | 0.0 | 0.976096 | 0.952867 | 0.0 | 1.0 | 1.495348e-04 |
| 1 | 7512179 | 5239625 | 250.000001 | 0.0 | 0.501992 | 0.553785 | 173.0 | 225.0 | 0.906215 | 0.0 | ... | -1.628334 | 0.0 | 5250.0 | 0.996016 | 0.0 | 0.976096 | 0.973132 | 0.0 | 1.0 | 1.773299e-05 |
| 2 | 4919600 | 5239625 | 249.999999 | 0.0 | 0.501992 | 0.517928 | 140.0 | 225.0 | 1.005348 | 0.0 | ... | -1.981786 | 0.0 | 5250.0 | 0.996016 | 0.0 | 0.976096 | 0.962001 | 0.0 | 1.0 | 3.883215e-07 |
| 3 | 1411875 | 5239625 | 250.000000 | 0.0 | 0.501992 | 0.521912 | 75.0 | 225.0 | 0.141633 | 0.0 | ... | -2.048952 | 0.0 | 5250.0 | 0.996016 | 0.0 | 0.976096 | 0.955270 | 0.0 | 1.0 | 1.608024e-04 |
| 4 | 903600 | 5239625 | 250.000000 | 0.0 | 0.501992 | 0.537849 | 60.0 | 225.0 | -0.837348 | 0.0 | ... | -1.886216 | 0.0 | 5250.0 | 0.996016 | 0.0 | 0.976096 | 0.964042 | 0.0 | 1.0 | 1.305618e-04 |
5 rows × 57 columns
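Since the window covers the full series, each instance yields exactly one row of features, so the feature matrices line up one-to-one with the label arrays. A minimal sanity check (again an addition to the original tutorial):

[ ]:
# One feature row per series: row counts must match the label counts
assert X_train_ts.shape[0] == len(y_train) == 168
assert X_test_ts.shape[0] == len(y_test) == 43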
[5]:
# Initialize the classifier
clf = RandomForestClassifier(random_state=42)

# Train the classifier on the extracted features
clf.fit(X_train_ts, y_train)

# Evaluate on the held-out test set
y_pred = clf.predict(X_test_ts)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
Accuracy: 1.0
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        43
   macro avg       1.00      1.00      1.00        43
weighted avg       1.00      1.00      1.00        43
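Because interpreTS produces named, human-readable features, the trained model can be inspected directly. The following short sketch (an addition to the original tutorial) ranks the extracted features by the forest's impurity-based importance, using standard scikit-learn and pandas attributes:

[ ]:
import pandas as pd

# Rank the named interpreTS features by Gini importance from the fitted forest
importances = pd.Series(clf.feature_importances_, index=X_train_ts.columns)
print(importances.sort_values(ascending=False).head(10))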