{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Basic Usage of interpreTS Library\n", "\n", "This notebook demonstrates how to use the interpreTS library for feature extraction from time series data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Step 1: Import Libraries\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "from interpreTS.core.feature_extractor import FeatureExtractor, Features" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "interpreTS version: 0.5.0\n" ] } ], "source": [ "import interpreTS\n", "print(f\"interpreTS version: {interpreTS.__version__}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Step 2: Prepare Sample Time Series Data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sample data:\n", " timestamp value id\n", "0 2023-01-01 0.496714 1\n", "1 2023-01-02 -0.138264 1\n", "2 2023-01-03 0.647689 1\n", "3 2023-01-04 1.523030 1\n", "4 2023-01-05 -0.234153 1\n" ] } ], "source": [ "# Create a sample time series DataFrame\n", "np.random.seed(42) # For reproducibility\n", "data = pd.DataFrame({\n", " \"timestamp\": pd.date_range(start=\"2023-01-01\", periods=100, freq=\"D\"),\n", " \"value\": np.random.randn(100),\n", " \"id\": np.repeat([1, 2], 50) # Two different time series (IDs 1 and 2)\n", "})\n", "\n", "# Display the first few rows of the data\n", "print(\"Sample data:\")\n", "print(data.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Step 3: Initialize FeatureExtractor\n", "\n", "The FeatureExtractor class is the central component of the library. You can specify features to extract, the time window size, and other parameters." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Initialize the FeatureExtractor\n", "extractor = FeatureExtractor(\n", " features=[\n", " Features.MEAN,\n", " Features.VARIANCE,\n", " Features.HETEROGENEITY,\n", " Features.SPIKENESS\n", " ], # Specify features to extract\n", " window_size=10, # Rolling window size of 10 samples\n", " stride=5, # Step size of 5 samples\n", " id_column=\"id\", # Group by 'id' column\n", " feature_column=\"value\" # Extract features from the 'value' column\n", ")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Step 4: Extract Features\n", "\n", "Use the extract_features method to calculate features for the specified rolling windows." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Extracted features:\n", " mean_value variance_value heterogeneity_value spikeness_value\n", "0 0.448061 0.470467 1.613638 0.412307\n", "1 -0.213979 1.032079 5.004545 -0.131802\n", "2 -0.790658 0.513464 0.955311 0.015516\n", "3 -0.424293 0.698953 2.077002 1.013266\n", "4 -0.221844 0.596187 3.668792 0.630831\n" ] } ], "source": [ "# Extract features\n", "features = extractor.extract_features(data)\n", "\n", "# Display the extracted features\n", "print(\"Extracted features:\")\n", "print(features.head())\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Step 5: Visualize Extracted Features\n", "\n", "Visualize the extracted features to understand the time series' behavior better." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "# Plot one of the extracted features over time\n", "plt.figure(figsize=(10, 6))\n", "plt.plot(features.index, features['mean_value'], label=\"Mean (value)\")\n", "plt.title(\"Extracted Mean Feature Over Time\")\n", "plt.xlabel(\"Index\")\n", "plt.ylabel(\"Mean Value\")\n", "plt.legend()\n", "plt.grid()\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Step 6: Add a Custom Feature\n", "\n", "You can also add a custom feature to the library." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Custom feature 'RANGE' added successfully.\n", "Extracted features with custom feature:\n", " mean_value variance_value heterogeneity_value spikeness_value\n", "0 0.448061 0.470467 1.613638 0.412307\n", "1 -0.213979 1.032079 5.004545 -0.131802\n", "2 -0.790658 0.513464 0.955311 0.015516\n", "3 -0.424293 0.698953 2.077002 1.013266\n", "4 -0.221844 0.596187 3.668792 0.630831\n" ] } ], "source": [ "# Define a custom feature function\n", "def calculate_range(data):\n", " return data.max() - data.min()\n", "\n", "# Register the custom feature\n", "extractor.add_custom_feature(\n", " name=\"RANGE\",\n", " function=calculate_range,\n", " metadata={\n", " \"level\": \"easy\",\n", " \"description\": \"Range of values in the window (max - min).\"\n", " }\n", ")\n", "\n", "# Extract features again, including the custom feature\n", "features_with_custom = extractor.extract_features(data)\n", "\n", "# Display the features with the custom feature\n", "print(\"Extracted features with custom feature:\")\n", "print(features_with_custom.head())\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Step 7: Use the Library with Time-Based Windows" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { 
"name": "stdout", "output_type": "stream", "text": [ "Extracted features with time-based windows:\n", " mean_value variance_value\n", "0 0.448061 0.470467\n", "1 -0.213979 1.032079\n", "2 -0.790658 0.513464\n", "3 -0.424293 0.698953\n", "4 -0.221844 0.596187\n" ] } ], "source": [ "# Index the data by timestamp and enforce a regular daily frequency\n", "data['timestamp'] = pd.to_datetime(data['timestamp'])\n", "data.set_index('timestamp', inplace=True)\n", "data.sort_index(inplace=True)\n", "data = data.asfreq('1D')\n", "# Forward-fill any gaps introduced by asfreq (fillna(method='ffill') is deprecated)\n", "data = data.ffill()\n", "\n", "# Initialize the FeatureExtractor with time-based windows\n", "time_based_extractor = FeatureExtractor(\n", " features=[Features.MEAN, Features.VARIANCE],\n", " window_size=\"10d\", # Window length of 10 days\n", " stride=\"5d\", # Step of 5 days between windows\n", " id_column=\"id\",\n", " sort_column=\"timestamp\",\n", " feature_column=\"value\"\n", ")\n", "\n", "time_based_features = time_based_extractor.extract_features(data)\n", "\n", "print(\"Extracted features with time-based windows:\")\n", "print(time_based_features.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Step 8: Use Advanced Features (e.g., HETEROGENEITY)\n", "\n", "Heterogeneity is measured as the coefficient of variation (standard deviation relative to the mean) of each window, which indicates how variable the series is." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Extracted heterogeneity features:\n", " heterogeneity_value\n", "0 5.604416\n", "1 1.615857\n", "2 3.639584\n", "3 3.564077\n", "4 8.506366\n" ] } ], "source": [ "# Initialize the FeatureExtractor for heterogeneity\n", "heterogeneity_extractor = FeatureExtractor(\n", " features=[Features.HETEROGENEITY],\n", " window_size=20,\n", " stride=10,\n", " id_column=\"id\",\n", " feature_column=\"value\"\n", ")\n", "\n", "# Extract heterogeneity\n", "heterogeneity_features = heterogeneity_extractor.extract_features(data)\n", "\n", "# Display the extracted heterogeneity feature\n", "print(\"Extracted heterogeneity features:\")\n", "print(heterogeneity_features.head())\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.2" } }, "nbformat": 4, "nbformat_minor": 2 }