interpreTS package#

Subpackages#

Module contents#

interpreTS: a Python library for extracting meaningful, interpretable features from time series data, supporting the creation of interpretable and explainable predictive models.

Available imports:
FeatureExtractor (from interpreTS.core.feature_extractor):

A class responsible for extracting specified features from time series data.

Features (from interpreTS.utils.feature_loader):

A class of string constants that defines the available feature types for extraction.

FeatureLoader (from interpreTS.utils.feature_loader):

A utility class for loading and managing feature definitions.

validate_time_series_data (from interpreTS.utils.data_validation):

A function to ensure that input time series data meets the required format and standards for processing.

generate_feature_descriptions (from interpreTS.utils.data_manager):

A function that generates human-readable descriptions for extracted features, aiding interpretability.
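A minimal end-to-end sketch combining these imports (the column name "value" and the sample data are illustrative):

```python
import pandas as pd

from interpreTS import FeatureExtractor, Features

# Illustrative data: a single numeric column named "value".
data = pd.DataFrame({"value": [1.0, 2.0, 1.5, 3.0, 2.5, 4.0]})

extractor = FeatureExtractor(
    features=[Features.MEAN, Features.VARIANCE],
    feature_column="value",
    window_size=3,
)
features_df = extractor.extract_features(data)
print(features_df.head())
```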

Dependencies:
  • pandas: 2.2.3

  • numpy (no pinned version)

  • statsmodels: 0.14.4

  • langchain_community: 0.3.14

  • langchain: 0.3.14

  • openai: 1.59.4

  • streamlit: 1.41.1

  • scikit-learn (no pinned version)

  • joblib: 1.4.2

  • tqdm: 4.67.1

  • dask: 2024.12.1

  • scipy: 1.15.0

  • pillow: 11.1.0

Authors:
  • Sławomir Put

  • Martyna Żur

  • Weronika Wołowczyk

  • Jarosław Strzelczyk

  • Piotr Krupiński

  • Martyna Kramarz

  • Łukasz Wróbel

class interpreTS.FeatureExtractor(features=None, feature_params=None, window_size=nan, stride=1, id_column=None, sort_column=None, feature_column=None, group_by=None)[source]#

Bases: object

CAN_USE_NAN = ['missing_points', 'peak', 'spikeness', 'trough', 'seasonality_strength']#
DEFAULT_FEATURES_BIG = ['absolute_energy', 'binarize_mean', 'change_in_variance', 'crossing_points', 'distance_to_last_trend_change', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'mean_change', 'seasonality_strength', 'trend_strength', 'change_in_variance']#
DEFAULT_FEATURES_SMALL = ['length', 'mean', 'variance', 'stability', 'entropy', 'spikeness', 'seasonality_strength']#
FEATURES_ALL = ['above_9th_decile', 'below_1st_decile', 'absolute_energy', 'binarize_mean', 'change_in_variance', 'crossing_points', 'distance_to_last_trend_change', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'outliers_iqr', 'outliers_std', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'mean_change', 'seasonality_strength', 'trend_strength', 'variability_in_sub_periods', 'change_in_variance']#
FOR_ML = ['absolute_energy', 'binarize_mean', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'seasonality_strength', 'trend_strength']#
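A sketch of instantiating the extractor with one of the documented class-level presets (the column name and window settings below are illustrative):

```python
from interpreTS import FeatureExtractor

# DEFAULT_FEATURES_SMALL is one of the documented presets above.
extractor = FeatureExtractor(
    features=FeatureExtractor.DEFAULT_FEATURES_SMALL,
    feature_column="value",
    window_size=5,
    stride=1,
)
```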
add_custom_feature(name, function, metadata=None, params=None)[source]#

Add a custom feature to the FeatureExtractor with optional parameters.

Parameters:
  • name (str) – The name of the custom feature.

  • function (callable) – A function that computes the feature. It should accept a Pandas Series and optional parameters as input.

  • metadata (dict, optional) – A dictionary containing metadata about the feature:
    - level (str): Interpretability level (‘easy’, ‘moderate’, or ‘advanced’).
    - description (str): Description of the feature.
    Example: {‘level’: ‘easy’, ‘description’: ‘Description of the feature.’}

  • params (dict, optional) – A dictionary of parameters to be passed to the feature function when it is executed.

Raises:

ValueError – If the feature name already exists or the function is not callable.
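A sketch of registering a custom feature; the feature name, function body, and metadata values are illustrative:

```python
import pandas as pd

from interpreTS import FeatureExtractor

extractor = FeatureExtractor(feature_column="value", window_size=5)

def value_range(series: pd.Series) -> float:
    # Spread between the largest and smallest value in the window.
    return series.max() - series.min()

extractor.add_custom_feature(
    name="value_range",
    function=value_range,
    metadata={"level": "easy", "description": "Max minus min of the window."},
)
```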

extract_features(data, progress_callback=None, mode='sequential', n_jobs=-1)[source]#

Extract features from a time series dataset.

Parameters:
  • data (pd.DataFrame or pd.Series) – The time series data for which features are to be extracted.

  • progress_callback (function, optional) – A function to report progress, which takes a single argument: progress percentage (0-100).

  • mode (str, optional) – The mode of processing. Can be ‘parallel’ for multi-threaded processing or ‘sequential’ for single-threaded processing with real-time progress reporting.

  • n_jobs (int, optional) – The number of jobs (processes) to run in parallel. Default is -1 (use all available CPUs).

Returns:

A DataFrame containing calculated features for each window.

Return type:

pd.DataFrame
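A sketch of sequential extraction with a progress callback (the data and column name are illustrative):

```python
import pandas as pd

from interpreTS import FeatureExtractor, Features

data = pd.DataFrame({"value": [float(i) for i in range(100)]})
extractor = FeatureExtractor(
    features=[Features.MEAN], feature_column="value", window_size=10, stride=5
)

def report(percent):
    # Called with progress in the 0-100 range.
    print(f"{percent:.0f}% done")

features_df = extractor.extract_features(
    data, progress_callback=report, mode="sequential"
)
```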

extract_features_stream(data_stream, progress_callback=None)[source]#

Extract features from a stream of time series data.

Parameters:
  • data_stream (iterable) – An iterable that yields incoming data points as dictionaries with keys corresponding to column names.

  • progress_callback (function, optional) – A function to report progress, which takes a single argument: the total number of processed points.

Yields:

dict – A dictionary containing the calculated features for the current window.
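A sketch of streaming extraction from a generator of dict records (the column name and values are illustrative):

```python
from interpreTS import FeatureExtractor, Features

extractor = FeatureExtractor(
    features=[Features.MEAN], feature_column="value", window_size=10
)

def data_stream():
    # Each yielded dict maps column names to incoming values.
    for i in range(50):
        yield {"value": float(i)}

for window_features in extractor.extract_features_stream(data_stream()):
    print(window_features)  # dict of features for the current window
```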

group_data(data)[source]#

Group data based on the group_by column.

Parameters:

data (pd.DataFrame) – Input data.

Returns:

Grouped data.

Return type:

iterable
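A sketch assuming a group_by column named "id" (both column names are illustrative):

```python
import pandas as pd

from interpreTS import FeatureExtractor

df = pd.DataFrame({"id": ["a", "a", "b", "b"], "value": [1.0, 2.0, 3.0, 4.0]})
extractor = FeatureExtractor(feature_column="value", group_by="id", window_size=2)

groups = extractor.group_data(df)  # iterable of per-"id" groups
```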

group_features_by_interpretability()[source]#

Group features by their interpretability levels.

Returns:

A dictionary where keys are interpretability levels (‘easy’, ‘moderate’, ‘advanced’), and values are lists of feature names.

Return type:

dict
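For example:

```python
from interpreTS import FeatureExtractor

extractor = FeatureExtractor()
by_level = extractor.group_features_by_interpretability()
print(by_level.get("easy", []))  # feature names at the 'easy' level
```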

head(features_df, n=5)[source]#

Returns the first n rows of the resulting DataFrame from the extract_features function.

Parameters:
  • features_df (pd.DataFrame) – The resulting DataFrame from the extract_features function.

  • n (int, optional (default 5)) – The number of rows to return. If n is negative, returns all rows except the last |n| rows.

Returns:

The first n rows of the DataFrame.

Return type:

pd.DataFrame
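A short sketch previewing extracted features (the data is illustrative):

```python
import pandas as pd

from interpreTS import FeatureExtractor, Features

extractor = FeatureExtractor(
    features=[Features.MEAN], feature_column="value", window_size=3
)
features_df = extractor.extract_features(pd.DataFrame({"value": range(12)}))
print(extractor.head(features_df, n=3))  # first three windows
```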

validate_data_frequency(grouped_data)[source]#

Validate that data has a consistent and defined frequency if window_size or stride are time-based.

Parameters:

grouped_data (iterable) – The grouped time series data to validate.

Raises:

ValueError – If data frequency is not defined or inconsistent.

class interpreTS.FeatureLoader[source]#

Bases: object

static available_features()[source]#

Returns a list of all available features.

Returns:

List of feature names.

Return type:

list
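For example:

```python
from interpreTS import FeatureLoader

# Static method: no instance required.
print(FeatureLoader.available_features())
```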

generate_feature_options()[source]#

Generate a dictionary mapping human-readable feature names to their corresponding constants.

Returns:

A dictionary where keys are human-readable feature names (capitalized) and values are feature constants.

Return type:

dict
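A sketch of building the option mapping (the example mapping in the comment is illustrative):

```python
from interpreTS import FeatureLoader

loader = FeatureLoader()
options = loader.generate_feature_options()
# Keys are capitalized display names and values are feature constants,
# e.g. something like {"Mean": "mean", ...} (exact keys depend on the library).
```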

class interpreTS.Features[source]#

Bases: object

ABOVE_9TH_DECILE = 'above_9th_decile'#
ABSOLUTE_ENERGY = 'absolute_energy'#
BELOW_1ST_DECILE = 'below_1st_decile'#
BINARIZE_MEAN = 'binarize_mean'#
CHANGE_IN_VARIANCE = 'change_in_variance'#
CROSSING_POINTS = 'crossing_points'#
DISTANCE_TO_LAST_TREND_CHANGE = 'distance_to_last_trend_change'#
DOMINANT = 'dominant'#
ENTROPY = 'entropy'#
FLAT_SPOTS = 'flat_spots'#
HETEROGENEITY = 'heterogeneity'#
LENGTH = 'length'#
LINEARITY = 'linearity'#
MEAN = 'mean'#
MEAN_CHANGE = 'mean_change'#
MISSING_POINTS = 'missing_points'#
OUTLIERS_IQR = 'outliers_iqr'#
OUTLIERS_STD = 'outliers_std'#
PEAK = 'peak'#
SEASONALITY_STRENGTH = 'seasonality_strength'#
SIGNIFICANT_CHANGES = 'significant_changes'#
SPIKENESS = 'spikeness'#
STABILITY = 'stability'#
STD_1ST_DER = 'std_1st_der'#
TREND_STRENGTH = 'trend_strength'#
TROUGH = 'trough'#
VARIABILITY_IN_SUB_PERIODS = 'variability_in_sub_periods'#
VARIANCE = 'variance'#
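A sketch of selecting features via these constants instead of raw strings (the column name and window size are illustrative):

```python
from interpreTS import FeatureExtractor, Features

# Constants avoid typos in raw feature-name strings.
extractor = FeatureExtractor(
    features=[Features.MEAN, Features.TREND_STRENGTH, Features.ENTROPY],
    feature_column="value",
    window_size=10,
)
```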
interpreTS.generate_feature_descriptions(self, extracted_features)[source]#

Generate textual descriptions for extracted features.

Parameters:

extracted_features (dict) – A dictionary where keys are feature names and values are their calculated values.

Returns:

A dictionary where keys are feature names and values are textual descriptions.

Return type:

dict
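A sketch with illustrative feature values; note the documented signature takes self as its first argument even though the function is exposed at module level, so the call below passes None for it (this may need adjusting):

```python
from interpreTS import generate_feature_descriptions

extracted = {"mean": 2.5, "variance": 0.7}  # illustrative values
descriptions = generate_feature_descriptions(None, extracted)
print(descriptions)  # feature name -> human-readable description
```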

interpreTS.validate_time_series_data(data, feature_name=None, validation_requirements=None, **kwargs)[source]#

Validate the input time series data against dynamically provided requirements.

Parameters:
  • data (pd.Series, pd.DataFrame, or np.ndarray) – The time series data to be validated.

  • feature_name (str, optional) – The name of the feature to validate.

  • validation_requirements (dict, optional) – A dictionary specifying the validation requirements for each feature.

  • **kwargs (dict) – Additional validation parameters (overrides validation_requirements).

Returns:

True if the data is valid; raises an error otherwise.

Return type:

bool

Raises:
  • TypeError – If data is not a pd.Series, pd.DataFrame, or np.ndarray.

  • ValueError – If any validation requirement is not met.
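A minimal validation sketch (the series values are illustrative):

```python
import pandas as pd

from interpreTS import validate_time_series_data

series = pd.Series([1.0, 2.0, 3.0, 4.0])
# Returns True on success; raises TypeError/ValueError otherwise.
assert validate_time_series_data(series)
```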