interpreTS package#

Subpackages#

Module contents#

interpreTS: a Python library for extracting meaningful, interpretable features from time series data, supporting the creation of interpretable and explainable predictive models.

Available imports:
FeatureExtractor (from interpreTS.core.feature_extractor):

A class responsible for extracting specified features from time series data.

Features (from interpreTS.utils.feature_loader):

A class of string constants that defines the available feature types for extraction.

FeatureLoader (from interpreTS.utils.feature_loader):

A utility class for loading and managing feature definitions.

validate_time_series_data (from interpreTS.utils.data_validation):

A function to ensure that input time series data meets the required format and standards for processing.

generate_feature_descriptions (from interpreTS.utils.data_manager):

A function that generates human-readable descriptions for extracted features, aiding interpretability.
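A minimal end-to-end sketch combining these imports (the column name "value" and the sample data are illustrative):

```python
import pandas as pd

from interpreTS import FeatureExtractor, Features

# Illustrative data: a single numeric column named "value".
data = pd.DataFrame({"value": [1.0, 2.0, 1.5, 3.0, 2.5, 4.0]})

extractor = FeatureExtractor(
    features=[Features.MEAN, Features.VARIANCE],
    feature_column="value",
    window_size=3,
)
features_df = extractor.extract_features(data)
print(features_df.head())
```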

Dependencies:
  • pandas: 2.2.3

  • numpy (no pinned version)

  • statsmodels: 0.14.4

  • langchain_community: 0.3.14

  • langchain: 0.3.14

  • openai: 1.59.4

  • streamlit: 1.41.1

  • scikit-learn (no pinned version)

  • joblib: 1.4.2

  • tqdm: 4.67.1

  • dask: 2024.12.1

  • scipy: 1.15.0

  • pillow: 11.1.0

Authors:
  • Sławomir Put

  • Martyna Żur

  • Weronika Wołowczyk

  • Jarosław Strzelczyk

  • Piotr Krupiński

  • Martyna Kramarz

  • Łukasz Wróbel

class interpreTS.FeatureExtractor(features=None, feature_params=None, window_size=nan, stride=1, id_column=None, sort_column=None, feature_column=None, group_by=None)[source]#

Bases: object

CAN_USE_NAN = ['missing_points', 'peak', 'spikeness', 'trough', 'seasonality_strength']#
DEFAULT_FEATURES_BIG = ['absolute_energy', 'binarize_mean', 'change_in_variance', 'crossing_points', 'distance_to_last_trend_change', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'mean_change', 'seasonality_strength', 'trend_strength', 'change_in_variance']#
DEFAULT_FEATURES_SMALL = ['length', 'mean', 'variance', 'stability', 'entropy', 'spikeness', 'seasonality_strength']#
FEATURES_ALL = ['above_9th_decile', 'below_1st_decile', 'absolute_energy', 'binarize_mean', 'change_in_variance', 'crossing_points', 'distance_to_last_trend_change', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'outliers_iqr', 'outliers_std', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'mean_change', 'seasonality_strength', 'trend_strength', 'variability_in_sub_periods', 'change_in_variance']#
FOR_ML = ['absolute_energy', 'binarize_mean', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'seasonality_strength', 'trend_strength']#
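A sketch of instantiating the extractor with one of the documented class-level presets (the column name and window settings below are illustrative):

```python
from interpreTS import FeatureExtractor

# DEFAULT_FEATURES_SMALL is one of the documented presets above.
extractor = FeatureExtractor(
    features=FeatureExtractor.DEFAULT_FEATURES_SMALL,
    feature_column="value",
    window_size=5,
    stride=1,
)
```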
add_custom_feature(name, function, metadata=None, params=None)[source]#

Add a custom feature to the FeatureExtractor with optional parameters.

Parameters:
  • name (str) – The name of the custom feature.

  • function (callable) – A function that computes the feature. It should accept a Pandas Series and optional parameters as input.

  • metadata (dict, optional) – A dictionary containing metadata about the feature:
    - level (str): Interpretability level (‘easy’, ‘moderate’, or ‘advanced’).
    - description (str): Description of the feature.
    Example: {‘level’: ‘easy’, ‘description’: ‘Description of the feature.’}

  • params (dict, optional) – A dictionary of parameters to be passed to the feature function when it is executed.

Raises:

ValueError – If the feature name already exists or the function is not callable.
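A sketch of registering a custom feature; the feature name, function body, and metadata values are illustrative:

```python
import pandas as pd

from interpreTS import FeatureExtractor

extractor = FeatureExtractor(feature_column="value", window_size=5)

def value_range(series: pd.Series) -> float:
    # Spread between the largest and smallest value in the window.
    return series.max() - series.min()

extractor.add_custom_feature(
    name="value_range",
    function=value_range,
    metadata={"level": "easy", "description": "Max minus min of the window."},
)
```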

extract_features(data, progress_callback=None, mode='sequential', n_jobs=-1)[source]#

Extract features from a time series dataset.

Parameters:
  • data (pd.DataFrame or pd.Series) – The time series data for which features are to be extracted.

  • progress_callback (function, optional) – A function to report progress, which takes a single argument: progress percentage (0-100).

  • mode (str, optional) – The mode of processing. Can be ‘parallel’ for multi-threaded processing or ‘sequential’ for single-threaded processing with real-time progress reporting.

  • n_jobs (int, optional) – The number of jobs (processes) to run in parallel. Default is -1 (use all available CPUs).

Returns:

A DataFrame containing calculated features for each window.

Return type:

pd.DataFrame
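A sketch of sequential extraction with a progress callback (the data and column name are illustrative):

```python
import pandas as pd

from interpreTS import FeatureExtractor, Features

data = pd.DataFrame({"value": [float(i) for i in range(100)]})
extractor = FeatureExtractor(
    features=[Features.MEAN], feature_column="value", window_size=10, stride=5
)

def report(percent):
    # Called with progress in the 0-100 range.
    print(f"{percent:.0f}% done")

features_df = extractor.extract_features(
    data, progress_callback=report, mode="sequential"
)
```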

extract_features_stream(data_stream, progress_callback=None)[source]#

Extract features from a stream of time series data.

Parameters:
  • data_stream (iterable) – An iterable that yields incoming data points as dictionaries with keys corresponding to column names.

  • progress_callback (function, optional) – A function to report progress, which takes a single argument: the total number of processed points.

Yields:

dict – A dictionary containing the calculated features for the current window.
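A sketch of streaming extraction from a generator of dict records (the column name and values are illustrative):

```python
from interpreTS import FeatureExtractor, Features

extractor = FeatureExtractor(
    features=[Features.MEAN], feature_column="value", window_size=10
)

def data_stream():
    # Each yielded dict maps column names to incoming values.
    for i in range(50):
        yield {"value": float(i)}

for window_features in extractor.extract_features_stream(data_stream()):
    print(window_features)  # dict of features for the current window
```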

group_data(data)[source]#

Group data based on the group_by column.

Parameters:

data (pd.DataFrame) – Input data.

Returns:

Grouped data.

Return type:

iterable
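A sketch assuming a group_by column named "id" (both column names are illustrative):

```python
import pandas as pd

from interpreTS import FeatureExtractor

df = pd.DataFrame({"id": ["a", "a", "b", "b"], "value": [1.0, 2.0, 3.0, 4.0]})
extractor = FeatureExtractor(feature_column="value", group_by="id", window_size=2)

groups = extractor.group_data(df)  # iterable of per-"id" groups
```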

group_features_by_interpretability()[source]#

Group features by their interpretability levels.

Returns:

A dictionary where keys are interpretability levels (‘easy’, ‘moderate’, ‘advanced’), and values are lists of feature names.

Return type:

dict
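For example:

```python
from interpreTS import FeatureExtractor

extractor = FeatureExtractor()
by_level = extractor.group_features_by_interpretability()
print(by_level.get("easy", []))  # feature names at the 'easy' level
```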

head(features_df, n=5)[source]#

Returns the first n rows of the resulting DataFrame from the extract_features function.

Parameters:
  • features_df (pd.DataFrame) – The resulting DataFrame from the extract_features function.

  • n (int, optional (default 5)) – The number of rows to return. If n is negative, returns all rows except the last |n| rows.

Returns:

The first n rows of the DataFrame.

Return type:

pd.DataFrame
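A short sketch previewing extracted features (the data is illustrative):

```python
import pandas as pd

from interpreTS import FeatureExtractor, Features

extractor = FeatureExtractor(
    features=[Features.MEAN], feature_column="value", window_size=3
)
features_df = extractor.extract_features(pd.DataFrame({"value": range(12)}))
print(extractor.head(features_df, n=3))  # first three windows
```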

validate_data_frequency(grouped_data)[source]#

Validate that data has a consistent and defined frequency if window_size or stride are time-based.

Parameters:

grouped_data (iterable) – The grouped time series data to validate.

Raises:

ValueError – If data frequency is not defined or inconsistent.

class interpreTS.FeatureLoader[source]#

Bases: object

static available_features()[source]#

Returns a list of all available features.

Returns:

List of feature names.

Return type:

list
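For example:

```python
from interpreTS import FeatureLoader

# Static method: no instance required.
print(FeatureLoader.available_features())
```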

generate_feature_options()[source]#

Generate a dictionary mapping human-readable feature names to their corresponding constants.

Returns:

A dictionary where keys are human-readable feature names (capitalized) and values are feature constants.

Return type:

dict
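A sketch of building the option mapping (the example mapping in the comment is illustrative):

```python
from interpreTS import FeatureLoader

loader = FeatureLoader()
options = loader.generate_feature_options()
# Keys are capitalized display names and values are feature constants,
# e.g. something like {"Mean": "mean", ...} (exact keys depend on the library).
```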

class interpreTS.Features[source]#

Bases: object

ABOVE_9TH_DECILE = 'above_9th_decile'#
ABSOLUTE_ENERGY = 'absolute_energy'#
BELOW_1ST_DECILE = 'below_1st_decile'#
BINARIZE_MEAN = 'binarize_mean'#
CHANGE_IN_VARIANCE = 'change_in_variance'#
CROSSING_POINTS = 'crossing_points'#
DISTANCE_TO_LAST_TREND_CHANGE = 'distance_to_last_trend_change'#
DOMINANT = 'dominant'#
ENTROPY = 'entropy'#
FLAT_SPOTS = 'flat_spots'#
HETEROGENEITY = 'heterogeneity'#
LENGTH = 'length'#
LINEARITY = 'linearity'#
MEAN = 'mean'#
MEAN_CHANGE = 'mean_change'#
MISSING_POINTS = 'missing_points'#
OUTLIERS_IQR = 'outliers_iqr'#
OUTLIERS_STD = 'outliers_std'#
PEAK = 'peak'#
SEASONALITY_STRENGTH = 'seasonality_strength'#
SIGNIFICANT_CHANGES = 'significant_changes'#
SPIKENESS = 'spikeness'#
STABILITY = 'stability'#
STD_1ST_DER = 'std_1st_der'#
TREND_STRENGTH = 'trend_strength'#
TROUGH = 'trough'#
VARIABILITY_IN_SUB_PERIODS = 'variability_in_sub_periods'#
VARIANCE = 'variance'#
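A sketch of selecting features via these constants instead of raw strings (the column name and window size are illustrative):

```python
from interpreTS import FeatureExtractor, Features

# Constants avoid typos in raw feature-name strings.
extractor = FeatureExtractor(
    features=[Features.MEAN, Features.TREND_STRENGTH, Features.ENTROPY],
    feature_column="value",
    window_size=10,
)
```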
interpreTS.generate_feature_descriptions(self, extracted_features)[source]#

Generate textual descriptions for extracted features.

Parameters:

extracted_features (dict) – A dictionary where keys are feature names and values are their calculated values.

Returns:

A dictionary where keys are feature names and values are textual descriptions.

Return type:

dict
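A sketch with illustrative feature values; note the documented signature takes self as its first argument even though the function is exposed at module level, so the call below passes None for it (this may need adjusting):

```python
from interpreTS import generate_feature_descriptions

extracted = {"mean": 2.5, "variance": 0.7}  # illustrative values
descriptions = generate_feature_descriptions(None, extracted)
print(descriptions)  # feature name -> human-readable description
```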

interpreTS.validate_time_series_data(data, feature_name=None, validation_requirements=None, **kwargs)[source]#

Validate the input time series data against dynamically provided requirements.

Parameters:
  • data (pd.Series, pd.DataFrame, or np.ndarray) – The time series data to be validated.

  • feature_name (str, optional) – The name of the feature to validate.

  • validation_requirements (dict, optional) – A dictionary specifying the validation requirements for each feature.

  • **kwargs (dict) – Additional validation parameters (overrides validation_requirements).

Returns:

True if the data is valid; raises an error otherwise.

Return type:

bool

Raises:
  • TypeError – If data is not a pd.Series, pd.DataFrame, or np.ndarray.

  • ValueError – If any validation requirement is not met.
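A minimal validation sketch (the series values are illustrative):

```python
import pandas as pd

from interpreTS import validate_time_series_data

series = pd.Series([1.0, 2.0, 3.0, 4.0])
# Returns True on success; raises TypeError/ValueError otherwise.
assert validate_time_series_data(series)
```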