interpreTS package
Subpackages
- interpreTS.core package
  - Subpackages
    - interpreTS.core.features package
      - Submodules
        - interpreTS.core.features.feature_length module
        - interpreTS.core.features.feature_mean module
        - interpreTS.core.features.feature_peak module
        - interpreTS.core.features.feature_spikeness module
        - interpreTS.core.features.feature_std_1st_der module
        - interpreTS.core.features.feature_trough module
        - interpreTS.core.features.feature_variance module
        - interpreTS.core.features.histogram_dominant module
        - interpreTS.core.features.seasonality_strength module
        - interpreTS.core.features.trend_strength module
      - Module contents
  - Submodules
    - interpreTS.core.feature_extractor module
      - FeatureExtractor
        - FeatureExtractor.CAN_USE_NAN
        - FeatureExtractor.DEFAULT_FEATURES_BIG
        - FeatureExtractor.DEFAULT_FEATURES_SMALL
        - FeatureExtractor.FEATURES_ALL
        - FeatureExtractor.FOR_ML
        - FeatureExtractor.add_custom_feature()
        - FeatureExtractor.extract_features()
        - FeatureExtractor.extract_features_stream()
        - FeatureExtractor.group_data()
        - FeatureExtractor.group_features_by_interpretability()
        - FeatureExtractor.head()
        - FeatureExtractor.validate_data_frequency()
    - interpreTS.core.time_series_data module
  - Module contents
- interpreTS.utils package
Module contents
interpreTS: a Python library for extracting meaningful, interpretable features from time series data, supporting the creation of interpretable and explainable predictive models.
- Available imports:
- FeatureExtractor (from interpreTS.core.feature_extractor):
A class responsible for extracting specified features from time series data.
- Features (from interpreTS.utils.feature_loader):
A class of string constants naming the feature types available for extraction (for example, Features.MEAN is 'mean').
- FeatureLoader (from interpreTS.utils.feature_loader):
A utility class for loading and managing feature definitions.
- validate_time_series_data (from interpreTS.utils.data_validation):
A function to ensure that input time series data meets the required format and standards for processing.
- generate_feature_descriptions (from interpreTS.utils.data_manager):
A function that generates human-readable descriptions for extracted features, aiding interpretability.
- Dependencies:
pandas: 2.2.3
numpy: None
statsmodels: 0.14.4
langchain_community: 0.3.14
langchain: 0.3.14
openai: 1.59.4
streamlit: 1.41.1
scikit-learn: None
joblib: 1.4.2
tqdm: 4.67.1
dask: 2024.12.1
scipy: 1.15.0
pillow: 11.1.0
- Authors:
Sławomir Put
Martyna Żur
Weronika Wołowczyk
Jarosław Strzelczyk
Piotr Krupiński
Martyna Kramarz
Łukasz Wróbel
- class interpreTS.FeatureExtractor(features=None, feature_params=None, window_size=nan, stride=1, id_column=None, sort_column=None, feature_column=None, group_by=None)
Bases:
object
- CAN_USE_NAN = ['missing_points', 'peak', 'spikeness', 'trough', 'seasonality_strength']
- DEFAULT_FEATURES_BIG = ['absolute_energy', 'binarize_mean', 'change_in_variance', 'crossing_points', 'distance_to_last_trend_change', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'mean_change', 'seasonality_strength', 'trend_strength', 'change_in_variance']
- DEFAULT_FEATURES_SMALL = ['length', 'mean', 'variance', 'stability', 'entropy', 'spikeness', 'seasonality_strength']
- FEATURES_ALL = ['above_9th_decile', 'below_1st_decile', 'absolute_energy', 'binarize_mean', 'change_in_variance', 'crossing_points', 'distance_to_last_trend_change', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'outliers_iqr', 'outliers_std', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'mean_change', 'seasonality_strength', 'trend_strength', 'variability_in_sub_periods', 'change_in_variance']
- FOR_ML = ['absolute_energy', 'binarize_mean', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'seasonality_strength', 'trend_strength']
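The attribute lists above are plain Python lists of feature names, so their relationships can be checked with ordinary set operations. The sketch below copies the documented values verbatim instead of importing interpreTS, so it runs without the package installed; with the package, the same checks could be run on `FeatureExtractor.FEATURES_ALL` and the other class attributes. As documented, 'change_in_variance' appears twice in both DEFAULT_FEATURES_BIG and FEATURES_ALL, which the last step accounts for.

```python
# Values copied verbatim from the FeatureExtractor attributes documented above.
DEFAULT_FEATURES_BIG = ['absolute_energy', 'binarize_mean', 'change_in_variance', 'crossing_points',
                        'distance_to_last_trend_change', 'dominant', 'entropy', 'flat_spots',
                        'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'peak',
                        'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough',
                        'variance', 'mean_change', 'seasonality_strength', 'trend_strength',
                        'change_in_variance']
DEFAULT_FEATURES_SMALL = ['length', 'mean', 'variance', 'stability', 'entropy', 'spikeness',
                          'seasonality_strength']
FEATURES_ALL = ['above_9th_decile', 'below_1st_decile', 'absolute_energy', 'binarize_mean',
                'change_in_variance', 'crossing_points', 'distance_to_last_trend_change', 'dominant',
                'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean',
                'missing_points', 'outliers_iqr', 'outliers_std', 'peak', 'significant_changes',
                'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'mean_change',
                'seasonality_strength', 'trend_strength', 'variability_in_sub_periods',
                'change_in_variance']
FOR_ML = ['absolute_energy', 'binarize_mean', 'dominant', 'entropy', 'flat_spots', 'heterogeneity',
          'linearity', 'length', 'mean', 'missing_points', 'peak', 'significant_changes',
          'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'seasonality_strength',
          'trend_strength']

# Features available in FEATURES_ALL but not selected by DEFAULT_FEATURES_BIG.
extra_in_all = sorted(set(FEATURES_ALL) - set(DEFAULT_FEATURES_BIG))
# Features in DEFAULT_FEATURES_BIG that FOR_ML leaves out.
big_not_for_ml = sorted(set(DEFAULT_FEATURES_BIG) - set(FOR_ML))

# Both smaller selections are subsets of DEFAULT_FEATURES_BIG.
small_is_subset = set(DEFAULT_FEATURES_SMALL) <= set(DEFAULT_FEATURES_BIG)
ml_is_subset = set(FOR_ML) <= set(DEFAULT_FEATURES_BIG)

# dict.fromkeys drops the repeated 'change_in_variance' while keeping the original order.
unique_all = list(dict.fromkeys(FEATURES_ALL))

print(extra_in_all)
# ['above_9th_decile', 'below_1st_decile', 'outliers_iqr', 'outliers_std', 'variability_in_sub_periods']
print(big_not_for_ml)
# ['change_in_variance', 'crossing_points', 'distance_to_last_trend_change', 'mean_change']
print(small_is_subset, ml_is_subset, len(FEATURES_ALL), len(unique_all))  # True True 29 28
```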
- add_custom_feature(name, function, metadata=None, params=None)
Add a custom feature to the FeatureExtractor, with optional parameters.
- Parameters:
name (str) – The name of the custom feature.
function (callable) – A function that computes the feature. It should accept a pandas Series and optional parameters as input.
metadata (dict, optional) – A dictionary containing metadata about the feature, such as its interpretability level and description:
- level (str): Interpretability level (‘easy’, ‘moderate’, ‘advanced’).
- description (str): Description of the feature.
Example: {'level': 'easy' | 'moderate' | 'advanced', 'description': 'Description of the feature.'}
params (dict, optional) – A dictionary of parameters to be passed to the feature function when it is executed.
- Raises:
ValueError – If the feature name already exists or the function is not callable.
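A minimal sketch of a custom feature function together with the two optional dictionaries described above. The feature name `range_width`, its `scale` parameter, and the metadata text are hypothetical. It is also an assumption that `params` entries reach the function as keyword arguments, so the direct call at the end only imitates what the extractor might do. The registration call is left as a comment so the block runs without interpreTS installed.

```python
import pandas as pd


def range_width(series: pd.Series, scale: float = 1.0) -> float:
    """Hypothetical custom feature: scaled distance between max and min (NaN is skipped)."""
    return float(series.max() - series.min()) * scale


# Metadata in the documented shape: an interpretability level and a description.
metadata = {
    "level": "easy",
    "description": "Difference between the largest and smallest value in the window.",
}

# Parameters for the feature function (assumed to be passed as keyword arguments).
params = {"scale": 0.5}

# With interpreTS installed, registration would use the documented signature:
# extractor = FeatureExtractor()
# extractor.add_custom_feature("range_width", range_width, metadata=metadata, params=params)

# Imitate one evaluation of the feature on a single window.
window = pd.Series([2.0, 5.0, 3.0, 9.0])
value = range_width(window, **params)
print(value)  # 3.5
```

Passing a non-callable, or reusing an existing feature name, is documented to raise ValueError.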
- extract_features(data, progress_callback=None, mode='sequential', n_jobs=-1)
Extract features from a time series dataset.
- Parameters:
data (pd.DataFrame or pd.Series) – The time series data for which features are to be extracted.
progress_callback (function, optional) – A function to report progress, which takes a single argument: progress percentage (0-100).
mode (str, optional) – The processing mode: ‘parallel’ runs the extraction in parallel across n_jobs workers, while ‘sequential’ (the default) uses single-threaded processing with real-time progress reporting.
n_jobs (int, optional) – The number of jobs (processes) to run in parallel. Default is -1 (use all available CPUs).
- Returns:
A DataFrame containing calculated features for each window.
- Return type:
pd.DataFrame
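The result has one row per window, but the exact windowing rule is not stated here. The sketch below assumes windows of `window_size` consecutive points whose start positions advance by `stride`, keeping only full windows, and reports progress as a percentage from 0 to 100, as `progress_callback` is described. The helper `sliding_window_means` is hypothetical and computes only a mean per window, standing in for the full feature set.

```python
from typing import Callable, List, Optional, Sequence


def sliding_window_means(
    values: Sequence[float],
    window_size: int,
    stride: int = 1,
    progress_callback: Optional[Callable[[float], None]] = None,
) -> List[float]:
    """Hypothetical stand-in for extract_features: one 'mean' value per full window."""
    # Assumed rule: window starts at 0, stride, 2*stride, ... while a full window fits.
    starts = list(range(0, len(values) - window_size + 1, stride))
    means = []
    for i, start in enumerate(starts, start=1):
        window = values[start:start + window_size]
        means.append(sum(window) / window_size)
        if progress_callback is not None:
            # Report progress as a percentage (0-100), matching the documented callback.
            progress_callback(100 * i / len(starts))
    return means


reported = []
result = sliding_window_means([1, 2, 3, 4, 5, 6], window_size=4, stride=2,
                              progress_callback=reported.append)
print(result)    # [2.5, 4.5]
print(reported)  # [50.0, 100.0]
```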
- extract_features_stream(data_stream, progress_callback=None)
Extract features from a stream of time series data.
- Parameters:
data_stream (iterable) – An iterable that yields incoming data points as dictionaries with keys corresponding to column names.
progress_callback (function, optional) – A function to report progress, which takes a single argument: the total number of processed points.
- Yields:
dict – A dictionary containing the calculated features for the current window.
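A sketch of the streaming contract: each incoming item is a dictionary keyed by column name, a feature dictionary is yielded per window, and `progress_callback` receives the running count of processed points. The buffering policy (keep the last `window_size` points and emit whenever the current window's start offset is a multiple of `stride`), the column name `value`, and the helper `stream_window_features` are assumptions for illustration, not the library's implementation.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Iterator, Optional


def stream_window_features(
    data_stream: Iterable[Dict[str, float]],
    column: str,
    window_size: int,
    stride: int = 1,
    progress_callback: Optional[Callable[[int], None]] = None,
) -> Iterator[Dict[str, float]]:
    """Hypothetical streaming extractor yielding 'mean' and 'length' per window."""
    buffer = deque(maxlen=window_size)  # keeps only the most recent window_size points
    for processed, point in enumerate(data_stream, start=1):
        buffer.append(point[column])
        if progress_callback is not None:
            progress_callback(processed)  # total number of processed points so far
        # The window ending here starts at offset processed - window_size;
        # emit only full windows whose start offset is a multiple of stride.
        if len(buffer) == window_size and (processed - window_size) % stride == 0:
            yield {"mean": sum(buffer) / window_size, "length": window_size}


stream = ({"value": v} for v in [1, 2, 3, 4, 5, 6])
counts = []
windows = list(stream_window_features(stream, "value", window_size=4, stride=2,
                                      progress_callback=counts.append))
print(windows)     # [{'mean': 2.5, 'length': 4}, {'mean': 4.5, 'length': 4}]
print(counts[-1])  # 6
```

Using a bounded `deque` keeps memory constant regardless of how long the stream runs.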
- group_data(data)
Group data based on the group_by column.
- Parameters:
data (pd.DataFrame) – Input data.
- Returns:
Grouped data.
- Return type:
iterable
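`group_data` is documented only as grouping by the `group_by` column and returning an iterable. One plausible reading, which is an assumption here, is that it behaves like pandas `DataFrame.groupby`, whose result iterates as `(key, sub_frame)` pairs. The column names `sensor` and `value` are hypothetical.

```python
import pandas as pd

data = pd.DataFrame({
    "sensor": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 10.0, 20.0, 30.0],
})

# Iterating a GroupBy object yields (group key, sub-DataFrame) pairs,
# so each group can be processed as an independent time series.
group_means = {}
for key, group in data.groupby("sensor"):
    group_means[key] = float(group["value"].mean())

print(group_means)  # {'a': 2.0, 'b': 20.0}
```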
- group_features_by_interpretability()
Group features by their interpretability levels.
- Returns:
A dictionary where keys are interpretability levels (‘easy’, ‘moderate’, ‘advanced’), and values are lists of feature names.
- Return type:
dict
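The returned structure inverts a per-feature level assignment into a level-to-feature-list mapping. The sketch below assumes the levels come from per-feature metadata shaped like the `metadata` dictionary accepted by `add_custom_feature`. The feature names, their levels, and the helper `group_by_level` are hypothetical placeholders, not the library's actual assignments.

```python
from typing import Dict, List

LEVELS = ("easy", "moderate", "advanced")

# Hypothetical metadata keyed by feature name (levels are illustrative only).
feature_metadata = {
    "range_width": {"level": "easy", "description": "Max minus min."},
    "slope_change": {"level": "moderate", "description": "Change in local slope."},
    "spectral_ratio": {"level": "advanced", "description": "Ratio of spectral bands."},
    "window_count": {"level": "easy", "description": "Number of windows."},
}


def group_by_level(metadata: Dict[str, dict]) -> Dict[str, List[str]]:
    """Invert {feature: {'level': ...}} into {level: [feature, ...]}."""
    grouped: Dict[str, List[str]] = {level: [] for level in LEVELS}
    for name, meta in metadata.items():
        grouped[meta["level"]].append(name)
    return grouped


levels = group_by_level(feature_metadata)
print(levels)
# {'easy': ['range_width', 'window_count'], 'moderate': ['slope_change'], 'advanced': ['spectral_ratio']}
```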
- head(features_df, n=5)
Return the first n rows of a DataFrame produced by extract_features.
- Parameters:
features_df (pd.DataFrame) – The DataFrame returned by extract_features.
n (int, optional (default 5)) – The number of rows to return. If n is negative, returns all rows except the last |n| rows.
- Returns:
The first n rows of the DataFrame.
- Return type:
pd.DataFrame
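The documented negative-`n` behaviour matches pandas `DataFrame.head`, so the method plausibly delegates to it (an assumption). The sketch shows that behaviour on a small stand-in for an `extract_features` result; the column names are hypothetical.

```python
import pandas as pd

# Stand-in for an extract_features result: one row per window.
features_df = pd.DataFrame({
    "mean": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "variance": [0.1] * 7,
})

first_three = features_df.head(3)    # rows 0, 1, 2
all_but_two = features_df.head(-2)   # every row except the last two

print(len(first_three), len(all_but_two))  # 3 5
```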
- class interpreTS.FeatureLoader
Bases:
object
- class interpreTS.Features
Bases:
object
- ABOVE_9TH_DECILE = 'above_9th_decile'
- ABSOLUTE_ENERGY = 'absolute_energy'
- BELOW_1ST_DECILE = 'below_1st_decile'
- BINARIZE_MEAN = 'binarize_mean'
- CHANGE_IN_VARIANCE = 'change_in_variance'
- CROSSING_POINTS = 'crossing_points'
- DISTANCE_TO_LAST_TREND_CHANGE = 'distance_to_last_trend_change'
- DOMINANT = 'dominant'
- ENTROPY = 'entropy'
- FLAT_SPOTS = 'flat_spots'
- HETEROGENEITY = 'heterogeneity'
- LENGTH = 'length'
- LINEARITY = 'linearity'
- MEAN = 'mean'
- MEAN_CHANGE = 'mean_change'
- MISSING_POINTS = 'missing_points'
- OUTLIERS_IQR = 'outliers_iqr'
- OUTLIERS_STD = 'outliers_std'
- PEAK = 'peak'
- SEASONALITY_STRENGTH = 'seasonality_strength'
- SIGNIFICANT_CHANGES = 'significant_changes'
- SPIKENESS = 'spikeness'
- STABILITY = 'stability'
- STD_1ST_DER = 'std_1st_der'
- TREND_STRENGTH = 'trend_strength'
- TROUGH = 'trough'
- VARIABILITY_IN_SUB_PERIODS = 'variability_in_sub_periods'
- VARIANCE = 'variance'
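Each `Features` attribute is a plain string constant, so the class can be used to build feature selections without typing names by hand, and its attributes can be enumerated to list every available feature. To keep the block runnable without interpreTS, a stand-in class reproduces a few of the documented constants; with the package installed, `from interpreTS import Features` would replace it.

```python
class Features:
    """Stand-in reproducing a subset of the documented interpreTS.Features constants."""
    LENGTH = 'length'
    MEAN = 'mean'
    VARIANCE = 'variance'
    SPIKENESS = 'spikeness'


# Build a feature selection from constants instead of string literals.
selected = [Features.MEAN, Features.VARIANCE]

# Enumerate all constants: upper-case class attributes whose values are strings.
available = sorted(
    value for name, value in vars(Features).items()
    if name.isupper() and isinstance(value, str)
)

print(selected)   # ['mean', 'variance']
print(available)  # ['length', 'mean', 'spikeness', 'variance']
```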
- interpreTS.generate_feature_descriptions(self, extracted_features)
Generate textual descriptions for extracted features.
- Parameters:
extracted_features (dict) – A dictionary where keys are feature names and values are their calculated values.
- Returns:
A dictionary where keys are feature names and values are textual descriptions.
- Return type:
dict
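The function maps a `{feature name: value}` dictionary to a `{feature name: text}` dictionary. The wording it produces is not documented here, so the templates and the helper `describe_features` below are hypothetical; they only illustrate the dict-in, dict-out contract, with a generic fallback for features that have no template.

```python
from typing import Dict

# Hypothetical description templates; the library's actual wording may differ.
TEMPLATES = {
    "mean": "The average value of the series is {value:.2f}.",
    "length": "The series contains {value:.0f} observations.",
}


def describe_features(extracted_features: Dict[str, float]) -> Dict[str, str]:
    """Return a textual description for every feature in the input dictionary."""
    descriptions = {}
    for name, value in extracted_features.items():
        template = TEMPLATES.get(name, "The value of '{name}' is {value}.")
        # str.format ignores keyword arguments a template does not use.
        descriptions[name] = template.format(name=name, value=value)
    return descriptions


descriptions = describe_features({"mean": 3.14159, "length": 120, "entropy": 0.8})
for name, text in descriptions.items():
    print(f"{name}: {text}")
```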
- interpreTS.validate_time_series_data(data, feature_name=None, validation_requirements=None, **kwargs)
Validate the input time series data against dynamically provided requirements.
- Parameters:
data (pd.Series, pd.DataFrame, or np.ndarray) – The time series data to be validated.
feature_name (str, optional) – The name of the feature to validate.
validation_requirements (dict, optional) – A dictionary specifying the validation requirements for each feature.
**kwargs (dict) – Additional validation parameters (overrides validation_requirements).
- Returns:
True if the data is valid; raises an error otherwise.
- Return type:
bool
- Raises:
TypeError – If data is not a pd.Series, pd.DataFrame, or np.ndarray.
ValueError – If any validation requirement is not met.
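A sketch of how per-feature requirements and keyword overrides could combine, following the documented precedence (keyword arguments override `validation_requirements`) and the documented outcomes (True, TypeError, or ValueError). The requirement keys `min_length` and `allow_nan` and the helper `validate_series` are hypothetical; the library's actual requirement names are not listed here.

```python
import numpy as np
import pandas as pd


def validate_series(data, feature_name=None, validation_requirements=None, **kwargs):
    """Hypothetical validator mirroring the documented contract: True or an exception."""
    if not isinstance(data, (pd.Series, pd.DataFrame, np.ndarray)):
        raise TypeError("data must be a pd.Series, pd.DataFrame, or np.ndarray")
    # Start from the feature's requirements, then let keyword arguments override them.
    requirements = dict((validation_requirements or {}).get(feature_name, {}))
    requirements.update(kwargs)

    values = np.asarray(data, dtype=float)
    if values.size < requirements.get("min_length", 1):
        raise ValueError(f"{feature_name}: too few data points")
    if not requirements.get("allow_nan", False) and np.isnan(values).any():
        raise ValueError(f"{feature_name}: data contains NaN")
    return True


reqs = {"mean": {"min_length": 3, "allow_nan": False}}
series = pd.Series([1.0, np.nan, 3.0])

ok = validate_series(series, "mean", reqs, allow_nan=True)  # override permits NaN
print(ok)  # True

try:
    validate_series(series, "mean", reqs)
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)  # mean: data contains NaN
```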