interpreTS.core package#

Subpackages#

Submodules#

interpreTS.core.feature_extractor module#

class interpreTS.core.feature_extractor.FeatureExtractor(features=None, feature_params=None, window_size=nan, stride=1, id_column=None, sort_column=None, feature_column=None, group_by=None)[source]#

Bases: object

CAN_USE_NAN = ['missing_points', 'peak', 'spikeness', 'trough', 'seasonality_strength']#
DEFAULT_FEATURES_BIG = ['absolute_energy', 'binarize_mean', 'change_in_variance', 'crossing_points', 'distance_to_last_trend_change', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'mean_change', 'seasonality_strength', 'trend_strength', 'change_in_variance']#
DEFAULT_FEATURES_SMALL = ['length', 'mean', 'variance', 'stability', 'entropy', 'spikeness', 'seasonality_strength']#
FEATURES_ALL = ['above_9th_decile', 'below_1st_decile', 'absolute_energy', 'binarize_mean', 'change_in_variance', 'crossing_points', 'distance_to_last_trend_change', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'outliers_iqr', 'outliers_std', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'mean_change', 'seasonality_strength', 'trend_strength', 'variability_in_sub_periods', 'change_in_variance']#
FOR_ML = ['absolute_energy', 'binarize_mean', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'seasonality_strength', 'trend_strength']#
add_custom_feature(name, function, metadata=None, params=None)[source]#

Add a custom feature to the FeatureExtractor with optional parameters.

Parameters:
  • name (str) – The name of the custom feature.

  • function (callable) – A function that computes the feature. It should accept a Pandas Series and optional parameters as input.

  • metadata (dict, optional) – A dictionary containing metadata about the feature:

    - level (str): Interpretability level (‘easy’, ‘moderate’, ‘advanced’).

    - description (str): Description of the feature.

    Example: {‘level’: ‘easy’, ‘description’: ‘Description of the feature.’}

  • params (dict, optional) – A dictionary of parameters to be passed to the feature function when it is executed.

Raises:

ValueError – If the feature name already exists or the function is not callable.
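The duplicate-name and callability checks described above can be sketched in plain Python. This is a hypothetical re-implementation for illustration, not the library’s own code; the `registry` dict and helper name are assumptions.

```python
# Hypothetical sketch of the validation described above, not the
# library's actual implementation: reject duplicate names and
# non-callable feature functions.
def add_custom_feature(registry, name, function, metadata=None, params=None):
    if name in registry or not callable(function):
        raise ValueError(
            "Feature name already exists or function is not callable."
        )
    registry[name] = {
        "function": function,
        "metadata": metadata or {},
        "params": params or {},
    }

features = {}
add_custom_feature(
    features,
    "range",
    lambda s: max(s) - min(s),
    metadata={"level": "easy", "description": "Max minus min."},
)
```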

extract_features(data, progress_callback=None, mode='sequential', n_jobs=-1)[source]#

Extract features from a time series dataset.

Parameters:
  • data (pd.DataFrame or pd.Series) – The time series data for which features are to be extracted.

  • progress_callback (function, optional) – A function to report progress, which takes a single argument: progress percentage (0-100).

  • mode (str, optional) – The mode of processing. Can be ‘parallel’ for multi-threaded processing or ‘sequential’ for single-threaded processing with real-time progress reporting.

  • n_jobs (int, optional) – The number of jobs (processes) to run in parallel. Default is -1 (use all available CPUs).

Returns:

A DataFrame containing calculated features for each window.

Return type:

pd.DataFrame
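A minimal sketch of what the window_size and stride parameters mean here, using only the standard library rather than the library itself: each window of window_size consecutive points, advanced by stride, produces one row of features.

```python
# Illustrative sliding-window extraction (not the library's code):
# one feature row per window of `window_size` points, stepping by `stride`.
from statistics import mean, pvariance

def extract_windows(values, window_size, stride=1):
    rows = []
    for start in range(0, len(values) - window_size + 1, stride):
        window = values[start:start + window_size]
        rows.append({"mean": mean(window), "variance": pvariance(window)})
    return rows

rows = extract_windows([1, 2, 3, 4, 5], window_size=3, stride=2)
# two windows: [1, 2, 3] and [3, 4, 5]
```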

extract_features_stream(data_stream, progress_callback=None)[source]#

Extract features from a stream of time series data.

Parameters:
  • data_stream (iterable) – An iterable that yields incoming data points as dictionaries with keys corresponding to column names.

  • progress_callback (function, optional) – A function to report progress, which takes a single argument: the total number of processed points.

Yields:

dict – A dictionary containing the calculated features for the current window.
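The streaming pattern above can be sketched as a generator that buffers incoming dictionary points and yields one feature dict per full window. The function name, the window_size default, and the "value" key are illustrative assumptions, not the library’s internals.

```python
# Hypothetical sketch of the streaming pattern: buffer incoming points
# and emit one feature dict once a full window is available.
from collections import deque
from statistics import mean

def feature_stream(data_stream, window_size=3):
    buffer = deque(maxlen=window_size)
    for point in data_stream:  # each point is a dict keyed by column name
        buffer.append(point["value"])
        if len(buffer) == window_size:
            yield {"mean": mean(buffer), "length": len(buffer)}

stream = ({"value": v} for v in [1, 2, 3, 4])
results = list(feature_stream(stream))
```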

group_data(data)[source]#

Group data based on the group_by column.

Parameters:

data (pd.DataFrame) – Input data.

Returns:

Grouped data.

Return type:

iterable

group_features_by_interpretability()[source]#

Group features by their interpretability levels.

Returns:

A dictionary where keys are interpretability levels (‘easy’, ‘moderate’, ‘advanced’), and values are lists of feature names.

Return type:

dict
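The grouping can be sketched as follows, assuming each feature’s metadata carries a ‘level’ key as documented for add_custom_feature; the metadata values below are invented for illustration.

```python
# Sketch of grouping features by interpretability level
# (illustrative metadata, not the library's registry).
metadata = {
    "mean": {"level": "easy"},
    "entropy": {"level": "advanced"},
    "stability": {"level": "moderate"},
    "length": {"level": "easy"},
}

def group_by_level(meta):
    groups = {"easy": [], "moderate": [], "advanced": []}
    for name, info in meta.items():
        groups[info["level"]].append(name)
    return groups

groups = group_by_level(metadata)
```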

head(features_df, n=5)[source]#

Returns the first n rows of the resulting DataFrame from the extract_features function.

Parameters:
  • features_df (pd.DataFrame) – The resulting DataFrame from the extract_features function.

  • n (int, optional (default 5)) – The number of rows to return. If n is negative, returns all rows except the last |n| rows.

Returns:

The first n rows of the DataFrame.

Return type:

pd.DataFrame
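The negative-n behaviour described above matches ordinary Python slicing (which pandas’ DataFrame.head also follows): seq[:n] yields the first n items, or everything except the last |n| items when n is negative.

```python
# head(n) semantics illustrated with a plain list stand-in for the rows.
rows = ["r0", "r1", "r2", "r3", "r4"]
first_two = rows[:2]      # first n rows
all_but_last = rows[:-1]  # negative n: drop the last |n| rows
```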

validate_data_frequency(grouped_data)[source]#

Validate that data has a consistent and defined frequency if window_size or stride are time-based.

Parameters:

grouped_data (iterable) – The grouped time series data to validate, as returned by group_data.

Raises:

ValueError – If data frequency is not defined or inconsistent.
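An illustrative check (not the library’s implementation) of the “consistent frequency” requirement: every gap between consecutive timestamps must be identical, otherwise a ValueError is raised.

```python
# Hypothetical frequency-consistency check: all consecutive
# timestamp gaps must be equal for a defined frequency.
from datetime import datetime, timedelta

def check_consistent_frequency(timestamps):
    gaps = {b - a for a, b in zip(timestamps, timestamps[1:])}
    if len(gaps) != 1:
        raise ValueError("Data frequency is not defined or inconsistent.")
    return gaps.pop()

daily = [datetime(2023, 1, d) for d in (1, 2, 3, 4)]
freq = check_consistent_frequency(daily)
```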

interpreTS.core.time_series_data module#

class interpreTS.core.time_series_data.TimeSeriesData(data)[source]#

Bases: object

A class to manage and process time series data.

resample(interval)[source]#

Resample the time series data to a specified interval.

Parameters:

interval (str) – The interval to resample the data, e.g., ‘D’ for daily, ‘H’ for hourly.

Returns:

A new TimeSeriesData object with resampled data.

Return type:

TimeSeriesData

Examples

>>> data = pd.Series([1, 2, 3, 4, 5], index=pd.date_range("2023-01-01", periods=5, freq="D"))
>>> ts_data = TimeSeriesData(data)
>>> resampled_data = ts_data.resample("2D")

split(train_size=0.7)[source]#

Split the time series data into training and test sets.

Parameters:

train_size (float, optional) – The proportion of the data to use for training, by default 0.7.

Returns:

A tuple containing the training and test sets as TimeSeriesData objects.

Return type:

tuple of TimeSeriesData

Examples

>>> data = pd.Series([1, 2, 3, 4, 5])
>>> ts_data = TimeSeriesData(data)
>>> train, test = ts_data.split(0.6)

Module contents#